
Multi-label classification (multilabel)

Multi-label classification is a machine learning prediction problem in which multiple binary variables (i.e. labels) are predicted for each instance. Orange offers a limited number of methods for this task.

Multi-label data is represented as multi-target data with discrete binary classes with values ‘0’ and ‘1’. Multi-target data is also supported by Orange’s tab file format using the multiclass directive.
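For illustration, a minimal multi-label tab file might look like the sketch below (columns are tab-separated; the feature and label names are made up, and the assumption here is that the third header row marks each label column with the multiclass flag):

activity	tempo	happy	sad
continuous	continuous	discrete	discrete
		multiclass	multiclass
0.35	0.80	1	0
0.12	0.40	0	1
0.58	0.65	1	1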

Binary Relevance Learner

The most basic problem transformation method for multi-label classification is the Binary Relevance method. It learns |L| binary classifiers H_l: X \rightarrow \{l, \neg l\}, one for each label l in L. It transforms the original data set into |L| data sets D_l that contain all examples of the original data set, labelled as l if the labels of the original example contained l and as \neg l otherwise. This is the same approach used to handle a single-label multi-class problem with a binary classifier. For more information, see G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3):1-13, 2007.

Note that a copy of the table is made in RAM for each label to enable construction of a classifier. Due to technical limitations, that is currently unavoidable and should be remedied in Orange 3.
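To make the transformation concrete, here is a minimal sketch in plain Python (independent of Orange's internal implementation) of how a multi-label data set decomposes into |L| single-label binary problems; the toy data and names are illustrative only:

# A toy multi-label data set: each example is a (feature vector, label set) pair.
data = [([0.1, 2.3], set(['happy', 'calm'])),
        ([1.5, 0.2], set(['sad'])),
        ([0.7, 1.1], set(['happy']))]
labels = ['happy', 'sad', 'calm']

# Binary Relevance: build one binary data set D_l per label l.
# An example is labelled 1 in D_l if its label set contains l, and 0 otherwise.
binary_datasets = {}
for l in labels:
    binary_datasets[l] = [(x, 1 if l in ls else 0) for x, ls in data]

for l in labels:
    print l, binary_datasets[l]

A separate binary classifier is then trained on each D_l, and a prediction for a new instance is the set of labels whose classifiers answer positively.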

class Orange.multilabel.BinaryRelevanceLearner(**argkw)

Bases: Orange.multilabel.multibase.MultiLabelLearner

Class that implements the Binary Relevance (BR) method.

Parameters: instances (Orange.data.Table) – a table of instances.
class Orange.multilabel.BinaryRelevanceClassifier(**kwds)

Bases: Orange.multilabel.multibase.MultiLabelClassifier

__call__(instance, result_type=0)
Return type: a list of Orange.data.Value, a list of Orange.statistics.distribution.Distribution, or a tuple with both

Examples

The following example demonstrates a straightforward invocation of this algorithm (mlc-classify.py):

import Orange

emotions = Orange.data.Table('emotions')
learner = Orange.multilabel.BinaryRelevanceLearner()
classifier = learner(emotions)
print classifier(emotions[0])
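The result_type argument controls what the classifier returns. Continuing the example above, a sketch assuming the standard Orange classifier constants GetValue, GetProbabilities and GetBoth (an assumption about the general Orange API, not specific to this module):

# Predicted label values only (the default).
values = classifier(emotions[0], Orange.classification.Classifier.GetValue)
# Class distributions only.
probs = classifier(emotions[0], Orange.classification.Classifier.GetProbabilities)
# Both, as a (values, distributions) tuple.
values, probs = classifier(emotions[0], Orange.classification.Classifier.GetBoth)
print values
print probs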

LabelPowerset Learner

LabelPowerset Classification is another transformation method for multi-label classification. It considers each distinct set of labels that occurs in the multi-label data as a single class. It thus learns a classification problem H: X \rightarrow \mathbb{P}(L), where \mathbb{P}(L) is the power set of L. For more information, see G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3):1-13, 2007.
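The following plain-Python sketch (not Orange's internal implementation; the toy data is illustrative) shows the transformation: every distinct label set becomes one class of an ordinary single-label problem:

# Label sets are stored as frozensets so they can serve as dict keys.
data = [([0.1, 2.3], frozenset(['happy', 'calm'])),
        ([1.5, 0.2], frozenset(['sad'])),
        ([0.7, 1.1], frozenset(['happy', 'calm']))]

# Label Powerset: map each distinct label set to a single class index.
class_of = {}
transformed = []
for x, labelset in data:
    if labelset not in class_of:
        class_of[labelset] = len(class_of)
    transformed.append((x, class_of[labelset]))

print class_of     # two distinct label sets -> two classes
print transformed  # an ordinary single-label data set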

class Orange.multilabel.LabelPowersetLearner(**argkw)

Bases: Orange.multilabel.multibase.MultiLabelLearner

Class that implements the LabelPowerset (LP) method.

Parameters: instances (Orange.data.Table) – a table of instances.
class Orange.multilabel.LabelPowersetClassifier(**argkw)

Bases: Orange.multilabel.multibase.MultiLabelClassifier

__call__(instance, result_type=0)
Return type: a list of Orange.data.Value, a list of Orange.statistics.distribution.Distribution, or a tuple with both

Examples

The following example demonstrates a straightforward invocation of this algorithm (mlc-classify.py):

import Orange

emotions = Orange.data.Table('emotions')
learner = Orange.multilabel.LabelPowersetLearner()
classifier = learner(emotions)
print classifier(emotions[0])

MultikNN Learner

MultikNN is the base class for kNN-based multi-label classification methods.

class Orange.multilabel.MultikNNLearner(**argkw)

Bases: Orange.multilabel.multibase.MultiLabelLearner

Class implementing the MultikNN (Multi-Label k Nearest Neighbours) algorithm.

k

Number of neighbours. The default value is 1.

num_labels

Number of labels

label_indices

The indices of labels in the domain

knn

Orange.classification.knn.FindNearest for nearest neighbor search

Parameters: instances (Orange.data.Table) – a table of instances.
class Orange.multilabel.MultikNNClassifier(**argkw)

Bases: Orange.multilabel.multibase.MultiLabelClassifier

ML-kNN Learner

ML-kNN Classification is an adaptation of kNN for multi-label classification. In essence, ML-kNN uses the kNN algorithm independently for each label l: it finds the k nearest examples to the test instance and considers those labelled at least with l as positive and the rest as negative. What mainly differentiates this method from other binary relevance (BR) methods is the use of prior probabilities. ML-kNN can also rank labels.

For more information, see Zhang, M. and Zhou, Z. 2007. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recogn. 40, 7 (Jul. 2007), 2038-2048.

class Orange.multilabel.MLkNNLearner(**argkw)

Bases: Orange.multilabel.multiknn.MultikNNLearner

Class implementing the ML-kNN (Multi-Label k Nearest Neighbours) algorithm. The class is based on the pseudo-code made available by the authors.

The pseudo-code of ML-kNN:

[\vec{y}_t, \vec{r}_t] = \text{ML-kNN}(T, K, t, s)

% Compute the prior probabilities P(H_b^l)
(1)  for l \in \mathcal{Y} do
(2)      P(H_1^l) = (s + \sum_{i=1}^m \vec{y}_{x_i}(l)) / (s \cdot 2 + m);  P(H_0^l) = 1 - P(H_1^l)

% Compute the posterior probabilities P(E_j^l | H_b^l)
(3)  identify N(x_i) for i \in \{1, 2, ..., m\}
(4)  for l \in \mathcal{Y} do
(5)      for j \in \{0, 1, ..., K\} do
(6)          c[j] = 0;  c'[j] = 0
(7)      for i \in \{1, ..., m\} do
(8)          \delta = \vec{C}_{x_i}(l) = \sum_{a \in N(x_i)} \vec{y}_a(l)
(9)          if \vec{y}_{x_i}(l) == 1 then c[\delta] = c[\delta] + 1
(10)         else c'[\delta] = c'[\delta] + 1
(11)     for j \in \{0, 1, ..., K\} do
(12)         P(E_j^l | H_1^l) = (s + c[j]) / (s \cdot (K+1) + \sum_{p=0}^K c[p])
(13)         P(E_j^l | H_0^l) = (s + c'[j]) / (s \cdot (K+1) + \sum_{p=0}^K c'[p])

% Compute \vec{y}_t and \vec{r}_t
(14) identify N(t)
(15) for l \in \mathcal{Y} do
(16)     \vec{C}_t(l) = \sum_{a \in N(t)} \vec{y}_a(l)
(17)     \vec{y}_t(l) = \arg\max_{b \in \{0,1\}} P(H_b^l) \, P(E_{\vec{C}_t(l)}^l | H_b^l)
(18)     \vec{r}_t(l) = P(H_1^l | E_{\vec{C}_t(l)}^l) = P(H_1^l) \, P(E_{\vec{C}_t(l)}^l | H_1^l) / (\sum_{b \in \{0,1\}} P(H_b^l) \, P(E_{\vec{C}_t(l)}^l | H_b^l))
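Steps (1)-(2) reduce to simple smoothed counting. A minimal sketch of the prior computation in plain Python (the toy indicator vectors are illustrative, not the classifier's internals):

# y[i][l] is 1 if training example i carries label l, and 0 otherwise.
y = [[1, 0], [1, 1], [0, 0], [1, 0]]
m = len(y)
s = 1.0   # smoothing parameter; s = 1 gives Laplace smoothing

for l in range(2):
    count = sum(example[l] for example in y)
    p_h1 = (s + count) / (s * 2 + m)   # step (2): P(H_1^l)
    p_h0 = 1.0 - p_h1                  # P(H_0^l)
    print l, p_h1, p_h0

For label 0 this yields P(H_1^0) = (1 + 3) / (2 + 4) = 0.667: three of four examples carry the label, pulled slightly toward 0.5 by the uniform prior.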

k

Number of neighbours. The default value is 1.

smooth

Smoothing parameter controlling the strength of the uniform prior. The default value is 1, which yields Laplace smoothing.

knn

Orange.classification.knn.FindNearest for nearest neighbor search

Parameters: instances (Orange.data.Table) – a table of instances.
compute_cond(instances)

Compute posterior probabilities for each label of the training set.

compute_prior(instances)

Compute prior probability for each label of the training set.

class Orange.multilabel.MLkNNClassifier(**argkw)

Bases: Orange.multilabel.multiknn.MultikNNClassifier

__call__(instance, result_type=0)
Return type: a list of Orange.data.Value, a list of Orange.statistics.distribution.Distribution, or a tuple with both

Examples

The following example demonstrates a straightforward invocation of this algorithm (mlc-classify.py):

import Orange

emotions = Orange.data.Table('emotions')
learner = Orange.multilabel.MLkNNLearner(k=5)
classifier = learner(emotions)
print classifier(emotions[0])

BR-kNN Learner

BR-kNN Classification is an adaptation of the kNN algorithm for multi-label classification that is conceptually equivalent to using the popular Binary Relevance problem transformation method in conjunction with the kNN algorithm. It also implements two extensions of BR-kNN. For more information, see E. Spyromitros, G. Tsoumakas, I. Vlahavas, An Empirical Study of Lazy Multilabel Classification Algorithms, Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008), Springer, Syros, Greece, 2008.

class Orange.multilabel.BRkNNLearner(**argkw)

Bases: Orange.multilabel.multiknn.MultikNNLearner

Class implementing the BR-kNN learner.

k

Number of neighbours. If set to 0 (which is also the default value), the square root of the number of instances is used.

ext

Extension type. The default is None, meaning standard BR-kNN; ‘a’ predicts the top-ranked label when the prediction set would otherwise be empty; ‘b’ predicts the top n ranked labels, where n is based on the label-set sizes of the neighbours (see the sketch before the examples below).

knn

Orange.classification.knn.FindNearest for nearest neighbor search

Parameters: instances (Orange.data.Table) – a table of instances.
class Orange.multilabel.BRkNNClassifier(**argkw)

Bases: Orange.multilabel.multiknn.MultikNNClassifier

__call__(instance, result_type=0)
Return type: a list of Orange.data.Value, a list of Orange.statistics.distribution.Distribution, or a tuple with both
get_labels_a(prob, _neighs=None)

Used for BR-kNN extension ‘a’.

Parameters: prob (list of double) – the probabilities of the labels
Return type: a list of label values
get_labels_b(prob, neighs)

Used for BR-kNN extension ‘b’.

Parameters: prob (list of double) – the probabilities of the labels
Return type: a list of label values
get_prob(neighbours)

Calculates the probabilities of the labels based on the neighbouring instances.

Parameters: neighbours (list of Orange.data.Instance) – a list of the nearest neighbouring instances.
Return type: a list of label probabilities
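A minimal sketch of what get_prob and the two extensions compute conceptually (plain Python; the 0.5 threshold, the rounding of n, and all names are illustrative assumptions, not the classifier's internals):

# Each neighbour carries a 0/1 label vector (three labels here).
neighbour_labels = [[1, 0, 1], [1, 0, 0], [0, 0, 1]]
k = float(len(neighbour_labels))

# Label confidence = fraction of neighbours carrying the label.
prob = [sum(nl[l] for nl in neighbour_labels) / k for l in range(3)]

# Standard BR-kNN: predict every label with confidence above 0.5.
predicted = [l for l, p in enumerate(prob) if p > 0.5]

# Extension 'a': if the prediction set is empty, fall back to the
# single top-ranked label.
if not predicted:
    predicted = [max(range(3), key=lambda l: prob[l])]

# Extension 'b': predict the top n labels, where n is the (rounded)
# average label-set size among the neighbours.
n = int(round(sum(sum(nl) for nl in neighbour_labels) / k))
predicted_b = sorted(range(3), key=lambda l: -prob[l])[:n]

print prob, predicted, predicted_b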

Examples

The following example demonstrates a straightforward invocation of this algorithm (mlc-classify.py):

import Orange

emotions = Orange.data.Table('emotions')
learner = Orange.multilabel.BRkNNLearner(k=5)
classifier = learner(emotions)
print classifier(emotions[0])