Scoring methods (scoring)

CA

Orange.evaluation.CA(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.classification.accuracy_score. The following is its documentation:

Accuracy classification score.

In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

Read more in the User Guide.
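
A minimal usage sketch, assuming the bundled iris data set and the older Orange 3 calling convention in which Orange.evaluation.CrossValidation takes the data and a list of learners directly and returns a Results object:

>>> import Orange
>>> data = Orange.data.Table("iris")
>>> learners = [Orange.classification.LogisticRegressionLearner()]
>>> results = Orange.evaluation.CrossValidation(data, learners, k=10)
>>> Orange.evaluation.CA(results)  # one accuracy value per learner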

Precision

Orange.evaluation.Precision(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.classification.precision_score. The following is its documentation:

Compute the precision

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The best value is 1 and the worst value is 0.

Read more in the User Guide.
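
A hedged sketch of computing precision from cross-validated results; heart_disease is assumed to be one of the binary-class data sets bundled with Orange, and the CrossValidation call follows the older Orange 3 signature used above:

>>> import Orange
>>> data = Orange.data.Table("heart_disease")
>>> learners = [Orange.classification.LogisticRegressionLearner()]
>>> results = Orange.evaluation.CrossValidation(data, learners, k=10)
>>> Orange.evaluation.Precision(results)  # one precision value per learner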

Recall

Orange.evaluation.Recall(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.classification.recall_score. The following is its documentation:

Compute the recall

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The best value is 1 and the worst value is 0.

Read more in the User Guide.
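
Recall is computed from the same kind of Results object as the other scorers; a brief sketch reusing the cross-validated results from the precision sketch above:

>>> Orange.evaluation.Recall(results)  # ability of the classifier to find the positive samples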

F1

Orange.evaluation.F1(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.classification.f1_score.

Parameters:
  • results – Orange.evaluation.Results. Stored predictions and actual data in model testing.
  • target – int, optional (default=None). Value of the class to report.

Examples

>>> Orange.evaluation.F1(results)
array([ 0.9...])

PrecisionRecallFSupport

Orange.evaluation.PrecisionRecallFSupport(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.classification.precision_recall_fscore_support. The following is its documentation:

Compute precision, recall, F-measure and support for each class

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall, where an F-beta score reaches its best value at 1 and worst score at 0.

The F-beta score weights recall more than precision by a factor of beta. beta == 1.0 means recall and precision are equally important.

The support is the number of occurrences of each class in y_true.

If pos_label is None and in binary classification, this function returns the average precision, recall and F-measure if average is one of 'micro', 'macro', 'weighted' or 'samples'.

Read more in the User Guide.
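
A brief sketch, reusing a Results object such as the one built in the sketches above; the exact shape of the returned value mirrors the wrapped sklearn function, which with default settings reports precision, recall, F-score and support per class:

>>> Orange.evaluation.PrecisionRecallFSupport(results)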

AUC

Orange.evaluation.AUC(cls, results=None, **kwargs)[source]

Area under the receiver operating characteristic (ROC) curve.

Parameters:
  • results – Orange.evaluation.Results. Stored predictions and actual data in model testing.
  • target – int, optional (default=None). Value of the class to report.
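
A hedged sketch for a binary-class problem, reusing the heart_disease results from the precision sketch above and assuming the Results object stores the predicted probabilities that AUC needs:

>>> Orange.evaluation.AUC(results)  # area under the ROC curve, one value per learner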

Log Loss

Orange.evaluation.LogLoss(cls, results=None, **kwargs)[source]

Logarithmic loss (cross-entropy) of the predicted class probabilities.

Parameters:
  • results – Orange.evaluation.Results. Stored predictions and actual data in model testing.
  • eps – float. Log loss is undefined for p=0 or p=1, so probabilities are clipped to max(eps, min(1 - eps, p)).
  • normalize – bool, optional (default=True). If true, return the mean loss per sample. Otherwise, return the sum of the per-sample losses.
  • sample_weight – array-like of shape = [n_samples], optional. Sample weights.

Examples

>>> Orange.evaluation.LogLoss(results)
array([ 0.3...])

MSE

Orange.evaluation.MSE(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.regression.mean_squared_error. The following is its documentation:

Mean squared error regression loss

Read more in the User Guide.
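
A minimal regression sketch, assuming the bundled housing data set and the older Orange 3 CrossValidation signature used in the classification sketches above:

>>> import Orange
>>> data = Orange.data.Table("housing")
>>> learners = [Orange.regression.LinearRegressionLearner()]
>>> results = Orange.evaluation.CrossValidation(data, learners, k=10)
>>> Orange.evaluation.MSE(results)  # mean squared error per learner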

MAE

Orange.evaluation.MAE(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.regression.mean_absolute_error. The following is its documentation:

Mean absolute error regression loss

Read more in the User Guide.
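
MAE accepts the same Results object; a one-line sketch reusing the regression results from the MSE sketch above:

>>> Orange.evaluation.MAE(results)  # mean absolute error per learner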

R2

Orange.evaluation.R2(cls, results=None, **kwargs)[source]

A wrapper for sklearn.metrics.regression.r2_score. The following is its documentation:

R^2 (coefficient of determination) regression score function.

Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

Read more in the User Guide.
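
R2 is likewise computed from a regression Results object; a brief sketch reusing the results from the MSE sketch above (values near 1 indicate a good fit, negative values a model worse than predicting the mean):

>>> Orange.evaluation.R2(results)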

CD diagram

Orange.evaluation.compute_CD(avranks, N, alpha='0.05', test='nemenyi')[source]

Returns the critical difference for the Nemenyi or Bonferroni-Dunn test according to the given alpha (either alpha="0.05" or alpha="0.1") for the average ranks avranks and the number of tested data sets N. The test can be either "nemenyi" for the Nemenyi two-tailed test or "bonferroni-dunn" for the Bonferroni-Dunn test.
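
For the Nemenyi test the returned value follows the critical difference formula from Demsar (2006), where k = len(avranks) is the number of compared methods and q_alpha is the critical value of the Studentized range statistic divided by sqrt(2):

CD = q_alpha * sqrt(k * (k + 1) / (6 * N))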

Orange.evaluation.graph_ranks(avranks, names, cd=None, cdmethod=None, lowv=None, highv=None, width=6, textspace=1, reverse=False, filename=None, **kwargs)[source]

Draws a CD graph, which is used to display the differences in methods’ performance. See Janez Demsar, Statistical Comparisons of Classifiers over Multiple Data Sets, Journal of Machine Learning Research, 7(Jan):1–30, 2006.

Needs matplotlib to work.

The image is plotted on plt, which should be imported using import matplotlib.pyplot as plt.

Parameters:
  • avranks – List of average methods’ ranks.
  • names – List of methods’ names.
  • cd – Critical difference. Used for marking methods whose difference is not statistically significant.
  • cdmethod – None by default. It can be an index of an element in avranks or names, specifying the method that should be marked with an interval.
  • lowv – The lowest shown rank, if None, use 1.
  • highv – The highest shown rank, if None, use len(avranks).
  • width – Width of the drawn figure in inches, default 6 in.
  • textspace – Space on figure sides left for the description of methods, default 1 in.
  • reverse – If True, the lowest rank is on the right. Default: False.
  • filename – Output file name (with extension). If not None, the image is also saved to a file. Formats supported by matplotlib can be used.

Example

>>> import Orange
>>> import matplotlib.pyplot as plt
>>> names = ["first", "third", "second", "fourth"]
>>> avranks = [1.9, 3.2, 2.8, 3.3]
>>> cd = Orange.evaluation.compute_CD(avranks, 30)  # tested on 30 datasets
>>> Orange.evaluation.graph_ranks(avranks, names, cd=cd, width=6, textspace=1.5)
>>> plt.show()

The code produces the following graph:

[Figure: CD diagram produced by the example code (statExamples-graph_ranks1.png)]