Distance (distance)

The following example computes the distances between all pairs of data instances:

>>> from Orange.data import Table
>>> from Orange.distance import Euclidean
>>> iris = Table('iris')
>>> dist_matrix = Euclidean(iris)
>>> # Distance between first two examples
>>> dist_matrix.X[0, 1]
0.53851648

The distance module is based on the scikit-learn and SciPy packages. It wraps the following distance metrics:

  • Orange.distance.Euclidean
  • Orange.distance.Manhattan
  • Orange.distance.Cosine
  • Orange.distance.Jaccard
  • Orange.distance.SpearmanR
  • Orange.distance.SpearmanRAbsolute
  • Orange.distance.PearsonR
  • Orange.distance.PearsonRAbsolute
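
To make the difference between some of these metrics concrete, here is a minimal pure-Python sketch of the textbook definitions of the Euclidean, Manhattan and cosine distances. The helper functions `euclidean`, `manhattan` and `cosine` are hypothetical names written for this illustration; they are not Orange's implementation, which may treat normalization and missing values differently. The two vectors are assumed to be the first two rows of the standard iris data.

```python
import math


def euclidean(u, v):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine(u, v):
    # One minus the cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Assumed values: the first two rows of the standard iris data.
row0 = [5.1, 3.5, 1.4, 0.2]
row1 = [4.9, 3.0, 1.4, 0.2]

print(round(euclidean(row0, row1), 8))  # 0.53851648
print(round(manhattan(row0, row1), 8))  # 0.7
print(cosine(row0, row1))               # small, since the rows point in similar directions
```

Under these assumptions the Euclidean value agrees with the entry read from the distance matrix in the example above.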

All distances share a common interface through the __call__ method:

Distance.__call__(e1, e2=None, axis=1, impute=False)
Parameters:
  • e1 (Orange.data.Table or Orange.data.RowInstance or numpy.ndarray) – input data instances; distances are calculated between all pairs
  • e2 (Orange.data.Table or Orange.data.RowInstance or numpy.ndarray) – optional second set of data instances; if provided, distances are calculated between each pair whose first item is from e1 and whose second item is from e2
  • axis (int) – if axis=1, distances are calculated between rows; if axis=0, distances are calculated between columns
  • impute (bool) – if impute=True, all NaN values in the matrix are replaced with 0
Returns: the matrix of distances between the given examples

Return type: Orange.misc.distmatrix.DistMatrix
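
The argument semantics described above can be illustrated with a small pure-Python stand-in. The function `pairwise_euclidean` below is a hypothetical helper written for this sketch; it is not part of Orange, works on plain nested lists rather than Orange.data.Table, and returns a list of lists rather than a DistMatrix. It also assumes that impute applies to NaN values in the resulting distance matrix, which is one reading of the parameter description.

```python
import math


def pairwise_euclidean(e1, e2=None, axis=1, impute=False):
    """Hypothetical stand-in mimicking the documented argument semantics."""
    def columns(rows):
        # Transpose a list of rows into a list of columns.
        return [list(col) for col in zip(*rows)]

    a = [list(row) for row in e1]
    # Without e2, distances are computed between all pairs from e1.
    b = a if e2 is None else [list(row) for row in e2]
    if axis == 0:
        # axis=0: compare columns instead of rows.
        a, b = columns(a), columns(b)

    dist = [
        [math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v))) for v in b]
        for u in a
    ]
    if impute:
        # Assumption: NaN entries of the distance matrix are replaced with 0.
        dist = [[0.0 if math.isnan(d) else d for d in row] for row in dist]
    return dist


data = [[0.0, 0.0], [3.0, 4.0], [float("nan"), 1.0]]

rows = pairwise_euclidean(data)                   # 3 x 3, between rows
imputed = pairwise_euclidean(data, impute=True)   # NaN distances become 0
cols = pairwise_euclidean(data[:2], axis=0)       # 2 x 2, between columns
cross = pairwise_euclidean(data[:1], data[:2])    # 1 x 2, e1 rows against e2 rows

print(rows[0][1])     # 5.0
print(imputed[0][2])  # 0.0
print(cols[0][1])     # 1.0
print(cross)          # [[0.0, 5.0]]
```

With a single argument the result is square, with one row and one column per instance (or per column when axis=0). With two arguments it has one row per item of e1 and one column per item of e2.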