Projection (projection)¶
PCA¶
Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
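The definition above can be illustrated with a minimal NumPy sketch. It is not Orange's implementation: the helper name pca_sketch and the synthetic data are assumptions made for illustration. It centers the data, takes the orthogonal directions from a singular value decomposition, and checks that the transformed variables are uncorrelated.

```python
import numpy as np

def pca_sketch(X, n_components=None):
    """Hypothetical helper: return (components, scores) for X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    # Project the centered data onto the principal directions.
    scores = centered @ components.T
    return components, scores

# Synthetic data (an assumption): two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=200)])

components, scores = pca_sketch(X)
cov = np.cov(scores, rowvar=False)  # off-diagonal entries are numerically zero
```

The rows of components play the same role as model.components_ in the example that follows, although Orange's wrapper may apply preprocessing that this sketch omits.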
Example¶
>>> from Orange.projection import PCA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> pca = PCA()
>>> model = pca(iris)
>>> model.components_ # PCA components
array([[ 0.36158968, -0.08226889,  0.85657211,  0.35884393],
       [ 0.65653988,  0.72971237, -0.1757674 , -0.07470647],
       [-0.58099728,  0.59641809,  0.07252408,  0.54906091],
       [ 0.31725455, -0.32409435, -0.47971899,  0.75112056]])
>>> transformed_data = model(iris) # transformed data
>>> transformed_data
[[-2.684, 0.327, -0.022, 0.001 | Iris-setosa],
 [-2.715, -0.170, -0.204, 0.100 | Iris-setosa],
 [-2.890, -0.137, 0.025, 0.019 | Iris-setosa],
 [-2.746, -0.311, 0.038, -0.076 | Iris-setosa],
 [-2.729, 0.334, 0.096, -0.063 | Iris-setosa],
...
]

class Orange.projection.pca.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None, preprocessors=None)¶
A wrapper for Orange.projection.pca.ImprovedPCA. The following is its documentation:
Patch sklearn PCA learner to include randomized PCA for sparse matrices.
Scikit-learn does not currently support sparse matrices at all, even though efficient methods exist for PCA. This class patches the default scikit-learn implementation to properly handle sparse matrices.
Notes
This should be removed once scikit-learn releases a version which implements this functionality.

class Orange.projection.pca.SparsePCA(n_components=None, alpha=1, ridge_alpha=0.01, max_iter=1000, tol=1e-08, method='lars', n_jobs=1, U_init=None, V_init=None, verbose=False, random_state=None, preprocessors=None)¶
A wrapper for sklearn.decomposition.sparse_pca.SparsePCA. The following is its documentation:
Sparse Principal Components Analysis (SparsePCA)
Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
Read more in the User Guide.
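To give a feel for how an L1 penalty coefficient controls sparseness, the sketch below applies soft-thresholding, the closed-form minimizer of 0.5 * (x - v) ** 2 + alpha * abs(x), to a made-up vector of loadings. This is only an illustration of the L1 effect, not the solver that SparsePCA runs; the function name and the numbers are assumptions.

```python
def soft_threshold(values, alpha):
    """Shrink each value toward zero by alpha; values within [-alpha, alpha] become 0."""
    result = []
    for v in values:
        if v > alpha:
            result.append(v - alpha)
        elif v < -alpha:
            result.append(v + alpha)
        else:
            result.append(0.0)
    return result

# Hypothetical loadings of one component (illustrative numbers only).
loadings = [0.9, -0.4, 0.05, -0.02]

# Count how many loadings are exactly zero for increasing penalties.
zero_counts = {}
for alpha in (0.0, 0.1, 0.5):
    shrunk = soft_threshold(loadings, alpha)
    zero_counts[alpha] = sum(1 for v in shrunk if v == 0.0)
```

A larger alpha zeroes more loadings, which is the sense in which the penalty coefficient controls how sparse the components are.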

class Orange.projection.pca.IncrementalPCA(n_components=None, whiten=False, copy=True, batch_size=None, preprocessors=None)¶
A wrapper for sklearn.decomposition.incremental_pca.IncrementalPCA. The following is its documentation:
Incremental principal components analysis (IPCA).
Linear dimensionality reduction using Singular Value Decomposition of centered data, keeping only the most significant singular vectors to project the data to a lower dimensional space.
Depending on the size of the input data, this algorithm can be much more memory efficient than a PCA.
This algorithm has constant memory complexity, on the order of batch_size, enabling use of np.memmap files without loading the entire file into memory.
The computational overhead of each SVD is O(batch_size * n_features ** 2), but only 2 * batch_size samples remain in memory at a time. There will be n_samples / batch_size SVD computations to get the principal components, versus 1 large SVD of complexity O(n_samples * n_features ** 2) for PCA.
Read more in the User Guide.
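The complexity statements above can be made concrete with a little arithmetic. The sketch below plugs example sizes into the quoted big-O terms. Constant factors are dropped, and the function name and chosen sizes are assumptions, so the figures are only orders of magnitude.

```python
import math

def svd_cost_sketch(n_samples, n_features, batch_size):
    """Hypothetical helper: evaluate the big-O terms quoted above for given sizes."""
    n_batches = math.ceil(n_samples / batch_size)
    per_batch_cost = batch_size * n_features ** 2
    return {
        "n_batches": n_batches,                           # number of small SVDs
        "per_batch_cost": per_batch_cost,                 # O(batch_size * n_features ** 2)
        "incremental_total": n_batches * per_batch_cost,  # summed over all batches
        "full_pca_cost": n_samples * n_features ** 2,     # one large SVD
        "rows_in_memory": 2 * batch_size,                 # samples held at a time
    }

costs = svd_cost_sketch(n_samples=100_000, n_features=50, batch_size=1_000)
```

For these sizes the summed cost of the small SVDs is of the same order as the single large one, while only 2,000 of the 100,000 rows are held in memory at a time.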
FreeViz¶
FreeViz uses a paradigm borrowed from particle physics: points in the same class attract each other, points from different classes repel each other, and the resulting forces are exerted on the anchors of the attributes, that is, on the unit vectors of each dimensional axis. The points themselves cannot move (they are projected into the projection space), but the attribute anchors can. The optimization is therefore a hill-climbing process that ends with the anchors placed so that the forces are in equilibrium.
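The role of the anchors can be sketched in a few lines. The code below assumes a plain linear projection in which each point lands at the sum of the attribute anchors weighted by its attribute values. The function names, the unit-circle starting layout, and the omission of any data scaling are assumptions for illustration, and the force computation itself is not shown.

```python
import math

def initial_anchors(n_attributes):
    """Place one 2-D anchor per attribute evenly on the unit circle (assumed layout)."""
    return [
        (math.cos(2 * math.pi * i / n_attributes),
         math.sin(2 * math.pi * i / n_attributes))
        for i in range(n_attributes)
    ]

def project(row, anchors):
    """Project one data row: sum of anchors weighted by the row's attribute values."""
    x = sum(value * ax for value, (ax, _) in zip(row, anchors))
    y = sum(value * ay for value, (_, ay) in zip(row, anchors))
    return (x, y)

anchors = initial_anchors(4)
before = project([1.0, 0.5, 0.0, 0.0], anchors)

# Moving an anchor (as an optimization step would) moves every point that
# has a non-zero value for that attribute; the data values stay fixed.
anchors[1] = (0.5, 0.5)
after = project([1.0, 0.5, 0.0, 0.0], anchors)
```

This is why only the anchors need to be optimized: the position of every point is fully determined by the anchor positions and its own attribute values.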
Example¶
>>> from Orange.projection import FreeViz
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> freeviz = FreeViz()
>>> model = freeviz(iris)
>>> model.components_ # FreeViz components
array([[ 3.83487853e-01, 1.38777878e-17],
       [ 6.95058218e-01, 7.18953457e-01],
       [ 2.16525357e-01, 2.65741729e-01],
       [ 9.50450079e-02, 4.53211728e-01]])
>>> transformed_data = model(iris) # transformed data
>>> transformed_data
[[0.157, 2.053 | Iris-setosa],
 [0.114, 1.694 | Iris-setosa],
 [0.123, 1.864 | Iris-setosa],
 [0.048, 1.740 | Iris-setosa],
 [0.265, 2.125 | Iris-setosa],
...
]
LDA¶
Linear discriminant analysis is another way of finding a linear transformation of data that reduces the number of dimensions required to represent it. It is often used for dimensionality reduction prior to classification, but can also be used as a classification technique itself ([1]).
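For the two-class case, the idea can be sketched with Fisher's discriminant direction, w proportional to Sw^-1 (m1 - m0), where Sw is the within-class scatter matrix and m0, m1 are the class means. The NumPy code below is a minimal illustration on synthetic data. The function name and the data are assumptions, and it is not the implementation that Orange or scikit-learn uses.

```python
import numpy as np

def fisher_direction(X, y):
    """Hypothetical helper: unit vector along Sw^-1 (m1 - m0) for labels y in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: scatter of each class around its own mean, summed.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Synthetic two-class data (an assumption): two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
    rng.normal([3.0, 3.0], 1.0, size=(100, 2)),
])
y = np.repeat([0, 1], 100)

w = fisher_direction(X, y)
z = X @ w  # each observation reduced from two dimensions to one
```

Thresholding z gives a simple classifier, which mirrors the point that the same projection serves both dimensionality reduction and classification.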
Example¶
>>> from Orange.projection import LDA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> lda = LDA()
>>> model = lda(iris)
>>> model.components_ # LDA components
array([[ 0.20490976, 0.38714331, 0.54648218, 0.71378517],
       [ 0.00898234, 0.58899857, 0.25428655, 0.76703217],
       [-0.71507172, 0.43568045, 0.45568731, 0.30200008],
       [ 0.06449913, 0.35780501, 0.42514529, 0.828895  ]])
>>> transformed_data = model(iris) # transformed data
>>> transformed_data
[[1.492, 1.905 | Iris-setosa],
 [1.258, 1.608 | Iris-setosa],
 [1.349, 1.750 | Iris-setosa],
 [1.180, 1.639 | Iris-setosa],
 [1.510, 1.963 | Iris-setosa],
...
]

class Orange.projection.lda.LDA(solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001, preprocessors=None)¶
A wrapper for sklearn.discriminant_analysis.LinearDiscriminantAnalysis. The following is its documentation:
Linear Discriminant Analysis
A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.
The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions.
New in version 0.17: LinearDiscriminantAnalysis.
Read more in the User Guide.