Projection (projection)

PCA

Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
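The definition above can be illustrated with a minimal NumPy sketch. It is not how Orange or scikit-learn compute PCA internally; it assumes a small synthetic two-variable dataset (hypothetical, generated here) and uses an eigendecomposition of the covariance matrix to show that the transformation is orthogonal and that the resulting components are uncorrelated.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical data: the second variable is a noisy copy of the first,
# so the two observed variables are strongly correlated.
x = rng.normal(size=500)
X = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=500)])

# Center the data and compute the covariance of the original variables.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigenvectors of the covariance matrix form an orthogonal basis.
eigvals, W = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
eigvals, W = eigvals[order], W[:, order]

# Project onto the new basis: these are the principal component scores.
scores = Xc @ W
scores_cov = np.cov(scores, rowvar=False)

print("covariance of original variables:\n", cov)
print("covariance of principal components:\n", scores_cov)
```

The off-diagonal entries of `scores_cov` vanish up to floating-point error, and its diagonal holds the eigenvalues, that is, the variance captured by each component.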

Example

>>> import Orange
>>> from Orange.projection.pca import PCA
>>> iris = Orange.data.Table('iris')
>>> pca = PCA()
>>> model = pca(iris)    # fit the projector on the data
>>> model.components_    # PCA components
array([[ 0.36158968, -0.08226889,  0.85657211,  0.35884393],
       [ 0.65653988,  0.72971237, -0.1757674 , -0.07470647],
       [-0.58099728,  0.59641809,  0.07252408,  0.54906091],
       [ 0.31725455, -0.32409435, -0.47971899,  0.75112056]])
>>> transformed_data = model(iris)    # transformed data
>>> transformed_data
[[-2.684, 0.327, -0.022, 0.001 | Iris-setosa],
[-2.715, -0.170, -0.204, 0.100 | Iris-setosa],
[-2.890, -0.137, 0.025, 0.019 | Iris-setosa],
[-2.746, -0.311, 0.038, -0.076 | Iris-setosa],
[-2.729, 0.334, 0.096, -0.063 | Iris-setosa],
...
]
class Orange.projection.pca.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None, preprocessors=None)

A wrapper for sklearn.decomposition.pca.PCA. The following is its documentation:

Principal component analysis (PCA)

Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.

It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. 2009, depending on the shape of the input data and the number of components to extract.

It can also use the scipy.sparse.linalg ARPACK implementation of the truncated SVD.

Notice that this class does not support sparse input. See TruncatedSVD for an alternative with sparse data.
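As a rough illustration of the SVD-based description above, the following sketch calls scikit-learn's `PCA` directly (the class that the Orange wrapper delegates to) and compares it with a hand-written SVD of the centered data. The synthetic matrix `X` and all variable names are assumptions made for this example only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Hypothetical data: 200 samples, 5 correlated features.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
n_samples = X.shape[0]

# Full SVD (LAPACK) and randomized truncated SVD (Halko et al.).
full = PCA(n_components=2, svd_solver="full").fit(X)
randomized = PCA(n_components=2, svd_solver="randomized", random_state=0).fit(X)

# The same projection written out by hand: center, then take the SVD.
Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
manual_components = Vt[:2]                       # top right singular vectors
manual_variance = S[:2] ** 2 / (n_samples - 1)   # variance per component

# Singular vectors are only defined up to sign, so compare absolute values.
components_match = np.allclose(np.abs(full.components_), np.abs(manual_components))
variance_match = np.allclose(full.explained_variance_, manual_variance)
solvers_agree = np.allclose(
    full.explained_variance_, randomized.explained_variance_, rtol=1e-3
)
print(components_match, variance_match, solvers_agree)
```

On a matrix this small the two solvers should agree closely; the randomized solver is mainly of interest when the data are large and only a few components are needed.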

Read more in the scikit-learn User Guide.

class Orange.projection.pca.SparsePCA(n_components=None, alpha=1, ridge_alpha=0.01, max_iter=1000, tol=1e-08, method='lars', n_jobs=1, U_init=None, V_init=None, verbose=False, random_state=None, preprocessors=None)

A wrapper for sklearn.decomposition.sparse_pca.SparsePCA. The following is its documentation:

Sparse Principal Components Analysis (SparsePCA)

Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
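The effect of alpha can be sketched with scikit-learn's `SparsePCA` directly. The data, the helper `count_zeros`, and the two alpha values below are hypothetical choices for illustration; the exact number of zero loadings depends on the data and the scikit-learn version.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
# Hypothetical data: 8 features, the first three share a common signal.
X = rng.normal(size=(100, 8))
X[:, :3] += rng.normal(size=(100, 1))

def count_zeros(alpha):
    """Fit SparsePCA with the given L1 penalty and count zero loadings."""
    model = SparsePCA(n_components=3, alpha=alpha, random_state=0).fit(X)
    return int(np.sum(model.components_ == 0))

zeros_low = count_zeros(0.1)    # weak penalty: mostly dense components
zeros_high = count_zeros(5.0)   # strong penalty: more loadings driven to zero

print("zero loadings with alpha=0.1:", zeros_low)
print("zero loadings with alpha=5.0:", zeros_high)
```

A larger alpha is expected to push more entries of `components_` to exactly zero, at the cost of a less accurate reconstruction of the data.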

Read more in the scikit-learn User Guide.

class Orange.projection.pca.IncrementalPCA(n_components=None, whiten=False, copy=True, batch_size=None, preprocessors=None)

A wrapper for sklearn.decomposition.incremental_pca.IncrementalPCA. The following is its documentation:

Incremental principal components analysis (IPCA).

Linear dimensionality reduction using Singular Value Decomposition of centered data, keeping only the most significant singular vectors to project the data to a lower dimensional space.

Depending on the size of the input data, this algorithm can be much more memory efficient than PCA.

This algorithm has constant memory complexity, on the order of batch_size, enabling use of np.memmap files without loading the entire file into memory.

The computational overhead of each SVD is O(batch_size * n_features ** 2), but only 2 * batch_size samples remain in memory at a time. There will be n_samples / batch_size SVD computations to get the principal components, versus 1 large SVD of complexity O(n_samples * n_features ** 2) for PCA.
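The batch-wise behaviour described above can be sketched with scikit-learn's `IncrementalPCA` and its `partial_fit` method. The array sizes are hypothetical, and `X` is an ordinary in-memory array here; in practice it could be an `np.memmap` so that only one slice is read at a time.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
n_samples, n_features, batch_size = 1000, 5, 100   # assumed sizes
X = rng.normal(size=(n_samples, n_features))

ipca = IncrementalPCA(n_components=2)

# Feed the data one batch at a time; each call updates the model
# using only the current slice of X.
n_updates = 0
for start in range(0, n_samples, batch_size):
    ipca.partial_fit(X[start:start + batch_size])
    n_updates += 1

# n_samples / batch_size incremental updates were performed.
print("updates:", n_updates, "samples seen:", ipca.n_samples_seen_)

# The fitted model can then transform data batch by batch as well.
X_reduced = ipca.transform(X[:batch_size])
print(X_reduced.shape)
```

Calling `fit` with the `batch_size` parameter set performs an equivalent loop internally; the explicit loop is shown only to make the number of updates visible.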

Read more in the scikit-learn User Guide.