Regression (regression)

Linear Regression

Linear regression is a statistical regression method that predicts the value of a continuous response (class) variable from the values of several predictors. The model assumes that the response variable is a linear combination of the predictors; the task of linear regression is therefore to fit the unknown coefficients.

Example

>>> import Orange
>>> from Orange.regression.linear import LinearRegressionLearner
>>> mpg = Orange.data.Table('auto-mpg')
>>> linear = LinearRegressionLearner()
>>> model = linear(mpg[40:110])
>>> print(model)
LinearModel LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
>>> mpg[20]            # actual value of one data instance
Value('mpg', 25.0)
>>> model(mpg[0])      # the model's prediction for another instance
Value('mpg', 24.6)
class Orange.regression.linear.LinearRegressionLearner(preprocessors=None)[source]

A wrapper for sklearn.linear_model.base.LinearRegression. The following is its documentation:

Ordinary least squares Linear Regression.

class Orange.regression.linear.RidgeRegressionLearner(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', preprocessors=None)[source]

A wrapper for sklearn.linear_model.ridge.Ridge. The following is its documentation:

Linear least squares with l2 regularization.

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape [n_samples, n_targets]).

Read more in the User Guide.
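
Example

A minimal usage sketch, following the linear regression example above (the predicted value depends on the data and is therefore not shown):

>>> import Orange
>>> from Orange.regression.linear import RidgeRegressionLearner
>>> mpg = Orange.data.Table('auto-mpg')
>>> ridge = RidgeRegressionLearner(alpha=1.0)
>>> model = ridge(mpg)
>>> prediction = model(mpg[0])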

class Orange.regression.linear.LassoRegressionLearner(alpha=1.0, fit_intercept=True, normalize=False, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, preprocessors=None)[source]

A wrapper for sklearn.linear_model.coordinate_descent.Lasso. The following is its documentation:

Linear Model trained with L1 prior as regularizer (aka the Lasso)

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Technically the Lasso model is optimizing the same objective function as the Elastic Net with l1_ratio=1.0 (no L2 penalty).

Read more in the User Guide.
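
Example

A sketch of fitting the Lasso and inspecting its sparse coefficients; the wrapped scikit-learn estimator is assumed to be reachable as skl_model on the resulting LinearModel (see below), and larger alpha values zero out more coefficients:

>>> import Orange
>>> from Orange.regression.linear import LassoRegressionLearner
>>> mpg = Orange.data.Table('auto-mpg')
>>> lasso = LassoRegressionLearner(alpha=0.1)
>>> model = lasso(mpg)
>>> coefficients = model.skl_model.coef_   # some entries are exactly 0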

class Orange.regression.linear.SGDRegressionLearner(loss='squared_loss', penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, n_iter=5, shuffle=True, epsilon=0.1, n_jobs=1, random_state=None, learning_rate='invscaling', eta0=0.01, power_t=0.25, class_weight=None, warm_start=False, average=False, preprocessors=None)[source]

A wrapper for sklearn.linear_model.stochastic_gradient.SGDRegressor. The following is its documentation:

Linear model fitted by minimizing a regularized empirical loss with SGD

SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

This implementation works with data represented as dense numpy arrays of floating point values for the features.

Read more in the User Guide.
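
Example

A minimal sketch combining L1 and L2 regularization through the elastic net penalty (both parameters appear in the signature above; predictions are not shown since they depend on the random state):

>>> import Orange
>>> from Orange.regression.linear import SGDRegressionLearner
>>> mpg = Orange.data.Table('auto-mpg')
>>> sgd = SGDRegressionLearner(penalty='elasticnet', l1_ratio=0.15)
>>> model = sgd(mpg)
>>> predictions = model(mpg[:5])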

class Orange.regression.linear.LinearModel(skl_model)[source]

Polynomial

The polynomial model is a wrapper that constructs polynomial features of a specified degree and learns a model on them.

class Orange.regression.linear.PolynomialLearner(learner=LinearRegressionLearner(), degree=2, preprocessors=None)[source]

Generate polynomial features and learn a prediction model

Parameters:

learner : LearnerRegression
    learner to be fitted on the transformed features

degree : int
    degree of the polynomial used

preprocessors : List[Preprocessor]
    preprocessors to be applied on the data before learning
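
Example

A sketch fitting a degree-2 polynomial on top of the default linear regression learner (any other LearnerRegression could be substituted):

>>> import Orange
>>> from Orange.regression.linear import PolynomialLearner, LinearRegressionLearner
>>> mpg = Orange.data.Table('auto-mpg')
>>> poly = PolynomialLearner(learner=LinearRegressionLearner(), degree=2)
>>> model = poly(mpg)
>>> prediction = model(mpg[0])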

Mean

The mean model predicts the same value (usually the distribution mean) for all data instances. Its accuracy can serve as a baseline for other regression models.

The model learner (MeanLearner) computes the mean of the given data or distribution. The model is stored as an instance of MeanModel.

Example

>>> from Orange.data import Table
>>> from Orange.regression import MeanLearner
>>> data = Table('auto-mpg')
>>> learner = MeanLearner()
>>> model = learner(data)
>>> print(model)
MeanModel(23.51457286432161)
>>> model(data[:4])
array([ 23.51457286,  23.51457286,  23.51457286,  23.51457286])
class Orange.regression.MeanLearner(preprocessors=None)[source]

Fit a regression model that returns the average response (class) value.

fit_storage(data)[source]

Construct a MeanModel by computing the mean value of the given data.

Parameters: data (Orange.data.Table) – data table
Returns: regression model, which always returns the mean value
Return type: MeanModel

Random Forest

class Orange.regression.RandomForestRegressionLearner(n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, preprocessors=None)[source]

A wrapper for sklearn.ensemble.forest.RandomForestRegressor. The following is its documentation:

A random forest regressor.

A random forest is a meta estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement if bootstrap=True (default).

Read more in the User Guide.
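
Example

A sketch with a fixed random_state for repeatable results (both parameters are part of the signature above):

>>> import Orange
>>> mpg = Orange.data.Table('auto-mpg')
>>> forest = Orange.regression.RandomForestRegressionLearner(n_estimators=50, random_state=0)
>>> model = forest(mpg)
>>> predictions = model(mpg[:5])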

Simple Random Forest

class Orange.regression.SimpleRandomForestLearner(n_estimators=10, min_instances=2, max_depth=1024, max_majority=1.0, skip_prob='sqrt', seed=42)[source]

A random forest regressor, optimized for speed. Trees in the forest are constructed with the SimpleTreeLearner.

Parameters:

n_estimators : int, optional (default = 10)
    Number of trees in the forest.

min_instances : int, optional (default = 2)
    Minimal number of data instances in leaves. When growing the tree, new nodes are not introduced if they would result in leaves with fewer instances than min_instances. Instance counts are weighted.

max_depth : int, optional (default = 1024)
    Maximal depth of the tree.

max_majority : float, optional (default = 1.0)
    Maximal proportion of the majority class. When this is exceeded, induction stops (only used for classification).

skip_prob : string, optional (default = “sqrt”)
    An attribute will be skipped with probability skip_prob.

      • if float, then skip the attribute with this probability.
      • if “sqrt”, then skip_prob = 1 - sqrt(n_features) / n_features
      • if “log2”, then skip_prob = 1 - log2(n_features) / n_features

seed : int, optional (default = 42)
    Random seed.
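
Example

Usage mirrors the other learners; a sketch on the auto-mpg data used in the examples above:

>>> import Orange
>>> mpg = Orange.data.Table('auto-mpg')
>>> simple_forest = Orange.regression.SimpleRandomForestLearner(n_estimators=10, seed=42)
>>> model = simple_forest(mpg)
>>> predictions = model(mpg[:5])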

Regression Tree

Orange includes two implementations of regression trees: a home-grown one, and one from scikit-learn. The former properly handles multinomial and missing values, and the latter is faster. A usage sketch for both follows the class descriptions below.

class Orange.regression.TreeLearner(*args, binarize=False, min_samples_leaf=1, min_samples_split=2, max_depth=None, **kwargs)[source]

Tree inducer with proper handling of nominal attributes and binarization.

The inducer can handle missing values of attributes and the target. For discrete attributes with more than two possible values, each value can get a separate branch (binarize=False, default), or values can be grouped into two groups (binarize=True).

Tree growth can be limited by the required number of instances for internal nodes and for leaves, and by the maximal depth of the tree.

If the tree is not binary, it can contain zero-branches.

Args:
    binarize: if True, the inducer will find the optimal split of the values of discrete attributes into two subsets. If False (default), each value gets its own branch.
    min_samples_leaf: the minimal number of data instances in a leaf
    min_samples_split: the minimal number of data instances that is split into subgroups
    max_depth: the maximal depth of the tree

Returns:
    instance of OrangeTreeModel
build_tree(data, active_inst, level=1)[source]

Induce a tree from the given data.

Returns:
    root node (Node)
class Orange.regression.SklTreeRegressionLearner(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=None, random_state=None, max_leaf_nodes=None, preprocessors=None)[source]

A wrapper for sklearn.tree.tree.DecisionTreeRegressor. The following is its documentation:

A decision tree regressor.

Read more in the User Guide.
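
Example

A sketch fitting both implementations on the same data; the two learners are interchangeable, so switching between them only changes the learner's construction:

>>> import Orange
>>> mpg = Orange.data.Table('auto-mpg')
>>> tree = Orange.regression.TreeLearner(max_depth=5)
>>> skl_tree = Orange.regression.SklTreeRegressionLearner(max_depth=5)
>>> model = tree(mpg)
>>> skl_model = skl_tree(mpg)
>>> predictions = model(mpg[:5])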

Neural Network

class Orange.regression.NNRegressionLearner(hidden_layer_sizes=(100, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, preprocessors=None)[source]

A wrapper for sklearn.neural_network.multilayer_perceptron.MLPRegressor. The following is its documentation:

Multi-layer Perceptron regressor.

This model optimizes the squared-loss using LBFGS or stochastic gradient descent.

New in version 0.18.
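
Example

A sketch comparing the network against the MeanLearner baseline mentioned above, assuming the Orange.evaluation.CrossValidation and Orange.evaluation.RMSE utilities that ship with Orange (a larger max_iter gives the optimizer more room to converge):

>>> import Orange
>>> mpg = Orange.data.Table('auto-mpg')
>>> learners = [Orange.regression.NNRegressionLearner(max_iter=1000),
...             Orange.regression.MeanLearner()]
>>> results = Orange.evaluation.CrossValidation(mpg, learners, k=5)
>>> rmse = Orange.evaluation.RMSE(results)   # one score per learner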