This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

Lasso regression (lasso)

The lasso (least absolute shrinkage and selection operator) is a regularized version of least squares regression. It minimizes the sum of squared errors while also penalizing the L_1 norm (sum of absolute values) of the coefficients.

Concretely, the function that is minimized in Orange is:

\frac{1}{n}\|Xw - y\|_2^2 + \frac{\lambda}{m} \|w\|_1

where X is an n \times m data matrix, y is the vector of response (class) values, and w is the vector of regression coefficients to be estimated.
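The objective above can be written out directly. The following is an illustrative pure-Python sketch (the function name `lasso_objective` and the toy data are not part of the Orange API):

```python
def lasso_objective(X, y, w, lam):
    """(1/n) * ||Xw - y||_2^2 + (lam/m) * ||w||_1, as in the formula above."""
    n, m = len(X), len(w)
    # residuals of the linear model Xw - y
    residuals = [sum(X[i][j] * w[j] for j in range(m)) - y[i] for i in range(n)]
    squared_error = sum(r * r for r in residuals) / n
    l1_penalty = lam * sum(abs(wj) for wj in w) / m
    return squared_error + l1_penalty

X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 2.0]
obj = lasso_objective(X, y, [0.0, 0.0], 0.1)  # with w = 0 only the squared-error term remains
```

With all coefficients at 0 the L_1 penalty vanishes and the value is just the mean squared error; increasing lasso_lambda raises the cost of every nonzero coefficient, which is what drives some coefficients exactly to 0.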

class Orange.regression.lasso.LassoRegressionLearner(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name=Lasso)

Bases: Orange.regression.base.BaseRegressionLearner

Fits the lasso regression model using FISTA (Fast Iterative Shrinkage-Thresholding Algorithm).

__call__(data, weight=None)
Parameters:
  • data (Orange.data.Table) – Training data.
  • weight – Weights for instances. Not implemented yet.
__init__(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name=Lasso)
Parameters:
  • lasso_lambda (float) – Regularization parameter.
  • max_iter (int) – Maximum number of iterations for the optimization method.
  • eps (float) – Stop optimization when improvements are lower than eps.
  • n_boot (int) – Number of bootstrap samples used for non-parametric estimation of standard errors.
  • n_perm (int) – Number of permutations used for non-parametric estimation of p-values.
  • name (str) – Learner name.
fista(X, y, l, lipschitz, w_init=None)

Fast Iterative Shrinkage-Thresholding Algorithm (FISTA).

get_lipschitz(X)

Return the Lipschitz constant of \nabla f, where f(w) = \frac{1}{2}||Xw-y||^2.
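Each FISTA iteration takes a gradient step on the smooth part f and then applies componentwise soft thresholding (the proximal operator of the L_1 norm). The sketch below is a minimal pure-Python illustration, not the Orange implementation: it solves \frac{1}{2}\|Xw-y\|_2^2 + \lambda\|w\|_1 and, instead of the exact Lipschitz constant, uses the squared Frobenius norm of X, which is a safe over-estimate (FISTA still converges with any constant at least as large as the true one):

```python
def soft_threshold(v, t):
    # proximal operator of t * ||.||_1, applied componentwise
    return [max(abs(x) - t, 0.0) * (1 if x > 0 else -1) for x in v]

def fista(X, y, lam, n_iter=200):
    n, m = len(X), len(X[0])
    # sum of squared entries >= largest eigenvalue of X^T X (Lipschitz constant of grad f)
    L = sum(x * x for row in X for x in row)
    w = [0.0] * m
    z, t = w[:], 1.0
    for _ in range(n_iter):
        # gradient of f(w) = 0.5 * ||Xw - y||^2 at the extrapolated point z
        r = [sum(X[i][j] * z[j] for j in range(m)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(m)]
        w_new = soft_threshold([z[j] - grad[j] / L for j in range(m)], lam / L)
        # momentum update that gives FISTA its O(1/k^2) convergence rate
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [w_new[j] + (t - 1.0) / t_new * (w_new[j] - w[j]) for j in range(m)]
        w, t = w_new, t_new
    return w
```

With lam = 0 this reduces to accelerated gradient descent on ordinary least squares; as lam grows, the soft-thresholding step zeroes out more coefficients.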

class Orange.regression.lasso.LassoRegression(domain=None, class_var=None, coef0=None, coefficients=None, std_errors=None, p_vals=None, model=None, mu_x=None)

Bases: Orange.classification.Classifier

Lasso regression predicts the value of the response variable based on the values of independent variables.

coef0

Intercept (sample mean of the response variable).

coefficients

Regression coefficients.

std_errors

Standard errors of coefficient estimates for a fixed regularization parameter. The standard errors are estimated using the bootstrapping method.
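The bootstrap standard error of a coefficient is simply the sample standard deviation of its estimates across refits on bootstrap samples. A hypothetical sketch (the function name and inputs are illustrative):

```python
def bootstrap_std_error(coef_estimates):
    """Sample standard deviation of one coefficient across bootstrap refits."""
    n = len(coef_estimates)
    mean = sum(coef_estimates) / n
    # unbiased (n - 1) variant of the variance
    return (sum((c - mean) ** 2 for c in coef_estimates) / (n - 1)) ** 0.5

se = bootstrap_std_error([1.0, 2.0, 3.0])  # estimates from 3 hypothetical refits
```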

p_vals

List of p-values for the null hypotheses that the regression coefficients equal 0 based on a non-parametric permutation test.
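A non-parametric permutation p-value of this kind is the fraction of permuted-response refits that produce a coefficient at least as large in magnitude as the one estimated on the real data. A hypothetical sketch (names are illustrative, not the Orange internals):

```python
def permutation_p_value(observed_coef, permuted_coefs):
    """Fraction of permutation refits at least as extreme as the observed coefficient."""
    extreme = sum(1 for c in permuted_coefs if abs(c) >= abs(observed_coef))
    return extreme / float(len(permuted_coefs))

p = permutation_p_value(2.0, [0.1, -2.5, 0.3, 1.9])  # 1 of 4 refits is as extreme
```

A small p-value means the coefficient is rarely matched when the response is shuffled, i.e. the association is unlikely to be due to chance.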

model

Dictionary with the statistical properties of the model: Keys - names of the independent variables Values - tuples (coefficient, standard error, p-value)

mu_x

Sample mean of independent variables.

__call__(instance, result_type=0)
Parameters:
  • instance (Orange.data.Instance) – Data instance for which the value of the response variable will be predicted.
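Given the stored attributes, a prediction plausibly combines the intercept with the coefficients applied to the mean-centered instance; this is an assumption for illustration, not the documented Orange formula:

```python
def predict(coef0, coefficients, mu_x, x):
    # assumed form: intercept plus coefficients applied to the centered instance
    return coef0 + sum(c * (xi - mi) for c, xi, mi in zip(coefficients, x, mu_x))

y_hat = predict(22.5, [1.0, -2.0], [0.5, 0.5], [1.5, 0.5])  # 22.5 + 1.0 - 0.0
```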
to_string(skip_zero=True)

Pretty-prints a lasso regression model, i.e. the estimated regression coefficients with their standard errors and significances. Standard errors are obtained using the bootstrapping method and significances by a permutation test.

Parameters:
  • skip_zero (bool) – If True, variables with estimated coefficient equal to 0 are omitted.

Utility functions

Orange.regression.lasso.get_bootstrap_sample(data)

Generate a bootstrap sample of a given data set.

Parameters:
  • data (Orange.data.Table) – The original data sample.
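A bootstrap sample draws n instances with replacement from an n-instance data set. An illustrative pure-Python analogue (the Orange function operates on an Orange.data.Table; the `seed` parameter here is added only to make the sketch reproducible):

```python
import random

def bootstrap_sample(data, seed=None):
    """Draw len(data) instances from data, with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

sample = bootstrap_sample(list(range(10)), seed=42)
# same size as the original; individual instances may repeat or be left out
```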
Orange.regression.lasso.permute_responses(data)

Permute values of the class (response) variable. This breaks any dependence between the independent variables and the response while preserving the distribution of the response variable.

Parameters:
  • data (Orange.data.Table) – Original data.
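The permutation step amounts to shuffling the response column while leaving the independent variables untouched. An illustrative pure-Python analogue (the function signature and `seed` parameter are assumptions of this sketch, not the Orange API):

```python
import random

def permute_responses(xs, ys, seed=None):
    """Return xs unchanged and a shuffled copy of the responses ys."""
    rng = random.Random(seed)
    permuted = ys[:]
    rng.shuffle(permuted)
    return xs, permuted

xs = [[0], [1], [2], [3]]
ys = [10.0, 11.0, 12.0, 13.0]
_, ys_perm = permute_responses(xs, ys, seed=0)
# ys_perm contains exactly the same values as ys, in a new order
```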

Examples

To fit the regression parameters on the housing data set, use the following code:

housing = Orange.data.Table("housing")
learner = Orange.regression.lasso.LassoRegressionLearner(
    lasso_lambda=1, n_boot=100, n_perm=100)
classifier = learner(housing)

To predict values of the response for the first five instances:

for ins in housing[:5]:
    print "Actual: %3.2f, predicted: %3.2f" % (
        ins.get_class(), classifier(ins))

Output:

Actual: 24.00, predicted: 30.45
Actual: 21.60, predicted: 25.60
Actual: 34.70, predicted: 31.48
Actual: 33.40, predicted: 30.18
Actual: 36.20, predicted: 29.59

To see the fitted regression coefficients, print the model:

print classifier

Output:

  Variable  Coeff Est  Std Error          p
 Intercept     22.533
      CRIM     -0.023      0.024      0.050     .
      CHAS      1.970      1.331      0.040     *
       NOX     -4.226      2.944      0.010     *
        RM      4.270      0.934      0.000   ***
       DIS     -0.373      0.170      0.010     *
   PTRATIO     -0.798      0.117      0.000   ***
         B      0.007      0.003      0.020     *
     LSTAT     -0.519      0.102      0.000   ***
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1

For 5 variables the regression coefficient equals 0:
ZN, INDUS, AGE, RAD, TAX

Note that the L_1 penalty shrinks some of the regression coefficients exactly to 0, effectively removing those variables from the model.