This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

Linear regression (linear)

Linear regression is a statistical regression method that predicts the value of a continuous response (class) variable from the values of several predictors. The model assumes that the response variable is a linear combination of the predictors; the task of linear regression is therefore to fit the unknown coefficients.
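
As a minimal illustration of what fitting the coefficients means, the sketch below computes the least-squares intercept and slope for a single predictor in plain Python. It does not use Orange; the function name fit_simple and the toy data are assumptions made for the example, and the learner documented here handles several predictors at once.

```python
from __future__ import division


def fit_simple(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated from y = 1 + 2x, so the fit recovers those values.
intercept, slope = fit_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```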

To fit the regression parameters on the housing data set, use the following code:

import Orange
housing = Orange.data.Table("housing")
learner = Orange.regression.linear.LinearRegressionLearner()
classifier = learner(housing)
class Orange.regression.linear.LinearRegressionLearner(name='linear regression', intercept=True, compute_stats=True, ridge_lambda=None, imputer=None, continuizer=None, use_vars=None, stepwise=False, add_sig=0.05, remove_sig=0.2, **kwds)

Fits the linear regression model, i.e. learns the regression parameters. The class is derived from Orange.regression.base.BaseRegressionLearner, which is used for preprocessing the data (continuization and imputation) before fitting the regression parameters.

__call__(table, weight=None, verbose=0)
Parameters:
  • table (Orange.data.Table) – data instances.
  • weight (None or list of Orange.feature.Continuous which stores weights for instances) – the weights for instances. Default: None, i.e. all data instances are equally important in fitting the regression parameters.
__init__(name='linear regression', intercept=True, compute_stats=True, ridge_lambda=None, imputer=None, continuizer=None, use_vars=None, stepwise=False, add_sig=0.05, remove_sig=0.2, **kwds)
Parameters:
  • name (string) – name of the linear model, default ‘linear regression’
  • intercept (bool) – if True, the intercept beta0 is included in the model
  • compute_stats (bool) – if True, statistical properties of the estimators (standard error, t-scores, significances) and statistical properties of the model (sum of squares, R2, adjusted R2) are computed
  • ridge_lambda (int or None) – if not None, ridge regression is performed with the given lambda parameter controlling the regularization
  • use_vars (list of Orange.feature.Descriptor or None) – the list of independent variables included in the regression model. If None (default), all variables are used
  • stepwise (bool) – if True, stepwise regression based on F-test is performed. The significance parameters are add_sig and remove_sig
  • add_sig (float) – lower bound of significance for which the variable is included in the regression model. Default: 0.05
  • remove_sig (float) – upper bound of significance for which the variable is excluded from the regression model. Default: 0.2
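
To give a feel for what ridge_lambda controls, the sketch below fits a single-predictor slope with an added penalty on its size. It is plain Python rather than Orange code; the function name ridge_slope is hypothetical, and the choice to leave the intercept unpenalised is an assumption that may differ from Orange's implementation.

```python
from __future__ import division


def ridge_slope(xs, ys, ridge_lambda=0.0):
    """Slope minimising squared residuals plus ridge_lambda * slope**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The penalty enlarges the denominator and so shrinks the slope.
    return sxy / (sxx + ridge_lambda)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
plain = ridge_slope(xs, ys)        # no penalty: ordinary least squares
shrunk = ridge_slope(xs, ys, 5.0)  # penalised fit, slope pulled toward 0
```

With a penalty of 0 the result is the ordinary least-squares slope; larger values shrink it further toward zero.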
class Orange.regression.linear.LinearRegression(class_var=None, domain=None, coefficients=None, F=None, std_error=None, t_scores=None, p_vals=None, dict_model=None, fitted=None, residuals=None, m=None, n=None, mu_y=None, r2=None, r2adj=None, sst=None, sse=None, ssr=None, std_coefficients=None, intercept=None)

Linear regression predicts the value of the response variable based on the values of the independent variables.

F

F-statistic of the model.

coefficients

Regression coefficients, stored in a list. If the intercept is included, the first item corresponds to the estimated intercept.

std_error

Standard errors of the coefficient estimators, stored in a list.

t_scores

List of t-scores for the estimated regression coefficients.

p_vals

List of p-values for the null hypothesis that the regression coefficients equal 0, based on the t-scores and a two-sided alternative hypothesis.

dict_model

Statistical properties of the model in a dictionary:
  • Keys – names of the independent variables (or “Intercept”)
  • Values – tuples (coefficient, standard error, t-value, p-value)

fitted

Estimated values of the dependent variable for all instances from the training table.

residuals

Differences between estimated and actual values of the dependent variable for all instances from the training table.

m

Number of independent (predictor) variables.

n

Number of instances.

mu_y

Sample mean of the dependent variable.

r2

Coefficient of determination.

r2adj

Adjusted coefficient of determination.

sst, sse, ssr

Total sum of squares, explained sum of squares and residual sum of squares respectively.

std_coefficients

Standardized regression coefficients.
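
The sums of squares and the two coefficients of determination listed above are related by simple formulas. The sketch below computes them in plain Python from actual and fitted values; the function name, the dictionary keys and the toy data are assumptions for the example, and the adjusted R2 uses the common textbook formula for a model with an intercept, which is assumed rather than confirmed to match Orange's.

```python
from __future__ import division


def fit_statistics(actual, fitted, m):
    """Summarise a fit; m is the number of predictor variables."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_explained = sum((f - mean_y) ** 2 for f in fitted)
    ss_residual = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    r2 = 1.0 - ss_residual / ss_total
    # Penalise R2 for the number of predictors used.
    r2adj = 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)
    return {"ss_total": ss_total, "ss_explained": ss_explained,
            "ss_residual": ss_residual, "r2": r2, "r2adj": r2adj}


# Toy data: the fitted values are the least-squares line 0.8 + 2.3x
# for x = 0, 1, 2, 3.
stats = fit_statistics([1.0, 3.0, 5.0, 8.0], [0.8, 3.1, 5.4, 7.7], m=1)
```

For a least-squares fit with an intercept, as in this toy data, the explained and residual sums of squares add up to the total sum of squares.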

__call__(instance, result_type=0)
Parameters: instance (Instance) – data instance for which the value of the response variable will be predicted
__init__(class_var=None, domain=None, coefficients=None, F=None, std_error=None, t_scores=None, p_vals=None, dict_model=None, fitted=None, residuals=None, m=None, n=None, mu_y=None, r2=None, r2adj=None, sst=None, sse=None, ssr=None, std_coefficients=None, intercept=None)
Parameters: model (LinearRegressionLearner) – fitted linear regression model
to_string()

Pretty-prints linear regression model, i.e. estimated regression coefficients with standard errors, t-scores and significances.

Utility functions

Orange.regression.linear.stepwise(table, weight, add_sig=0.05, remove_sig=0.2)

Performs stepwise linear regression on table and returns the list of remaining independent variables which fit a significant linear regression model.

Parameters:
  • table (Orange.data.Table) – data instances.
  • weight (None or list of Orange.feature.Continuous which stores the weights) – the weights for instances. Default: None, i.e. all data instances are equally important in fitting the regression parameters.
  • add_sig (float) – lower bound of significance for which the variable is included in the regression model. Default: 0.05
  • remove_sig (float) – upper bound of significance for which the variable is excluded from the regression model. Default: 0.2
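
The role of the two thresholds can be sketched as a single decision step. The code below is not Orange's implementation: it assumes the p-values of the F-tests have already been computed, and the function name stepwise_step, its arguments and the order of the checks (removal before addition) are hypothetical choices made for the illustration.

```python
def stepwise_step(included, candidates, add_sig=0.05, remove_sig=0.2):
    """Decide the next action of a stepwise search.

    included   -- dict: variable name -> p-value for keeping it in the model
    candidates -- dict: variable name -> p-value for adding it to the model
    Returns a tuple (action, name), with action one of
    "remove", "add" or "stop".
    """
    if included:
        # Drop the least significant variable if it exceeds remove_sig.
        worst = max(included, key=included.get)
        if included[worst] > remove_sig:
            return ("remove", worst)
    if candidates:
        # Otherwise add the most significant candidate below add_sig.
        best = min(candidates, key=candidates.get)
        if candidates[best] < add_sig:
            return ("add", best)
    return ("stop", None)


first = stepwise_step({"a": 0.01, "b": 0.5}, {"c": 0.001})
second = stepwise_step({"a": 0.01}, {"c": 0.001, "d": 0.3})
third = stepwise_step({"a": 0.01}, {"d": 0.3})
```

Choosing remove_sig larger than add_sig leaves a band of p-values in which a variable is neither added nor dropped, which helps keep a variable from cycling in and out of the model.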

Examples

Prediction

Predict the values of the first five data instances:

# prediction for five data instances and comparison to actual values
for ins in housing[:5]:
    print "Actual: %3.2f, predicted: %3.2f " % (ins.get_class(), classifier(ins))

The output of this code is:

Actual: 24.00, predicted: 30.00
Actual: 21.60, predicted: 25.03
Actual: 34.70, predicted: 30.57
Actual: 33.40, predicted: 28.61
Actual: 36.20, predicted: 27.94
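
Under the linear model, each predicted value is the intercept plus a weighted sum of the instance's feature values. The sketch below spells this out in plain Python for a coefficient list with the intercept first, as described for the coefficients attribute. The function name predict and the numbers are hypothetical and are not taken from the housing model.

```python
def predict(coefficients, values):
    """Evaluate a linear model; coefficients[0] is the intercept."""
    intercept = coefficients[0]
    # Pair each remaining coefficient with its feature value.
    weighted = sum(c * v for c, v in zip(coefficients[1:], values))
    return intercept + weighted


# Hypothetical two-predictor model: y = 1.5 + 2.0 * x1 - 0.5 * x2
prediction = predict([1.5, 2.0, -0.5], [3.0, 4.0])
```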

Properties of the fitted model

Print the regression coefficients with standard errors, t-scores, p-values and significances:

print classifier

The code output is:

 Variable  Coeff Est  Std Error    t-value          p      
Intercept     36.459      5.103      7.144      0.000   ***
     CRIM     -0.108      0.033     -3.287      0.001    **
       ZN      0.046      0.014      3.382      0.001   ***
    INDUS      0.021      0.061      0.334      0.738      
     CHAS      2.687      0.862      3.118      0.002    **
      NOX    -17.767      3.820     -4.651      0.000   ***
       RM      3.810      0.418      9.116      0.000   ***
      AGE      0.001      0.013      0.052      0.958      
      DIS     -1.476      0.199     -7.398      0.000   ***
      RAD      0.306      0.066      4.613      0.000   ***
      TAX     -0.012      0.004     -3.280      0.001    **
  PTRATIO     -0.953      0.131     -7.283      0.000   ***
        B      0.009      0.003      3.467      0.001   ***
    LSTAT     -0.525      0.051    -10.347      0.000   ***
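
The t-value column above is the ratio of each coefficient estimate to its standard error. The short sketch below checks this for two rows, using the rounded values printed in the table, so the results agree with the printed t-values only approximately. The function name t_score is an assumption for the example.

```python
from __future__ import division


def t_score(coefficient, std_error):
    """t-score for the null hypothesis that the coefficient equals 0."""
    return coefficient / std_error


# Rounded values copied from the Intercept and RM rows of the output.
intercept_t = t_score(36.459, 5.103)
rm_t = t_score(3.810, 0.418)
```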

Stepwise regression

To use stepwise regression, initialize the learner with stepwise=True. The significance thresholds for adding and removing variables are controlled with add_sig and remove_sig.

learner2 = Orange.regression.linear.LinearRegressionLearner(stepwise=True,
                                                           add_sig=0.05,
                                                           remove_sig=0.2)
classifier = learner2(housing)
print classifier

As you can see from the output, the non-significant coefficients have been removed from the model.

  Variable  Coeff Est  Std Error    t-value          p
 Intercept     36.341      5.067      7.171      0.000   ***
     LSTAT     -0.523      0.047    -11.019      0.000   ***
        RM      3.802      0.406      9.356      0.000   ***
   PTRATIO     -0.947      0.129     -7.334      0.000   ***
       DIS     -1.493      0.186     -8.037      0.000   ***
       NOX    -17.376      3.535     -4.915      0.000   ***
      CHAS      2.719      0.854      3.183      0.002    **
         B      0.009      0.003      3.475      0.001   ***
        ZN      0.046      0.014      3.390      0.001   ***
      CRIM     -0.108      0.033     -3.307      0.001    **
       RAD      0.300      0.063      4.726      0.000   ***
       TAX     -0.012      0.003     -3.493      0.001   ***
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1
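
The significance codes in the last column can be read as a mapping from p-value ranges to symbols. A minimal sketch of that mapping follows; the function name signif_code is hypothetical, and the treatment of values falling exactly on a boundary is an assumption.

```python
def signif_code(p):
    """Map a p-value to the symbol used in the printed legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return ""  # not significant at the 0.1 level


codes = [signif_code(p) for p in (0.0005, 0.002, 0.03, 0.07, 0.5)]
```

Because the code is presumably assigned from the unrounded p-value, two rows that both print as 0.001 (such as ZN and CRIM above) can carry different symbols.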