
Linear Regression module


This article describes a module in the Azure Machine Learning designer.

Use this module to build a linear regression model for use in a pipeline. Linear regression attempts to establish a linear relationship between one or more independent variables and a numeric result or dependent variable.

This module allows you to define a linear regression method and then train a model on a labeled dataset. The trained model can then be used to make predictions.

Learn about linear regression

Linear regression is a popular statistical method used in machine learning that has been expanded to include many new ways to fit the line and measure errors. In simple terms, regression is the prediction of a numerical target value. Linear regression is still a great choice when you need a simple model for a simple predictive task. Linear regression also works well on high-dimensional, sparse datasets with little complexity.

In addition to linear regression, Azure Machine Learning supports a variety of regression models. However, the term “regression” can be interpreted loosely, and some types of regression offered in other tools are not supported.

  • The classic regression problem involves a single independent variable and a dependent variable. This is called simple regression. This module supports simple regression.

  • Multiple linear regression involves two or more independent variables that contribute to a single dependent variable. Problems in which multiple inputs are used to predict a single numeric outcome are also called multivariate linear regression.

    The Linear Regression module can solve these problems, as can most of the other regression modules.

  • Multi-label regression is the task of predicting multiple dependent variables within a single model. For example, in multi-label logistic regression, a sample can be assigned to several different labels. (This is different from the task of predicting multiple levels within a single class variable.)

    This type of regression is not supported in Azure Machine Learning. To predict multiple variables, create a separate learner for each output you want to predict.
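The workaround in the last bullet, one learner per output, can be illustrated outside the designer with a short Python sketch. It assumes a single input feature and fits each target column independently with ordinary least squares; the column names and the helper functions `fit_simple_ols` and `fit_per_output` are hypothetical and do not correspond to any Azure Machine Learning API.

```python
from typing import Dict, List, Tuple


def fit_simple_ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_per_output(xs: List[float],
                   targets: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Train one independent single-output model for every target column."""
    return {name: fit_simple_ols(xs, ys) for name, ys in targets.items()}


xs = [0.0, 1.0, 2.0, 3.0]
targets = {
    "price": [1.0, 3.0, 5.0, 7.0],   # hypothetical column, exactly 1 + 2x
    "demand": [9.0, 8.0, 7.0, 6.0],  # hypothetical column, exactly 9 - x
}
models = fit_per_output(xs, targets)
print(models)
```

Each output gets its own (intercept, slope) pair, and nothing is shared between the models, which is exactly the limitation the bullet describes.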

For years, statisticians have been developing increasingly advanced methods for regression. This is true even for linear regression. This module supports two methods for measuring error and fitting the regression line: ordinary least squares and gradient descent.

  • Gradient descent is a method that minimizes the amount of error at each step of the model training process. There are many variations of gradient descent, and its optimization for various learning problems has been studied extensively. If you choose this option for Solution method, you can set a variety of parameters to control the step size, learning rate, and so on. This option also supports the use of an integrated parameter sweep.

  • Ordinary least squares is one of the most commonly used techniques in linear regression. For example, least squares is the method used in the Analysis ToolPak for Microsoft Excel.

    Ordinary least squares refers to the loss function, which computes the error as the sum of the squared distances from the actual values to the predicted line, and fits the model by minimizing that squared error. This method assumes a strong linear relationship between the inputs and the dependent variable.
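To make the two fitting approaches concrete, the following Python sketch fits the same one-feature toy data both ways: a closed-form least squares solution that minimizes the sum of squared errors directly, and an online gradient descent loop that updates the intercept and slope after each example. It is an illustration under stated assumptions (made-up data, a fixed learning rate of 0.05, 500 epochs), not how the module is implemented internally.

```python
from typing import List, Tuple


def least_squares(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Closed-form fit of y = b + w*x that minimizes the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - w * mx, w


def online_gradient_descent(xs: List[float], ys: List[float],
                            learning_rate: float = 0.05,
                            epochs: int = 500) -> Tuple[float, float]:
    """Fit the same line by taking one gradient step per example."""
    b, w = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (b + w * x) - y  # residual for this single example
            # Gradient of 0.5 * error**2 with respect to b is error, and w.r.t. w is error * x
            b -= learning_rate * error
            w -= learning_rate * error * x
    return b, w


def sum_squared_error(b: float, w: float, xs: List[float], ys: List[float]) -> float:
    return sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # made-up, roughly y = 1 + 2x
b_ls, w_ls = least_squares(xs, ys)
b_gd, w_gd = online_gradient_descent(xs, ys)
print(f"least squares:    b={b_ls:.3f} w={w_ls:.3f} SSE={sum_squared_error(b_ls, w_ls, xs, ys):.4f}")
print(f"gradient descent: b={b_gd:.3f} w={w_gd:.3f} SSE={sum_squared_error(b_gd, w_gd, xs, ys):.4f}")
```

Both approaches arrive at nearly the same line here. Least squares gets there in one computation, while gradient descent processes examples incrementally and ends close to, but not exactly at, the least squares solution when the step size stays fixed.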

Configure linear regression

This module supports two methods of fitting a regression model with different options:

Create a regression model using the least squares method

  1. In the designer, add the Linear Regression Model module to your pipeline.

    You can find this module in the Machine Learning category. Expand Initialize Model, expand Regression, and then drag the Linear Regression Model module into your pipeline.

  2. In the Properties pane, in the Solution method drop-down list, select Ordinary Least Squares. This option specifies the computation method used to find the regression line.

  3. In L2 regularization weight, type the value to use as the weight for L2 regularization. We recommend using a non-zero value to avoid overfitting.

    If you want to learn more about how regularization affects model fitting, see this article: L1 and L2 Regularization for Machine Learning

  4. Select the Include intercept term option if you want to view the term for the intercept.

    Deselect this option if you don't need to review the regression formula.

  5. For Random number seed, you can optionally type a value to seed the random number generator used by the model.

    Using a seed is useful when you want to maintain the same results on different runs of the same pipeline. Otherwise, a value from the system clock is used by default.

  6. Add the Train Model module to your pipeline, and connect a labeled dataset.

  7. Submit the pipeline.
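Steps 3 and 4 add an L2 penalty and an optional intercept to the least squares fit. As a rough sketch only, the Python below solves ridge regression in closed form for a single feature. It assumes the penalty is `l2_weight * slope**2` added to the sum of squared errors and that the intercept is not penalized; the module may scale or apply its regularization differently, and `fit_ridge` is a hypothetical name.

```python
from typing import List, Tuple


def fit_ridge(xs: List[float], ys: List[float], l2_weight: float,
              include_intercept: bool = True) -> Tuple[float, float]:
    """Minimize sum((y - b - w*x)**2) + l2_weight * w**2 for one feature.

    Assumption: only the slope w is penalized, not the intercept b.
    With include_intercept=False, b is fixed at 0 (the line passes through the origin).
    """
    if include_intercept:
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        w = sxy / (sxx + l2_weight)  # the penalty shrinks the slope toward 0
        return my - w * mx, w
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return 0.0, sxy / (sxx + l2_weight)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data, exactly y = 1 + 2x
for l2 in (0.0, 0.001, 1.0, 10.0):
    b, w = fit_ridge(xs, ys, l2)
    print(f"l2_weight={l2:<6} intercept={b:.3f} slope={w:.3f}")
print("no intercept:", fit_ridge(xs, ys, 0.0, include_intercept=False))
```

A weight of 0 reduces to ordinary least squares, and larger weights shrink the slope toward zero. Turning off the intercept forces the line through the origin, which changes the slope on this toy data.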

Results for the least squares model

After completing the training:

  • To make predictions, connect the trained model to the Score Model module and to a dataset with new values.

Build a regression model using online gradient descent

  1. In the designer, add the Linear Regression Model module to your pipeline.

    You can find this module in the Machine Learning category. Expand Initialize Model, expand Regression, and then drag the Linear Regression Model module into your pipeline.

  2. In the Properties pane, in the Solution method drop-down list, select Online Gradient Descent as the computation method used to find the regression line.

  3. For Create trainer mode, indicate whether you want to train the model with a predefined set of parameters, or whether you want to optimize the model by using a parameter sweep.

    • Single Parameter: If you know how you want to configure the linear regression network, you can provide a specific set of values as arguments.

    • Parameter Range: Select this option if you are not sure which parameters are best and want to run a parameter sweep. Select a range of values to iterate over. The Tune Model Hyperparameters module then iterates over all possible combinations of the settings you specified to determine the hyperparameters that produce the best results.

  4. For Learning rate, specify the initial learning rate for the stochastic gradient descent optimizer.

  5. For Number of training epochs, type a value that indicates how many times the algorithm should iterate through the examples. For datasets with a small number of examples, this number should be large to reach convergence.

  6. Normalize features: If you have already normalized the numeric data used to train the model, you can deselect this option. By default, the module normalizes all numeric inputs to a range between 0 and 1.

    Note

    Remember to apply the same normalization method to new data used for scoring.

  7. In L2 regularization weight, type the value to use as the weight for L2 regularization. We recommend using a non-zero value to avoid overfitting.

    If you want to learn more about how regularization affects model fitting, see this article: L1 and L2 Regularization for Machine Learning

  8. Select the Decrease learning rate option if you want the learning rate to decrease as the iterations progress.

  9. For Random number seed, you can optionally type a value to seed the random number generator used by the model. Using a seed is useful when you want to maintain the same results across different runs of the same pipeline.

  10. Train the model:

    • If you set Create trainer mode to Single Parameter, connect a labeled dataset and the Train Model module.

    • If you set Create trainer mode to Parameter Range, connect a labeled dataset and train the model by using Tune Model Hyperparameters.

    Note

    If you pass a parameter range to Train Model, only the default value in the list of individual parameters is used.

    If you pass a single set of specific parameter values to the Tune Model Hyperparameters module when it expects a range of settings for each parameter, it ignores the values and uses the default values for the learner instead.

    If you select the Parameter Range option and enter a single value for any parameter, that single value is used throughout the sweep, even if other parameters change across a range of values.

  11. Submit the pipeline.
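The options in steps 3 through 10 map onto a fairly standard online (stochastic) gradient descent loop. The Python sketch below is an illustration built on assumptions, not the module's actual algorithm: Normalize features is modeled as min-max scaling to [0, 1], the loss is squared error plus an L2 penalty on the weights, Decrease learning rate is modeled as an assumed `learning_rate / sqrt(t)` schedule, and the random seed only controls the order in which examples are visited. The final lines mimic, in spirit, a Parameter Range sweep by trying every combination in a small grid. All function names, parameter names, and data are hypothetical.

```python
import itertools
import math
import random
from typing import List, Optional, Sequence, Tuple

Row = List[float]
Bounds = List[Tuple[float, float]]


def min_max_bounds(rows: Sequence[Row]) -> Bounds:
    """Per-feature (min, max) taken from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row: Row, bounds: Optional[Bounds]) -> Row:
    """Rescale features to [0, 1] with training bounds; constant features map to 0."""
    if bounds is None:
        return list(row)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def train_ogd(rows: Sequence[Row], ys: Sequence[float], learning_rate: float = 0.1,
              epochs: int = 200, l2_weight: float = 0.001, normalize: bool = True,
              decrease_lr: bool = True, seed: Optional[int] = 42):
    bounds = min_max_bounds(rows) if normalize else None
    data = [(scale(r, bounds), y) for r, y in zip(rows, ys)]
    rng = random.Random(seed)  # seed=None: unseeded, so example order varies between runs
    weights, bias, t = [0.0] * len(rows[0]), 0.0, 0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new order each epoch
        for x, y in data:
            t += 1
            lr = learning_rate / math.sqrt(t) if decrease_lr else learning_rate
            error = bias + sum(w * v for w, v in zip(weights, x)) - y
            # One step on 0.5*error**2 + 0.5*l2_weight*||weights||**2 for this example
            weights = [w - lr * (error * v + l2_weight * w) for w, v in zip(weights, x)]
            bias -= lr * error  # the bias is left unregularized in this sketch
    return weights, bias, bounds


def predict(model, row: Row) -> float:
    weights, bias, bounds = model
    return bias + sum(w * v for w, v in zip(weights, scale(row, bounds)))


def mse(model, rows, ys) -> float:
    return sum((predict(model, r) - y) ** 2 for r, y in zip(rows, ys)) / len(ys)


# Hypothetical noise-free data: y = 3 + 2*x1 - x2
rows = [[float(a), float(b)] for a in range(5) for b in range(5)]
ys = [3 + 2 * a - b for a, b in rows]

# "Single Parameter" mode: one fixed configuration.
single = train_ogd(rows, ys, learning_rate=0.5, epochs=300, l2_weight=0.0)
print("single-parameter MSE:", round(mse(single, rows, ys), 6))

# "Parameter Range" mode, in spirit: try every combination and keep the best.
# (A real sweep should score on held-out data; training data is reused here for brevity.)
grid = {"learning_rate": [0.05, 0.5], "l2_weight": [0.0, 0.01], "decrease_lr": [True, False]}
best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda cfg: mse(train_ogd(rows, ys, epochs=100, **cfg), rows, ys),
)
print("best settings:", best)
```

Shuffling with a seeded generator is what makes repeated runs reproducible in this sketch; with `seed=None`, the visiting order, and therefore the exact weights, can differ from run to run.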

Results for the online gradient descent

After completing the training:

  • To make predictions, connect the trained model to the Score Model module and to new input data.
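When you score new input data, any feature scaling learned during training has to be reused rather than recomputed, as the note on normalization points out. This hypothetical Python sketch stores the training minimum and maximum for each feature and applies them unchanged to new rows. The class and function names are illustrative only, and the weights are made-up values standing in for a trained model, not output from the module.

```python
from typing import List, Tuple


class MinMaxScaler:
    """Remembers per-feature min/max from training data (hypothetical helper)."""

    def __init__(self, training_rows: List[List[float]]):
        self.bounds: List[Tuple[float, float]] = [(min(c), max(c)) for c in zip(*training_rows)]

    def transform(self, row: List[float]) -> List[float]:
        # New values outside the training range map outside [0, 1]; they are not clipped.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, (lo, hi) in zip(row, self.bounds)]


def score(weights: List[float], bias: float, scaler: MinMaxScaler,
          rows: List[List[float]]) -> List[float]:
    """Apply the training-time scaling, then the linear model, to each new row."""
    return [bias + sum(w * v for w, v in zip(weights, scaler.transform(r))) for r in rows]


training_rows = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
scaler = MinMaxScaler(training_rows)
weights, bias = [1.5, -0.5], 2.0  # made-up values standing in for a trained model
new_rows = [[20.0, 300.0], [40.0, 600.0]]
print(score(weights, bias, scaler, new_rows))  # [2.625, 3.75]
```

Recomputing the minimum and maximum from the new rows instead would silently shift every feature and change the predictions.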

Next Steps

Take a look at the set of available modules for Azure Machine Learning.
