Name		Name	Last commit message	Last commit date
parent directory ..
formulas		formulas
README.md		README.md
cost_function.m		cost_function.m
demo.m		demo.m
feature_normalize.m		feature_normalize.m
gradient_descent.m		gradient_descent.m
gradient_step.m		gradient_step.m
house_prices.csv		house_prices.csv
hypothesis.m		hypothesis.m
linear_regression_train.m		linear_regression_train.m
normal_equation.m		normal_equation.m

README.md

Linear Regression

Linear regression is a linear model, e.g. a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, that output variable (y) can be calculated from a linear combination of the input variables (x).

On the image above there is an example of dependency between input variable x and output variable y. The red line in the above graph is referred to as the best fit straight line. Based on the given data points (training examples), we try to plot a line that models the points the best. In the real world scenario we normally have more than one input variable.

Features (variables)

Each training example consists of features (variables) that describe this example (i.e. number of rooms, the square of the apartment etc.)

n - number of features

Rⁿ⁺¹ - vector of n+1 real numbers

Parameters

Parameters of the hypothesis we want our algorithm to learn in order to be able to do predictions (i.e. predict the price of the apartment).

Hypothesis

The equation that gets features and parameters as an input and predicts the value as an output (i.e. predict the price of the apartment based on its size and number of rooms).

For convenience of notation, define X₀ = 1

Cost Function

Function that shows how accurate the predictions of the hypothesis are with current set of parameters.

xⁱ - input (features) of i^th training example

yⁱ - output of i^th training example

m - number of training examples

Batch Gradient Descent

Gradient descent is an iterative optimization algorithm for finding the minimum of a cost function described above. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.

Picture below illustrates the steps we take going down of the hill to find local minimum.

The direction of the step is defined by derivative of the cost function in current point.

Once we decided what direction we need to go we need to decide what the size of the step we need to take.

We need to simultaneously update for j = 0, 1, ..., n

- the learning rate, the constant that defines the size of the gradient descent step

- j^th feature value of the i^th training example

- input (features) of i^th training example

yⁱ - output of i^th training example

m - number of training examples

n - number of features

When we use term "batch" for gradient descent it means that each step of gradient descent uses all the training examples (as you might see from the formula above).

Feature Scaling

To make linear regression and gradient descent algorithm work correctly we need to make sure that features are on a similar scale.

For example "apartment size" feature (e.g. 120 m²) is much bigger than the "number of rooms" feature (e.g. 2).

In order to scale the features we need to do mean normalization

- j^th feature value of the i^th training example

- average value of j^th feature in training set

- the range (max - min) of j^th feature in training set.

Polynomial Regression

Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an n^th degree polynomial in x.

Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the hypothesis function is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.

Example of a cubic polynomial regression, which is a type of linear regression.

You may form polynomial regression by adding new polynomial features.

For example if the price of the apartment is in non-linear dependency of its size then you might add several new size-related features.

Normal Equation

There is a closed-form solution to linear regression exists and it looks like the following:

Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no “loop until convergence” like in gradient descent.

Regularization

Overfitting Problem

If we have too many features, the learned hypothesis may fit the training set very well:

But it may fail to generalize to new examples (let's say predict prices on new example of detecting if new messages are spam).

Solution to Overfitting

Here are couple of options that may be addressed:

Reduce the number of features
- Manually select which features to keep
- Model selection algorithm
Regularization
- Keep all the features, but reduce magnitude/values of model parameters (thetas).
- Works well when we have a lot of features, each of which contributes a bit to predicting y.

Regularization works by adding regularization parameter to the cost function:

Note that you should not regularize the parameter .

- regularization parameter

In this case the gradient descent formula will look like the following:

Files

demo.m - linear regression demo script that loads test data and plots linear regression predictions.
linear_regression_train.m - linear regression algorithm.
hypothesis.m - linear regression hypothesis function.
cost_function.m - linear regression cost function.
feature_normalize.m - function that normalizes features.
gradient_descent.m - function that performs gradient descent.
gradient_step.m - function that performs just one gradient descent step.
house_prices.csv - training data set of house prices (depending on number of rooms and house size).
normal_equation.m - function that does alternative calculation of model parameters using normal equation (avoiding gradient descent).

Files

linear-regression

Directory actions

More options

Directory actions

More options

Latest commit

History

linear-regression

Folders and files

parent directory

Linear Regression

Features (variables)

Parameters

Hypothesis

Cost Function

Batch Gradient Descent

Feature Scaling

Polynomial Regression

Normal Equation

Regularization

Overfitting Problem

Solution to Overfitting

Files

Demo visualizations

References