Regression

342. Regression

342.1. Linear Regression

342.1.1. Simple Linear Regression

342.1.1.1. Data and Model

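We observe a dataset of $n$ samples:

$$\{(x_i, y_i)\}_{i=1}^{n}, \qquad x_i, y_i \in \mathbb{R}$$

We assume a linear relationship between $x_i$ and $y_i$:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n$$

Where:

$\beta_0$: intercept
$\beta_1$: slope
$\varepsilon_i$: error term (noise)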

342.1.1.2. Optimization Problem

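The parameters $\beta_0$ and $\beta_1$ are estimated by minimizing the sum of squared errors (SSE):

$$\min_{\beta_0, \beta_1} \; f(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

This is an unconstrained convex optimization problem.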

342.1.1.3. Gradient and Hessian

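Compute the partial derivatives of the objective $f(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$:

$$\frac{\partial f}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i), \qquad \frac{\partial f}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i)$$

Thus, the gradient is:

$$\nabla f(\beta_0, \beta_1) = -2 \begin{bmatrix} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) \\ \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) \end{bmatrix}$$

and the Hessian matrix is:

$$H = \nabla^2 f = 2 \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}$$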
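The derivative formulas can be sanity-checked numerically. The following is a minimal sketch in plain Python that compares the analytic gradient of the SSE with a central finite-difference approximation; the function names (`sse`, `sse_gradient`, `sse_hessian`, `numeric_gradient`) and the data are made up for illustration and do not come from the text.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


def sse_gradient(b0, b1, xs, ys):
    """Analytic gradient of the SSE with respect to (b0, b1)."""
    residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    d_b0 = -2.0 * sum(residuals)
    d_b1 = -2.0 * sum(x * r for x, r in zip(xs, residuals))
    return d_b0, d_b1


def sse_hessian(xs):
    """Analytic Hessian of the SSE; it depends only on the inputs xs."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return [[2.0 * n, 2.0 * sx], [2.0 * sx, 2.0 * sxx]]


def numeric_gradient(b0, b1, xs, ys, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    d_b0 = (sse(b0 + h, b1, xs, ys) - sse(b0 - h, b1, xs, ys)) / (2 * h)
    d_b1 = (sse(b0, b1 + h, xs, ys) - sse(b0, b1 - h, xs, ys)) / (2 * h)
    return d_b0, d_b1


# Illustrative data (an assumption, not taken from the text).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

analytic = sse_gradient(0.5, 1.5, xs, ys)
numeric = numeric_gradient(0.5, 1.5, xs, ys)
hessian = sse_hessian(xs)

print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
print("Hessian:", hessian)
```

Because the objective is quadratic, the two gradients should agree up to floating-point rounding, and the Hessian is the same at every point.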

342.1.1.4. Convexity

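The Hessian $H$ is constant (does not depend on $\beta_0$, $\beta_1$) and symmetric. We check positive semidefiniteness via the determinant:

$$\det H = 4 \left( n \sum_{i=1}^{n} x_i^2 - \Big( \sum_{i=1}^{n} x_i \Big)^2 \right) = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2 \ge 0, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Since this expression is nonnegative for all $x_i$, and the diagonal entries $2n$ and $2\sum_{i=1}^{n} x_i^2$ are also nonnegative, $H$ is positive semidefinite, so the objective $f$ is convex.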

342.1.1.5. Optimality Conditions

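Setting the gradient equal to zero gives the normal equations:

$$n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

Solving this linear system yields the closed-form solution (provided the $x_i$ are not all equal):

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$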

342.1.1.6. Convexity by Decomposition

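Alternatively, decompose the objective $f$ as:

$$f(\beta_0, \beta_1) = \sum_{i=1}^{n} f_i(\beta_0, \beta_1), \qquad f_i(\beta_0, \beta_1) = (y_i - \beta_0 - \beta_1 x_i)^2$$

Expanding $f_i$:

$$f_i(\beta_0, \beta_1) = \beta_0^2 + 2 x_i \beta_0 \beta_1 + x_i^2 \beta_1^2 - 2 y_i \beta_0 - 2 x_i y_i \beta_1 + y_i^2$$

Each $f_i$ is a quadratic function in ($\beta_0$, $\beta_1$).

Each $f_i$ has Hessian:

$$H_i = 2 \begin{bmatrix} 1 & x_i \\ x_i & x_i^2 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ x_i \end{bmatrix} \begin{bmatrix} 1 & x_i \end{bmatrix}$$

Since each $H_i$ is positive semidefinite (it is a nonnegative multiple of an outer product of a vector with itself), each $f_i$ is convex, and since sums of convex functions are convex, $f$ is convex.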

Example
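As an illustration of the closed-form solution for simple linear regression, here is a minimal sketch in plain Python. The function name `fit_simple_linear_regression` and the data are assumptions made for this example; the code computes the slope as the ratio of the centered cross-product sum to the centered sum of squares, and the intercept from the sample means.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums used in the closed-form slope.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is not identifiable")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data (an assumption, not taken from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1 = fit_simple_linear_regression(xs, ys)
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]

print("intercept:", b0)
print("slope:", b1)
# At the optimum both normal equations hold, so these sums are ~0.
print("sum of residuals:", sum(residuals))
print("sum of x * residual:", sum(x * r for x, r in zip(xs, residuals)))
```

The last two printed values check the optimality conditions directly: at the minimizer the residuals sum to zero and are orthogonal to the inputs, up to rounding.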

342.1.2. Multiple Linear Regression

342.1.2.1. Data and Model

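For $n$ samples and $p$ predictors:

$$\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}, \qquad \mathbf{x}_i = (x_{i1}, \dots, x_{ip})^\top \in \mathbb{R}^p, \quad y_i \in \mathbb{R}$$

The model is:

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n$$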

342.1.2.2. Optimization Problem

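$$\min_{\beta_0, \boldsymbol{\beta}} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where

$$\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}$$

In matrix form, letting $X \in \mathbb{R}^{n \times p}$ be the data matrix, $\mathbf{y} = (y_1, \dots, y_n)^\top$, and $\mathbf{1}$ the column of ones:

$$\min_{\beta_0, \boldsymbol{\beta}} \; \lVert \mathbf{y} - \beta_0 \mathbf{1} - X \boldsymbol{\beta} \rVert_2^2$$

Writing $\tilde{X} = [\mathbf{1} \;\; X]$ for the data matrix with a leading column of ones and $\boldsymbol{\theta} = (\beta_0, \beta_1, \dots, \beta_p)^\top$, the closed-form solution is:

$$\hat{\boldsymbol{\theta}} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \mathbf{y}$$

provided $\tilde{X}^\top \tilde{X}$ is invertible, that is, $\tilde{X}$ has full column rank.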
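The matrix-form solution can be sketched with NumPy as follows. This is an illustrative example under stated assumptions: the function name `fit_multiple_linear_regression` and the data are invented here, and the intercept is handled by prepending a column of ones to the data matrix. The fit uses `np.linalg.lstsq`, which avoids forming an explicit inverse; the normal equations are solved separately as a cross-check.

```python
import numpy as np


def fit_multiple_linear_regression(X, y):
    """Least-squares fit with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend the column of ones so the first parameter is the intercept.
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return theta[0], theta[1:]


# Illustrative, noise-free data (an assumption, not taken from the text),
# generated from intercept 1.0 and coefficients (2.0, -0.5).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

intercept, coefs = fit_multiple_linear_regression(X, y)

# Cross-check: solve the normal equations directly (fine for this small,
# well-conditioned problem; lstsq is the more robust choice in general).
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
theta_normal = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

print("intercept:", intercept)
print("coefficients:", coefs)
print("normal-equation solution:", theta_normal)
```

Since the data contain no noise, both approaches should recover the generating parameters up to rounding.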

342.1.3. Regularization

Regularization penalizes large coefficients to prevent overfitting and improve numerical stability.

Let $\lambda \ge 0$ be a regularization parameter.

342.1.3.1. Ridge Regression (L2 Regularization)

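Ridge regression adds a squared $\ell_2$ penalty on the coefficients, weighted by the regularization parameter $\lambda$ (by the usual convention the intercept is not penalized):

$$\min_{\beta_0, \boldsymbol{\beta}} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

In matrix form:

$$\min_{\beta_0, \boldsymbol{\beta}} \; \lVert \mathbf{y} - \beta_0 \mathbf{1} - X \boldsymbol{\beta} \rVert_2^2 + \lambda \lVert \boldsymbol{\beta} \rVert_2^2$$

Closed-form solution (for centered data, where the columns of $X$ and the vector $\mathbf{y}$ have mean zero, so the intercept drops out):

$$\hat{\boldsymbol{\beta}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$$

For $\lambda > 0$ the matrix $X^\top X + \lambda I$ is positive definite, so the inverse always exists.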

Ridge regression shrinks coefficients toward zero but does not set them exactly to zero.
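The ridge closed form for centered data can be sketched with NumPy as below. The function name `fit_ridge`, the data, and the choice to recover the intercept from the sample means after centering are assumptions made for this illustration, not something stated in the text.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Ridge fit via the closed form on centered data.

    Returns (intercept, coefficients). The intercept is not penalized:
    it is recovered from the means after solving for the coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # center each predictor column
    yc = y - y_mean  # center the response
    p = X.shape[1]
    # Solve (Xc^T Xc + lam * I) beta = Xc^T yc without forming an inverse.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Illustrative data (an assumption, not taken from the text).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

intercept_0, beta_0 = fit_ridge(X, y, lam=0.0)    # no penalty: ordinary least squares
intercept_10, beta_10 = fit_ridge(X, y, lam=10.0)  # penalized fit

print("lam = 0:  ", intercept_0, beta_0)
print("lam = 10: ", intercept_10, beta_10)
print("coefficient norms:", np.linalg.norm(beta_0), np.linalg.norm(beta_10))
```

With `lam=0.0` the function reduces to ordinary least squares, and increasing `lam` reduces the Euclidean norm of the coefficient vector.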

342.1.3.2. LASSO Regression (L1 Regularization)

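LASSO replaces the squared penalty with an $\ell_1$ penalty on the coefficients, weighted by the regularization parameter $\lambda$:

$$\min_{\beta_0, \boldsymbol{\beta}} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$$

LASSO performs feature selection because some coefficients can become exactly zero.

This problem is still convex, but the objective is not differentiable wherever some $\beta_j = 0$ because of the absolute value term, so there is no general closed-form solution and iterative methods are used instead.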

Example
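One common way to handle the non-differentiable penalty is cyclic coordinate descent with soft-thresholding. The text does not prescribe a solver, so the following plain-Python sketch is only one possible approach. It assumes centered data (so no intercept is fitted) and the unscaled objective "sum of squared errors plus `lam` times the sum of absolute coefficients", for which the per-coordinate threshold is `lam / 2`. The names `soft_threshold` and `fit_lasso` and the data are hypothetical.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; return exactly 0.0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def fit_lasso(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for
    sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|.

    Assumes the columns of X and the vector y are centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Exact minimizer of the objective in coordinate j.
            beta[j] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return beta


# Illustrative centered data with orthogonal columns (an assumption, not from
# the text). The second predictor is only weakly related to y.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [3.0, 0.1, -3.0, -0.1]

beta_ols = fit_lasso(X, y, lam=0.0)    # no penalty: least-squares coefficients
beta_lasso = fit_lasso(X, y, lam=1.0)  # penalized fit

print("lam = 0:", beta_ols)
print("lam = 1:", beta_lasso)
```

In this small example the penalty shrinks the first coefficient and sets the second exactly to zero, which is the feature-selection behavior described above. A fixed number of sweeps is used for simplicity; a practical implementation would stop based on a convergence tolerance.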