342. Regression
342.1. Linear Regression
342.1.1. Simple Linear Regression
342.1.1.1. Data and Model
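We observe a dataset of $n$ samples:
$$\{(x_i, y_i)\}_{i=1}^{n}, \qquad x_i, y_i \in \mathbb{R}$$
We assume a linear relationship between $x$ and $y$:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n$$
Where:
- $\beta_0$: intercept
- $\beta_1$: slope
- $\varepsilon_i$: error term (noise)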
342.1.1.2. Optimization Problem
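The parameters $\beta_0$ and $\beta_1$ are estimated by minimizing the sum of squared errors (SSE):
$$\min_{\beta_0, \beta_1} \; f(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
This is an unconstrained convex optimization problem.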
342.1.1.3. Gradient and Hessian
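Compute the partial derivatives of the SSE objective $f(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$:
$$\frac{\partial f}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i), \qquad \frac{\partial f}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i)$$
Thus, the gradient is:
$$\nabla f(\beta_0, \beta_1) = -2 \begin{pmatrix} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) \\ \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) \end{pmatrix}$$
and the Hessian matrix is:
$$\nabla^2 f = 2 \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$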
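The derivatives of the SSE objective can be sanity-checked numerically. The sketch below is only an illustration: the function names (`sse`, `sse_gradient`, `sse_hessian`), the four data points, and the evaluation point are all assumptions made for the check, not part of the original notes. It compares the analytic gradient with central finite differences and prints the constant Hessian.

```python
import numpy as np

def sse(beta0, beta1, x, y):
    """Sum of squared errors of the line beta0 + beta1 * x."""
    r = y - beta0 - beta1 * x
    return float(np.sum(r ** 2))

def sse_gradient(beta0, beta1, x, y):
    """Analytic gradient of the SSE with respect to (beta0, beta1)."""
    r = y - beta0 - beta1 * x
    return np.array([-2.0 * np.sum(r), -2.0 * np.sum(x * r)])

def sse_hessian(x):
    """Analytic Hessian of the SSE; it depends only on the x values."""
    n = x.size
    return 2.0 * np.array([[n, np.sum(x)],
                           [np.sum(x), np.sum(x ** 2)]])

# Illustrative data (assumed for this check only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 7.0])

# Arbitrary evaluation point and step size for central differences.
b0, b1, h = 0.3, -1.2, 1e-5
fd_grad = np.array([
    (sse(b0 + h, b1, x, y) - sse(b0 - h, b1, x, y)) / (2 * h),
    (sse(b0, b1 + h, x, y) - sse(b0, b1 - h, x, y)) / (2 * h),
])

print("analytic gradient:   ", sse_gradient(b0, b1, x, y))
print("finite-diff gradient:", fd_grad)
print("Hessian:\n", sse_hessian(x))
```

Because the objective is quadratic, central differences agree with the analytic gradient up to rounding error at any evaluation point.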
342.1.1.4. Convexity
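The Hessian $\nabla^2 f$ of the SSE objective is constant (does not depend on $\beta_0$, $\beta_1$) and symmetric. Its diagonal entries $2n$ and $2\sum_{i=1}^{n} x_i^2$ are nonnegative, so for this $2 \times 2$ matrix positive semidefiniteness follows once the determinant is shown to be nonnegative:
$$\det \nabla^2 f = 4 \left( n \sum_{i=1}^{n} x_i^2 - \Big( \sum_{i=1}^{n} x_i \Big)^2 \right) = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Since this expression is nonnegative for all $x_1, \dots, x_n$, $f$ is convex. It is strictly convex unless all $x_i$ are equal.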
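The determinant argument can be checked numerically on small inputs. The following is a minimal sketch under assumed data (two hypothetical arrays, one with distinct values and one constant); the helper name `hessian` is an illustration, not something from the original notes. It compares the determinant of the SSE Hessian with $4n\sum_i (x_i - \bar{x})^2$ and inspects the eigenvalues.

```python
import numpy as np

def hessian(x):
    """Hessian of the simple-regression SSE, which depends only on x."""
    n = x.size
    return 2.0 * np.array([[n, x.sum()],
                           [x.sum(), (x ** 2).sum()]])

# Assumed inputs: one with distinct x values, one degenerate (all equal).
x_varied = np.array([1.0, 2.0, 3.0, 4.0])
x_const = np.array([2.0, 2.0, 2.0, 2.0])

results = []
for x in (x_varied, x_const):
    H = hessian(x)
    det_direct = np.linalg.det(H)
    det_identity = 4.0 * x.size * np.sum((x - x.mean()) ** 2)
    eigvals = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
    results.append((det_direct, det_identity, eigvals))
    print("det:", det_direct, "identity:", det_identity, "eigenvalues:", eigvals)
```

In the degenerate case the determinant is zero up to rounding, so the Hessian is only positive semidefinite and the minimizer is not unique.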
342.1.1.5. Optimality Conditions
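Setting the gradient equal to zero gives the normal equations:
$$n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
Solving this linear system yields the closed-form solution (assuming the $x_i$ are not all equal):
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
where:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$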
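As an illustration of the closed-form solution, the sketch below fits a line to synthetic data and cross-checks the result against `np.polyfit`. The function name `fit_simple_ols`, the true parameters (1.5 and 0.8), the noise level, and the random seed are assumptions chosen for the demonstration.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least-squares estimates (intercept, slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    if sxx == 0.0:
        raise ValueError("all x values are equal; the slope is not identifiable")
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Synthetic data from an assumed model y = 1.5 + 0.8 x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=200)

beta0_hat, beta1_hat = fit_simple_ols(x, y)
print("intercept:", beta0_hat, "slope:", beta1_hat)

# Cross-check: np.polyfit returns coefficients from highest degree down.
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)
print("polyfit  :", intercept_ref, slope_ref)
```

The estimates should land near the assumed true values, with the gap shrinking as the sample size grows.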
342.1.1.6. Convexity by Decomposition
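Alternatively, decompose the SSE objective $f$ as:
$$f(\beta_0, \beta_1) = \sum_{i=1}^{n} f_i(\beta_0, \beta_1), \qquad f_i(\beta_0, \beta_1) = (y_i - \beta_0 - \beta_1 x_i)^2$$
Expanding $f_i$:
$$f_i(\beta_0, \beta_1) = \beta_0^2 + 2 x_i \beta_0 \beta_1 + x_i^2 \beta_1^2 - 2 y_i \beta_0 - 2 x_i y_i \beta_1 + y_i^2$$
Each $f_i$ is a quadratic function in ($\beta_0$, $\beta_1$).
Each $f_i$ has Hessian:
$$\nabla^2 f_i = 2 \begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ x_i \end{pmatrix} \begin{pmatrix} 1 & x_i \end{pmatrix}$$
Since each $\nabla^2 f_i$ is positive semidefinite (it is a nonnegative multiple of an outer product $v v^\top$), and sums of convex functions are convex, $f$ is convex.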
Example
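A small worked example may help make the formulas concrete. The four data points below are an assumption chosen so that the arithmetic is easy; they do not come from the original notes.

Take $(x_i, y_i) \in \{(1, 2), (2, 4), (3, 5), (4, 7)\}$, so $n = 4$, $\bar{x} = 2.5$ and $\bar{y} = 4.5$. Then:

$$
\begin{aligned}
\sum_{i=1}^{4} (x_i - \bar{x})^2 &= 2.25 + 0.25 + 0.25 + 2.25 = 5 \\
\sum_{i=1}^{4} (x_i - \bar{x})(y_i - \bar{y}) &= 3.75 + 0.25 + 0.25 + 3.75 = 8 \\
\hat{\beta}_1 &= \frac{8}{5} = 1.6 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} = 4.5 - 1.6 \cdot 2.5 = 0.5
\end{aligned}
$$

The fitted values are $2.1, 3.7, 5.3, 6.9$, the residuals are $-0.1, 0.3, -0.3, 0.1$, and the minimal SSE is $0.01 + 0.09 + 0.09 + 0.01 = 0.2$. Summing the per-sample Hessians $2 \begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix}$ gives

$$
2 \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix}, \qquad \det = 4\,(4 \cdot 30 - 10^2) = 80 > 0,
$$

so for this dataset the objective is strictly convex and the minimizer above is unique.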
342.1.2. Multiple Linear Regression
342.1.2.1. Data and Model
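For $n$ samples and $p$ predictors:
$$\{(x_{i1}, \dots, x_{ip}, y_i)\}_{i=1}^{n}$$
The model is:
$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n$$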
342.1.2.2. Optimization Problem
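$$\min_{\beta_0, \beta} \; \sum_{i=1}^{n} (y_i - \beta_0 - x_i^\top \beta)^2$$
where $x_i = (x_{i1}, \dots, x_{ip})^\top$ and $\beta = (\beta_1, \dots, \beta_p)^\top$.
In matrix form, letting $X \in \mathbb{R}^{n \times p}$ be the data matrix, $y = (y_1, \dots, y_n)^\top$, and $\mathbf{1}$ the column of ones:
$$\min_{\beta_0, \beta} \; \| y - \beta_0 \mathbf{1} - X \beta \|_2^2$$
The closed-form solution is:
$$\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta} \end{pmatrix} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top y, \qquad \tilde{X} = \begin{pmatrix} \mathbf{1} & X \end{pmatrix}$$
provided $\tilde{X}^\top \tilde{X}$ is invertible, i.e. $\tilde{X}$ has full column rank.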
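The matrix-form solution can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration: the synthetic data, the true coefficients, the seed, and the variable names. The sketch builds the augmented matrix with a leading column of ones, solves the normal equations, and compares the result with `np.linalg.lstsq`, which is usually preferred in practice because it avoids forming the Gram matrix and also handles rank-deficient inputs.

```python
import numpy as np

# Synthetic data from an assumed linear model with intercept 4.0.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
beta0_true = 4.0
y = beta0_true + X @ beta_true + rng.normal(scale=0.1, size=n)

# Augment the data matrix with a column of ones for the intercept.
X_aug = np.column_stack([np.ones(n), X])

# Route 1: solve the normal equations (requires full column rank).
coef_normal = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# Route 2: SVD-based least squares, numerically more robust.
coef_lstsq, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print("normal equations:", coef_normal)
print("lstsq           :", coef_lstsq)
```

The first entry of each coefficient vector is the intercept estimate and the remaining entries are the slope estimates.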
342.1.3. Regularization
Regularization penalizes large coefficients to prevent overfitting and improve numerical stability.
Let $\lambda > 0$ be a regularization parameter.
342.1.3.1. Ridge Regression (L2 Regularization)
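$$\min_{\beta_0, \beta} \; \sum_{i=1}^{n} (y_i - \beta_0 - x_i^\top \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
In matrix form:
$$\min_{\beta_0, \beta} \; \| y - \beta_0 \mathbf{1} - X \beta \|_2^2 + \lambda \| \beta \|_2^2$$
Closed-form solution (for centered data):
$$\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$$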
Ridge regression shrinks coefficients toward zero but does not set them exactly to zero.
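The shrinkage behaviour can be illustrated with a minimal sketch of the centered closed-form solution. The function name `fit_ridge`, the synthetic data, and the grid of regularization values are assumptions for this example; the intercept is left unpenalized and recovered from the column means after centering.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge estimate via the closed form on centered data.

    Returns (intercept, coefficients); the intercept is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    beta0 = y_mean - x_mean @ beta
    return beta0, beta

# Synthetic data (assumed); the last true coefficient is zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=50)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = []
for lam in lambdas:
    beta0, beta = fit_ridge(X, y, lam)
    norms.append(float(np.linalg.norm(beta)))
    print(f"lambda={lam:6.1f}  intercept={beta0:.3f}  coefficients={beta}")
```

The Euclidean norm of the coefficient vector decreases as the regularization parameter grows, while individual coefficients generally stay nonzero. With a regularization value of zero the function reduces to ordinary least squares, which works here only because the centered data matrix has full column rank.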
342.1.3.2. LASSO Regression (L1 Regularization)
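$$\min_{\beta_0, \beta} \; \sum_{i=1}^{n} (y_i - \beta_0 - x_i^\top \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
LASSO performs feature selection because some coefficients can become exactly zero.
This problem is still convex, but the objective is non-differentiable wherever some $\beta_j = 0$ because of the absolute value terms, so unlike ridge regression it has no closed-form solution in general.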
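One common way to handle the non-differentiable penalty is cyclic coordinate descent with soft-thresholding. The sketch below is one possible illustration rather than a method prescribed by these notes. The names `soft_threshold` and `fit_lasso_cd`, the fixed iteration count, the synthetic data, and the chosen regularization value are all assumptions. It minimizes the squared-error objective plus $\lambda \sum_j |\beta_j|$ on centered data, for which each coordinate update has threshold $\lambda / 2$.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent on centered data.

    Minimizes sum of squared errors + lam * sum(|beta_j|);
    the intercept is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = Xc.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(Xc ** 2, axis=0)  # squared norm of each centered column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution added back.
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ r_j
            # Exact one-dimensional minimizer for coordinate j.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    beta0 = y_mean - x_mean @ beta
    return beta0, beta

# Synthetic data (assumed): only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = 1.0 + X @ beta_true + rng.normal(scale=0.5, size=n)

beta0_hat, beta_hat = fit_lasso_cd(X, y, lam=100.0)
print("intercept:", beta0_hat)
print("coefficients:", beta_hat)
```

With a sufficiently large penalty, the coefficients of the uninformative features are set exactly to zero by the thresholding step, while the informative ones are kept but shrunk toward zero. A production implementation would normally add a convergence check instead of a fixed number of sweeps.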