186. Multiple Linear Regression
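- Model parameters (unknown, to be estimated)
  - $\beta_0$: intercept
  - $\beta_j$, $j = 1, \dots, k$: slope on regressor $x_j$
  - $\sigma^2$: noise variance
- Data
  - $(y_i, x_{i1}, \dots, x_{ik})$ for $i = 1, \dots, n$: one row per observation
- Output
  - $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$: fitted coefficients
  - $\hat y_i = \hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_{ij}$: predicted value
  - $e_i = y_i - \hat y_i$: residual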
186.0.1. Why “least squares”?
Three candidate loss functions for the residuals $e_i = y_i - \hat y_i$:
| Loss | What it captures | Verdict |
|---|---|---|
| $\sum_i e_i$ | Bias only | Positive and negative errors cancel; useless for accuracy |
| $\sum_i \lvert e_i \rvert$ | Bias + accuracy | No closed-form minimiser; intractable analytically (OK numerically) |
| $\sum_i e_i^2$ | Bias + accuracy + smooth | Closed form via calculus ✓ |
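A tiny numerical illustration of the three losses may help. The residual values below are hypothetical, chosen so that large errors cancel exactly; they are not taken from the example data.

```python
import numpy as np

# Hypothetical residuals: large errors that happen to cancel out.
e = np.array([-50.0, 50.0, -20.0, 20.0])

sum_raw = e.sum()           # signed errors cancel: looks like a perfect fit
sum_abs = np.abs(e).sum()   # measures error size, but |e| has a kink at 0
sum_sq = (e ** 2).sum()     # measures error size and is differentiable everywhere

print(sum_raw, sum_abs, sum_sq)
```

The signed sum reports zero even though every prediction is off, which is why it cannot serve as an accuracy measure on its own.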
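OLS picks $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ to minimise the Residual Sum of Squares:

$$\mathrm{RSS} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2$$

For the univariate case (a single regressor, $k = 1$), setting first-order conditions to zero gives a clean closed form:

$$\hat\beta_1 = \frac{\sum_{i} (x_i - \bar x)(y_i - \bar y)}{\sum_{i} (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

For the multivariate case, the same exercise in matrix form yields the normal equations:

$$X^\top X \hat\beta = X^\top y \quad\Longrightarrow\quad \hat\beta = (X^\top X)^{-1} X^\top y$$

where $X$ is the design matrix (a column of ones for the intercept, plus one column per regressor) and $y$ is the vector of observations.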
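As a sanity check on the two formulas, the sketch below fits a single-regressor model three ways: the univariate closed form, the normal equations, and `numpy.linalg.lstsq`. The data are synthetic (the "true" intercept 2 and slope 3 are assumptions made only for this illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=n)  # synthetic data, assumed true line

# 1) Univariate closed form from the first-order conditions
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# 2) Normal equations: solve (X'X) beta = X'y with a column of ones for the intercept
X = np.column_stack([np.ones(n), x])
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 3) Least-squares solver (numerically preferable to forming X'X explicitly)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b0, b1)
print(beta_normal)
print(beta_lstsq)
```

All three routes should agree to floating-point precision. In practice a solver such as `lstsq` is usually the safer choice, because forming $X^\top X$ squares the condition number of the problem.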
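186.0.2. Goodness of fit ($R^2$)
Total variation in $y$ around its mean splits cleanly into the part the model explains and the part it leaves behind:

$$\underbrace{\sum_i (y_i - \bar y)^2}_{\mathrm{TSS}} = \underbrace{\sum_i (\hat y_i - \bar y)^2}_{\mathrm{ESS}} + \underbrace{\sum_i (y_i - \hat y_i)^2}_{\mathrm{RSS}}$$

The coefficient of determination is the fraction of variation the model explains:

$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$

- $R^2 = 1$: model fits perfectly (all residuals zero)
- $R^2 = 0$: model no better than predicting $\bar y$ for everyone
Cautions on $R^2$:
- $R^2$ never decreases when you add a regressor → use adjusted $R^2$ when comparing models of different sizes.
- High $R^2$ does not imply causality, only that variation co-moves.
- Each coefficient still needs its own t-test / confidence interval; the F-test on $R^2$ only says “some slope coefficient is non-zero”.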
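The first caution can be checked numerically. The sketch below uses synthetic data and a hypothetical helper `fit_stats` (not from any library) to compare a model with and without a pure-noise regressor; the adjusted $R^2$ uses the usual $1 - (1 - R^2)\frac{n-1}{n-p}$ form, with $p$ counting the intercept column.

```python
import numpy as np

def fit_stats(X, y):
    """Return (R^2, adjusted R^2, TSS, ESS, RSS) for an OLS fit with intercept in X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    rss = np.sum((y - fitted) ** 2)
    ess = np.sum((fitted - y.mean()) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    n, p = X.shape  # p includes the intercept column
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    return r2, adj_r2, tss, ess, rss

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # synthetic data
noise = rng.normal(size=n)              # unrelated to y by construction

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, noise])

r2_small, adj_small, tss, ess, rss = fit_stats(X_small, y)
r2_big, adj_big, *_ = fit_stats(X_big, y)

print(f"R^2:     {r2_small:.4f} -> {r2_big:.4f}")
print(f"adj R^2: {adj_small:.4f} -> {adj_big:.4f}")
```

Plain $R^2$ cannot go down when the noise column is added, while adjusted $R^2$ is free to fall because it charges for the extra parameter.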
Example
Trend + summer dummy on monthly demand (20 months, Jan year 1 — Aug year 2).
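Model (level, linear trend, and a summer indicator):

$$\text{Demand}_t = \beta_0 + \beta_1\,\text{Period}_t + \beta_2\,\text{Summer}_t + \varepsilon_t$$

where $\text{Period}_t = 1, \dots, 20$ is a linear time index and $\text{Summer}_t \in \{0, 1\}$ flags May–Aug.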
Data
| Mo. | Demand | Period | Summer | Mo. | Demand | Period | Summer |
|---|---|---|---|---|---|---|---|
| Jan | 3 025 | 1 | 0 | Nov | 3 499 | 11 | 0 |
| Feb | 3 047 | 2 | 0 | Dec | 3 598 | 12 | 0 |
| Mar | 3 079 | 3 | 0 | Jan | 3 596 | 13 | 0 |
| Apr | 3 136 | 4 | 0 | Feb | 3 721 | 14 | 0 |
| May | 3 454 | 5 | 1 | Mar | 3 745 | 15 | 0 |
| Jun | 3 661 | 6 | 1 | Apr | 3 650 | 16 | 0 |
| Jul | 3 554 | 7 | 1 | May | 4 157 | 17 | 1 |
| Aug | 3 692 | 8 | 1 | Jun | 4 221 | 18 | 1 |
| Sep | 3 407 | 9 | 0 | Jul | 4 238 | 19 | 1 |
| Oct | 3 410 | 10 | 0 | Aug | 4 008 | 20 | 1 |
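Fit (e.g. numpy.linalg.lstsq, statsmodels.OLS, or Excel’s regression tool); coefficients below are computed from the table above and rounded:

$$\widehat{\text{Demand}}_t \approx 2969.2 + 48.03\,\text{Period}_t + 303.6\,\text{Summer}_t$$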
Diagnostics
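- $R^2 \approx 0.958$, adj. $R^2 \approx 0.953$, residual standard error $\approx 79.1$
- All three coefficients have $p < 0.001$ (t-stats: intercept $\approx 79.9$, period $\approx 15.0$, summer $\approx 8.1$)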
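The fit and its diagnostics can be reproduced from the table with plain NumPy. This is a minimal sketch, not necessarily the tool used for the original numbers; the variable names are illustrative, and the standard errors use the textbook $s^2 (X^\top X)^{-1}$ formula, which assumes iid errors.

```python
import numpy as np

# Demand column from the table, in chronological order (Jan year 1 to Aug year 2).
demand = np.array([3025, 3047, 3079, 3136, 3454, 3661, 3554, 3692, 3407, 3410,
                   3499, 3598, 3596, 3721, 3745, 3650, 4157, 4221, 4238, 4008],
                  dtype=float)
period = np.arange(1, 21, dtype=float)
month = (np.arange(20) % 12) + 1                     # calendar month, 1 = Jan
summer = np.isin(month, [5, 6, 7, 8]).astype(float)  # 1 for May-Aug, else 0

# Design matrix: intercept, trend, summer dummy
X = np.column_stack([np.ones_like(period), period, summer])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, demand, rcond=None)
resid = demand - X @ beta

rss = resid @ resid
tss = np.sum((demand - demand.mean()) ** 2)
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
rse = np.sqrt(rss / (n - p))                         # residual standard error

# Classical (iid-error) covariance of the coefficients and their t-statistics
cov_beta = rse ** 2 * np.linalg.inv(X.T @ X)
t_stats = beta / np.sqrt(np.diag(cov_beta))

print("coefficients:", np.round(beta, 2))
print("R^2:", round(r2, 3), "adj R^2:", round(adj_r2, 3), "RSE:", round(rse, 1))
print("t-stats:", np.round(t_stats, 1))
```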
Interpretation
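- Intercept ($\hat\beta_0 \approx 2969$): baseline demand at $\text{Period} = 0$ in a non-summer month
- Period ($\hat\beta_1 \approx 48$): underlying trend adds about 48 units per month
- Summer ($\hat\beta_2 \approx 304$): summer months run about 304 units above the trend, holding period fixed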
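Forecast next month (Sep, $\text{Period} = 21$, $\text{Summer} = 0$):

$$\widehat{\text{Demand}}_{21} \approx 2969.2 + 48.03 \times 21 + 303.6 \times 0 \approx 3978$$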
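A forecast is just the fitted coefficients applied to a new row of the design matrix. The sketch below refits the model from the table and evaluates it at the September row; `x_new` and the other names are illustrative.

```python
import numpy as np

demand = np.array([3025, 3047, 3079, 3136, 3454, 3661, 3554, 3692, 3407, 3410,
                   3499, 3598, 3596, 3721, 3745, 3650, 4157, 4221, 4238, 4008],
                  dtype=float)
period = np.arange(1, 21, dtype=float)
summer = np.zeros(20)
summer[[4, 5, 6, 7, 16, 17, 18, 19]] = 1.0  # May-Aug of both years (0-based rows)

X = np.column_stack([np.ones(20), period, summer])
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)

# New row: intercept term, Period = 21, Summer = 0 (September)
x_new = np.array([1.0, 21.0, 0.0])
forecast = x_new @ beta

print(round(forecast, 1))
```

Because both regressors are known in advance (the calendar fixes them), no auxiliary forecast of the inputs is needed here.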
186.0.3. When to reach for OLS
Strengths:
- Coefficients are interpretable — each one quantifies a specific driver
- Confidence intervals, t-tests, and F-tests come for free
- Easy to fold in exogenous regressors (weather, promotions, demographics, …) that pure time-series methods cannot consume
- Plays nicely with categorical predictors via dummy variables
Limitations:
- Treats every observation equally — no down-weighting of stale data the way SES does
- Linear in coefficients (transformations of $x$ help, but the structure is rigid)
- Inference assumes residuals are iid normal; positive serial correlation in time-series residuals makes the reported standard errors too small (fix: Newey–West SEs, ARMA errors, or fall back to ETS)
- Forecasting future $y$ requires future $x$: fine when “Period” is the regressor, awkward when it’s “weather”
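One quick way to spot the serial-correlation problem is to look at the lag-1 autocorrelation of the OLS residuals. The sketch below is an illustration on synthetic data: a linear trend with AR(1) errors, where the coefficient `phi = 0.8` is an assumption chosen to make the effect visible. It is a rough diagnostic, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
phi = 0.8                       # assumed AR(1) coefficient of the error process
shocks = rng.normal(size=n)

# Build serially correlated errors: u_t = phi * u_{t-1} + shock_t
u = np.zeros(n)
for t in range(1, n):
    u[t] = phi * u[t - 1] + shocks[t]

time = np.arange(1, n + 1, dtype=float)
y = 10.0 + 0.5 * time + u       # synthetic trend plus AR(1) noise

X = np.column_stack([np.ones(n), time])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Lag-1 autocorrelation of the residuals; near 0 under the iid assumption
r1 = np.sum(resid[1:] * resid[:-1]) / np.sum(resid ** 2)
print(round(r1, 2))
```

A value well away from zero suggests the classical standard errors should not be trusted and that one of the fixes listed above is worth applying.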