186. Multiple Linear Regression

Causal vs. extrapolative. Time-series methods (SMA, ETS, ARIMA) build a forecast from the demand series alone, extracting level, trend, and seasonality from its own history. Causal models flip the framing: demand is driven by exogenous regressors (price, weather, promotions, demographics, day-of-week, …) and we fit a relationship between those regressors and demand.

186.0.1. Why “least squares”?

Three candidate loss functions for the residuals $e_i = y_i - \hat{y}_i$:

| Loss | What it captures | Verdict |
|---|---|---|
| $\sum_i e_i$ | Bias only | Positive and negative errors cancel; useless for accuracy |
| $\sum_i \lvert e_i \rvert$ | Bias + accuracy | No closed-form minimiser; intractable analytically (OK numerically) |
| $\sum_i e_i^2$ | Bias + accuracy + smooth | Closed form via calculus |
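To make the comparison concrete, here is a minimal sketch that evaluates the three losses on one set of residuals. The residual values are made up for illustration and are not taken from the example later in this section.

```python
import numpy as np

# Hypothetical residuals from some fitted model (illustrative numbers only).
residuals = np.array([-3.0, 1.0, 2.0, -4.0, 4.0])

sum_errors = residuals.sum()         # signed errors cancel out
sum_abs = np.abs(residuals).sum()    # measures magnitude, but |e| has a kink at 0
sum_sq = (residuals ** 2).sum()      # measures magnitude and is smooth everywhere

print(sum_errors, sum_abs, sum_sq)
```

The signed sum is zero even though no single prediction is exact, which is why it says nothing about accuracy.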

OLS picks $\hat{\beta}$ to minimise the Residual Sum of Squares:

$$\text{RSS} = \sum_i e_i^2 = \sum_i (y_i - \hat{y}_i)^2$$

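For the univariate case ($\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$), setting first-order conditions to zero gives a clean closed form:

$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$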
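The univariate closed form translates almost directly into code. The sketch below assumes NumPy; the function name `fit_simple_ols` and the toy data are hypothetical, chosen so that the points lie exactly on a line.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (intercept, slope) for a one-regressor OLS fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Toy data lying exactly on y = 1 + 2x (illustrative only).
b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```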

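For the multivariate case, the same exercise in matrix form yields the normal equations:

$$X^\top X \hat{\beta} = X^\top y \quad\Longrightarrow\quad \hat{\beta} = (X^\top X)^{-1} X^\top y$$

where $X$ is the design matrix (a column of ones for the intercept, plus one column per regressor).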
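As a rough illustration of the matrix form, the sketch below builds a design matrix from synthetic data (the coefficients, seed, and noise level are arbitrary assumptions) and solves the normal equations, then cross-checks the result against `numpy.linalg.lstsq`.

```python
import numpy as np

# Synthetic data: two regressors plus noise (all values are illustrative).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept, one column per regressor.
X = np.column_stack([np.ones(n), x1, x2])

# Solve (X'X) beta = X'y directly rather than forming the inverse explicitly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver, which is generally the more numerically stable route.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

Both routes should agree to numerical precision on well-conditioned data; when regressors are nearly collinear, the least-squares solver is usually the safer choice.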

186.0.2. Goodness of fit ($R^2$)

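Total variation in $y$ around its mean splits cleanly into the part the model explains and the part it leaves behind:

$$\underbrace{\sum_i (y_i - \bar{y})^2}_{\text{TSS}} = \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{ESS}} + \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\text{RSS}}$$

The coefficient of determination $R^2$ is the fraction of variation the model explains:

$$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}$$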

Cautions on $R^2$:

  • $R^2$ never decreases when you add a regressor → use adjusted $R^2$ when comparing models of different sizes.
  • High $R^2$ does not imply causality, only that variation co-moves.
  • Each coefficient still needs its own t-test / confidence interval; the F-test on $R^2$ only says “some coefficient is non-zero”.
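
The two fit measures can be sketched as a small helper. The function name `r_squared` and the toy numbers are hypothetical; the adjusted version uses the common formula $1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $p$ counts regressors excluding the intercept.

```python
import numpy as np

def r_squared(y, y_hat, n_regressors):
    """Return (R^2, adjusted R^2) for observed y and fitted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    rss = np.sum((y - y_hat) ** 2)       # variation left unexplained
    tss = np.sum((y - y.mean()) ** 2)    # total variation around the mean
    r2 = 1.0 - rss / tss
    n = y.size
    # Penalise model size: n_regressors excludes the intercept.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_regressors - 1)
    return r2, adj_r2

# Toy observed and fitted values (illustrative only).
r2, adj_r2 = r_squared([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 4.9], n_regressors=1)
print(r2, adj_r2)
```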
Example

Trend + summer dummy on monthly demand (20 months, Jan year 1 — Aug year 2).

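Model with level, linear trend, and a summer indicator:

$$y_t = \beta_0 + \beta_1 \cdot \text{period}_t + \beta_2 \cdot \text{summer}_t + \varepsilon_t$$

where $\text{period}_t$ is a linear time index and $\text{summer}_t$ flags May–Aug.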

Data

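| Mo. | Demand | Period | Summer | Mo. | Demand | Period | Summer |
|---|---|---|---|---|---|---|---|
| Jan | 3 025 | 1 | 0 | Nov | 3 499 | 11 | 0 |
| Feb | 3 047 | 2 | 0 | Dec | 3 598 | 12 | 0 |
| Mar | 3 079 | 3 | 0 | Jan | 3 596 | 13 | 0 |
| Apr | 3 136 | 4 | 0 | Feb | 3 721 | 14 | 0 |
| May | 3 454 | 5 | 1 | Mar | 3 745 | 15 | 0 |
| Jun | 3 661 | 6 | 1 | Apr | 3 650 | 16 | 0 |
| Jul | 3 554 | 7 | 1 | May | 4 157 | 17 | 1 |
| Aug | 3 692 | 8 | 1 | Jun | 4 221 | 18 | 1 |
| Sep | 3 407 | 9 | 0 | Jul | 4 238 | 19 | 1 |
| Oct | 3 410 | 10 | 0 | Aug | 4 008 | 20 | 1 |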

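Fit by OLS (e.g. numpy.linalg.lstsq, statsmodels.api.OLS, or Excel’s regression tool) to obtain the estimated equation:

$$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{period}_t + \hat{\beta}_2 \cdot \text{summer}_t$$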

Diagnostics

  • Goodness of fit: $R^2$, adjusted $R^2$, and the residual standard error
  • All three coefficients are statistically significant, judged by their t-statistics (intercept, period, summer)
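
One way to reproduce the fit and these diagnostics from the data table is sketched below with NumPy only. The variable names are hypothetical, and the standard errors use the classical OLS formula, which assumes homoskedastic, uncorrelated errors; here `k` counts all fitted coefficients including the intercept.

```python
import numpy as np

# Monthly demand from the data table, periods 1..20 (Jan year 1 to Aug year 2).
demand = np.array([3025, 3047, 3079, 3136, 3454, 3661, 3554, 3692, 3407, 3410,
                   3499, 3598, 3596, 3721, 3745, 3650, 4157, 4221, 4238, 4008],
                  dtype=float)
period = np.arange(1, 21, dtype=float)
month = (period - 1) % 12 + 1                         # calendar month, 1..12
summer = np.isin(month, [5, 6, 7, 8]).astype(float)   # 1 for May-Aug, else 0

# Design matrix: intercept, trend, summer dummy.
X = np.column_stack([np.ones_like(period), period, summer])
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)

# Goodness-of-fit measures.
n, k = X.shape
resid = demand - X @ beta
rss = np.sum(resid ** 2)
tss = np.sum((demand - demand.mean()) ** 2)
r2 = 1.0 - rss / tss
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
rse = np.sqrt(rss / (n - k))                          # residual standard error

# Classical coefficient standard errors and t-statistics.
cov_beta = rse ** 2 * np.linalg.inv(X.T @ X)
t_stats = beta / np.sqrt(np.diag(cov_beta))

print("coefficients:", beta)
print("R2:", r2, "adj R2:", adj_r2, "RSE:", rse)
print("t-stats:", t_stats)
```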

Interpretation

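  • Intercept ($\hat{\beta}_0$): baseline demand at period $= 0$ in a non-summer month
  • Period ($\hat{\beta}_1$): underlying trend adds $\hat{\beta}_1$ units per month
  • Summer ($\hat{\beta}_2$): summer months run $\hat{\beta}_2$ units above the trend, holding period fixed

Forecast next month (Sep, period $= 21$, summer $= 0$):

$$\hat{y}_{21} = \hat{\beta}_0 + \hat{\beta}_1 \cdot 21 + \hat{\beta}_2 \cdot 0 = \hat{\beta}_0 + 21\,\hat{\beta}_1$$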
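
The September forecast can be sketched by refitting the model and applying the coefficients to a new regressor row. The variable names are hypothetical; the row `[1, 21, 0]` encodes the intercept, period 21, and a non-summer month.

```python
import numpy as np

# Monthly demand from the data table, periods 1..20.
demand = np.array([3025, 3047, 3079, 3136, 3454, 3661, 3554, 3692, 3407, 3410,
                   3499, 3598, 3596, 3721, 3745, 3650, 4157, 4221, 4238, 4008],
                  dtype=float)
period = np.arange(1, 21, dtype=float)
summer = np.isin((period - 1) % 12 + 1, [5, 6, 7, 8]).astype(float)

# Fit intercept + trend + summer dummy by least squares.
X = np.column_stack([np.ones_like(period), period, summer])
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)

# Next month is September: period 21, summer flag 0.
x_new = np.array([1.0, 21.0, 0.0])
forecast = float(x_new @ beta)

print("Sep forecast:", forecast)
```

Because the summer flag is zero, the forecast reduces to the intercept plus 21 times the trend coefficient. This is a point forecast only and carries no prediction interval.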

186.0.3. When to reach for OLS

Strengths:

Limitations: