209. Cumulative
Average of every observation since the start of the series. The expanding-window endpoint of the moving-averages axis.
Forecast: $\hat{y}_{t+1} = \bar{y}_t = \frac{1}{t}\sum_{i=1}^{t} y_i$, i.e. predict the long-run average.
209.0.1. Behavior
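The cumulative average can also be written recursively:
$$\bar{y}_t = \frac{t-1}{t}\,\bar{y}_{t-1} + \frac{1}{t}\,y_t = \bar{y}_{t-1} + \frac{1}{t}\left(y_t - \bar{y}_{t-1}\right)$$
The second form makes the dynamics clear:
- The new observation $y_t$ is compared to the running mean $\bar{y}_{t-1}$.
- The error $y_t - \bar{y}_{t-1}$ is added back, scaled by $1/t$.
So the effective smoothing parameter of the cumulative average is $\alpha_t = 1/t$, which decreases over time. As more data arrives, the cumulative average becomes increasingly insensitive to new observations.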
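The recursive update can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `cumulative_average` and the sample data are chosen here for the example.

```python
from statistics import fmean


def cumulative_average(values):
    """Yield the running mean after each observation (recursive form)."""
    mean = 0.0
    for t, y in enumerate(values, start=1):
        alpha = 1.0 / t              # effective smoothing parameter at step t
        mean += alpha * (y - mean)   # mean_t = mean_{t-1} + (1/t) * (y_t - mean_{t-1})
        yield mean


data = [10, 20, 30, 20, 12, 24]
recursive = list(cumulative_average(data))

# Batch definition for comparison: average of everything seen so far.
batch = [fmean(data[:t]) for t in range(1, len(data) + 1)]
```

Both lists agree up to floating-point rounding. The recursive version needs only the previous mean and the step counter, which is why it suits single-pass streaming use.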
209.0.2. Comparison to other moving averages
- Naïve ($k = 1$, $\alpha = 1$): only the most recent observation matters; history is ignored.
- SMA (fixed $k$): finite window; weight per observation is $1/k$.
- SES (fixed $\alpha$): infinite window; weight on the most recent observation is $\alpha$, decaying geometrically.
- Cumulative ($k = t$, $\alpha_t = 1/t$): infinite window with equal weights.
The relationship between $k$, $\alpha$, and effective memory:
- $\alpha \approx 2/(k+1)$ for SMA and an “average age” perspective on SES.
- For the cumulative average, $\alpha_t = 1/t \to 0$ as $t \to \infty$: effectively infinite memory.
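The weighting schemes above can be compared numerically. The sketch below assumes the standard average-age matching $\alpha = 2/(k+1)$ between SMA and SES; the helper names (`sma_weights`, `ses_weights`, `cumulative_weights`, `average_age`) are hypothetical and exist only for this illustration.

```python
def sma_weights(k):
    """Weights on the k most recent observations, newest first."""
    return [1.0 / k] * k


def ses_weights(alpha, n):
    """First n SES weights, newest first: alpha * (1 - alpha)**j."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


def cumulative_weights(t):
    """Equal weights on all t observations seen so far."""
    return [1.0 / t] * t


def average_age(weights):
    """Weighted mean lag, where lag 0 is the most recent observation."""
    total = sum(weights)
    return sum(j * w for j, w in enumerate(weights)) / total


k = 9
alpha = 2 / (k + 1)                                 # assumed matching rule

sma_age = average_age(sma_weights(k))               # (k - 1) / 2
ses_age = average_age(ses_weights(alpha, 2000))     # close to (1 - alpha) / alpha
cum_age = average_age(cumulative_weights(101))      # (t - 1) / 2, grows with t
```

With these settings the SMA and SES average ages coincide, while the cumulative average's age keeps growing as the series lengthens.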
209.0.3. When to use
- Estimating a long-run mean of a stationary process: cumulative converges to the true mean by the law of large numbers.
- Slow baseline in regime detection or anomaly detection: a rapid moving average can be compared against the cumulative average to detect drift or shifts.
- Online statistics: classical “running mean” computed in a single pass.
Don’t use it as a forecast for a non-stationary series. Cumulative gives equal weight to data from years ago and yesterday — useless if the level has shifted.
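The slow-baseline idea can be sketched as follows. This is one possible arrangement under stated assumptions, not a prescribed detector: the function name `drift_flags`, the `window` and `threshold` parameters, and the synthetic series are all hypothetical.

```python
from collections import deque


def drift_flags(values, window=5, threshold=3.0):
    """Flag steps where a fast SMA departs from the cumulative average."""
    recent = deque(maxlen=window)   # holds the last `window` observations
    total = 0.0
    flags = []
    for t, y in enumerate(values, start=1):
        total += y
        recent.append(y)
        slow = total / t                    # cumulative average (slow baseline)
        fast = sum(recent) / len(recent)    # short simple moving average
        flags.append(abs(fast - slow) > threshold)
    return flags


# Synthetic series: level 10 for 20 steps, then a jump to level 20.
series = [10.0] * 20 + [20.0] * 10
flags = drift_flags(series)
first_flag = flags.index(True) + 1   # 1-based step of the first flag
```

On this series the first flag appears two steps after the jump, once enough shifted observations have entered the short window. The cumulative average itself has barely moved, which is what makes it usable as a baseline and poor as a forecast.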
209.0.4. Connection to MLE
The cumulative average is the maximum likelihood estimator of the population mean for an i.i.d. Gaussian process. So if you genuinely believe the data are i.i.d. with constant mean, the cumulative average is the natural estimator: it is unbiased, and its variance shrinks as $\sigma^2 / t$.
If you don’t believe the data are i.i.d. (and almost no real time series is), use SES, ETS, or ARIMA instead.
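The MLE claim can be checked with a short derivation, under the added assumption that the observations are i.i.d. Gaussian, $y_i \sim \mathcal{N}(\mu, \sigma^2)$, with $\sigma^2$ treated as fixed:

```latex
\begin{align*}
\ell(\mu) &= -\frac{t}{2}\log\left(2\pi\sigma^2\right)
             - \frac{1}{2\sigma^2}\sum_{i=1}^{t}\left(y_i - \mu\right)^2 \\
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{t}\left(y_i - \mu\right) = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{1}{t}\sum_{i=1}^{t} y_i
\end{align*}
```

The maximizer is the sample mean of all $t$ observations, which is the cumulative average. For other distributions the MLE of the location can differ; for Laplace noise, for example, it is the median.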
Example
Given:
| $t$ | 1 | 2 | 3 | 4 | 5 | 6 |
| $y_t$ | 10 | 20 | 30 | 20 | 12 | 24 |
Iterate:
| $t$ | $y_t$ | $\bar{y}_t$ | $\alpha_t = 1/t$ |
| 1 | 10 | 10/1 = 10.00 | 1.00 |
| 2 | 20 | (10+20)/2 = 15.00 | 0.50 |
| 3 | 30 | (10+20+30)/3 = 20.00 | 0.33 |
| 4 | 20 | (10+20+30+20)/4 = 20.00 | 0.25 |
| 5 | 12 | (10+20+30+20+12)/5 = 18.40 | 0.20 |
| 6 | 24 | (10+20+30+20+12+24)/6 = 19.33 | 0.17 |
Notice:
- $\bar{y}_t$ settles in the 18–20 range from $t = 3$ onward, even as new observations fluctuate.
- The effective $\alpha_t$ shrinks: by $t = 6$, the cumulative average reacts to a new observation with weight only 0.17. By $t = 100$, the weight is 0.01, so the average is essentially frozen.
- This means the cumulative average is a lagging indicator. If the underlying mean shifts late in the series, for example around $t = 100$, it will take many observations before $\bar{y}_t$ catches up.
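The table and the lag effect can be reproduced with a short script. The first part uses the example data; the second part uses a hypothetical level shift (100 observations at 20, then 100 at 30) that is invented here purely to illustrate the lag.

```python
from itertools import accumulate

y = [10, 20, 30, 20, 12, 24]

# Running sums divided by the step count give the cumulative average.
means = [s / t for t, s in enumerate(accumulate(y), start=1)]
alphas = [1 / t for t in range(1, len(y) + 1)]

for t, (obs, m, a) in enumerate(zip(y, means, alphas), start=1):
    print(f"{t} | {obs} | {m:.2f} | {a:.2f}")

# Hypothetical level shift at t = 100: the mean jumps from 20 to 30.
shifted = [20.0] * 100 + [30.0] * 100
shift_means = [s / t for t, s in enumerate(accumulate(shifted), start=1)]

# Gap between the new level and the cumulative average at the end.
remaining_gap = 30.0 - shift_means[-1]
```

In the shifted series, 100 observations at the new level close only half of the 10-unit gap: the cumulative average ends at 25, still 5 below the new mean.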