277. Validation
Tests for building confidence that a system-dynamics model is a useful representation of reality. Sterman (2000, Ch. 21) gives a canonical battery of 12 tests, grouped below into structure-oriented and behavior-oriented tests.
277.1. Structure-oriented tests (does the model capture what it should?)
- Boundary adequacy: are the right things inside vs outside the model?
- Structure assessment: do the relationships match real-world causal structure?
- Dimensional consistency: units balance in every equation
- Parameter assessment: parameter values consistent with literature and expert estimates
- Extreme conditions test: behavior at extremes (zero inventory, infinite capacity) sensible
- Integration error test: halve the time step $dt$ (or switch integration method); if behavior changes materially, the integration is too coarse
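The integration error test can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical first-order goal-seeking stock integrated with Euler's method; the model, the parameter values, and the function names `simulate` and `integration_error` are illustrative and not taken from Sterman.

```python
def simulate(dt, t_end=20.0, stock0=0.0, goal=100.0, adj_time=4.0):
    """Euler-integrate a first-order goal-seeking stock; return the stock path."""
    n_steps = round(t_end / dt)
    stock = stock0
    path = [stock]
    for _ in range(n_steps):
        net_flow = (goal - stock) / adj_time  # units per time
        stock += dt * net_flow
        path.append(stock)
    return path


def integration_error(dt, **kwargs):
    """Largest gap between runs at dt and dt/2, compared at shared time points."""
    coarse = simulate(dt, **kwargs)
    fine = simulate(dt / 2, **kwargs)
    # Every second point of the fine run lines up with a point of the coarse run.
    return max(abs(c - f) for c, f in zip(coarse, fine[::2]))


for dt in (2.0, 1.0, 0.25, 0.0625):
    print(f"dt={dt:<7} max gap vs dt/2 = {integration_error(dt):.3f}")
```

If the gap is still large relative to the quantities of interest, keep halving `dt` until the behavior stops changing.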
277.2. Behavior-oriented tests (does the model produce realistic output?)
- Behavior reproduction: match historical / observed data
- Behavior anomaly: when a key assumption is altered or deleted, the model should produce implausible behavior, which shows that the assumption matters
- Family member: the same structure, re-parameterized, should reproduce the behavior of other systems in the same class
- Surprise behavior: model should reveal new dynamic patterns not built-in
- Sensitivity analysis: behavior changes “reasonably” with parameter perturbations
- System improvement: model leads to useful policy insight
277.3. Quantitative fit: Theil’s U decomposition
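For predicted $\hat{y}_t$ vs actual $y_t$, $t = 1, \dots, n$:
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}(\hat{y}_t - y_t)^2 = (\bar{\hat{y}} - \bar{y})^2 + (s_{\hat{y}} - s_y)^2 + 2(1 - r)\,s_{\hat{y}}\,s_y$$
where bars denote means, $s$ denotes the population (divide-by-$n$) standard deviation, and $r$ is the correlation between predicted and actual. Dividing each term by the MSE decomposes it into three components:
- $U^M = (\bar{\hat{y}} - \bar{y})^2 / \mathrm{MSE}$ (bias): systematic mean error
- $U^S = (s_{\hat{y}} - s_y)^2 / \mathrm{MSE}$ (variance): mismatch in the scale of variability
- $U^C = 2(1 - r)\,s_{\hat{y}}\,s_y / \mathrm{MSE}$ (covariance): imperfect point-by-point co-movement (noise or phase)
$U^M + U^S + U^C = 1$. Ideal model: $U^M$ and $U^S$ small, $U^C$ large. The model then matches the mean and the amplitude of the data, and the remaining error is unsystematic point-by-point mismatch.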
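The decomposition is straightforward to compute. The sketch below uses only the standard library and assumes the common convention of population (divide-by-n) standard deviations, so that the three shares sum exactly to one; the function name `theil_decomposition`, the dictionary keys, and the sample data are illustrative.

```python
import math


def theil_decomposition(predicted, actual):
    """Split the MSE into bias, variance, and covariance shares."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Population (1/n) moments, so the three terms add up exactly to the MSE.
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual)) / n
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n
    if mse == 0:
        raise ValueError("perfect fit: shares are undefined")
    bias = (mean_p - mean_a) ** 2
    variance = (sd_p - sd_a) ** 2
    covariance = 2 * (sd_p * sd_a - cov)  # same as 2 * (1 - r) * sd_p * sd_a
    return {
        "mse": mse,
        "U_M": bias / mse,
        "U_S": variance / mse,
        "U_C": covariance / mse,
    }


actual = [10, 12, 15, 14, 18, 20]
shifted = [a + 2 for a in actual]  # constant offset: error is pure bias
for key, value in theil_decomposition(shifted, actual).items():
    print(f"{key}: {value:.3f}")
```

A constant offset, as in the sample data, puts essentially all of the error in the bias share, which is the signature of a systematic mean error.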
277.4. Behavior modes matter more than point fits
For SD models, qualitative behavior (does it overshoot? oscillate? saturate?) is usually more valuable than precise fit. A model that predicts the right pattern of bullwhip with the wrong amplitude is better than one with right amplitude but no pattern.
Sterman emphasizes: don’t fixate on RMSE; check that the behavior modes match.
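One rough way to compare behavior modes rather than point fits is to classify each trajectory by its shape. The classifier below is a deliberately crude sketch: the function names, the mode labels, the rule that two or more turning points count as oscillation, and the synthetic series are all assumptions made for illustration.

```python
import math


def turning_points(series, tol=1e-9):
    """Count local peaks and troughs (sign changes in the first difference)."""
    diffs = [b - a for a, b in zip(series, series[1:]) if abs(b - a) > tol]
    return sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)


def behavior_mode(series, goal, tol=1e-9):
    """Label a trajectory as oscillation, overshoot, or monotonic approach."""
    starts_below = series[0] < goal
    if starts_below:
        crosses = any(x > goal + tol for x in series)
    else:
        crosses = any(x < goal - tol for x in series)
    if turning_points(series, tol) >= 2:
        return "oscillation"
    if crosses:
        return "overshoot"
    return "monotonic approach"


times = range(41)
# Hypothetical "observed" data: damped oscillation toward 100.
reference = [100 * (1 - math.exp(-t / 8) * math.cos(0.6 * t)) for t in times]
# Model A: same pattern, amplitude 50% too large.
model_a = [100 * (1 - 1.5 * math.exp(-t / 8) * math.cos(0.6 * t)) for t in times]
# Model B: smooth approach with no oscillation at all.
model_b = [100 * (1 - math.exp(-t / 3)) for t in times]

for name, series in [("reference", reference), ("A", model_a), ("B", model_b)]:
    print(name, behavior_mode(series, goal=100.0))
```

Model A misses the amplitude but reproduces the mode of the reference series, while model B does not, whatever its error statistics are.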
277.5. Calibration methods
For data-fitting:
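- FIMLOF (Full-Information Maximum Likelihood via Optimal Filtering): rigorous statistical calibration that uses a Kalman filter to account for measurement and process noise
- Vensim Powell search: derivative-free direction-set (Powell) optimization of a payoff, typically the squared error between simulated and observed series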
- Bayesian calibration: posterior distributions over parameters given data
For decision-rule parameters (e.g., the stock-adjustment and supply-line weights of the beer-game ordering heuristic): typically estimated by least squares from gameplay data.
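Least-squares calibration can be illustrated on a toy problem. The sketch below fits one parameter, the adjustment time of a hypothetical first-order model, by minimizing the sum of squared errors with a golden-section search. That search is a simple derivative-free stand-in, not Vensim's optimizer and not FIMLOF; the model, the synthetic "observed" data, and all names are assumptions.

```python
import math
import random


def simulate(adj_time, n_steps=30, dt=1.0, stock0=0.0, goal=100.0):
    """Euler-integrate a first-order goal-seeking stock."""
    stock, path = stock0, [stock0]
    for _ in range(n_steps):
        stock += dt * (goal - stock) / adj_time
        path.append(stock)
    return path


def sse(adj_time, observed):
    """Sum of squared errors between simulated and observed series."""
    return sum((m - o) ** 2 for m, o in zip(simulate(adj_time), observed))


def golden_section(f, lo, hi, tol=1e-6):
    """Derivative-free 1-D minimizer; assumes f is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Synthetic data: true adjustment time 5.0 plus seeded measurement noise.
rng = random.Random(0)
observed = [x + rng.gauss(0, 2.0) for x in simulate(5.0)]

estimate = golden_section(lambda a: sse(a, observed), 1.5, 20.0)
print(f"estimated adjustment time: {estimate:.2f}")
```

Real SD calibrations usually involve several parameters and a non-convex payoff, so multiple starting points or a global search are commonly used in place of a single one-dimensional search.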
277.6. Common pitfalls
- Over-fitting to historical data: an SD model has many degrees of freedom
- Ignoring extreme tests: the model "fits" but breaks at the boundaries
- Skipping unit checks: unit errors are a common source of model bugs
- Confidence over-reach: SD models give insight into behavior, not point forecasts
277.7. Sensitivity analysis types
- Univariate: vary one parameter at a time; identify high-leverage variables
- Monte Carlo / Latin Hypercube: sample parameter combinations; build envelopes of model behavior
- Tornado diagram: rank parameters by impact on a metric
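The three approaches above can be sketched together on a toy model. The example assumes a hypothetical first-order goal-seeking stock, a +/-20% range for every parameter, and the final stock level as the metric; the names `metric`, `tornado`, and `latin_hypercube` are illustrative.

```python
import random

BASE = {"goal": 100.0, "adj_time": 4.0, "stock0": 20.0}


def metric(p, t_end=10, dt=0.25):
    """Stock level at t_end for a first-order goal-seeking model (Euler)."""
    stock = p["stock0"]
    for _ in range(round(t_end / dt)):
        stock += dt * (p["goal"] - stock) / p["adj_time"]
    return stock


def tornado(base, rel=0.2):
    """One-at-a-time +/- rel swings in the metric, largest swing first."""
    rows = []
    for name in base:
        low = metric({**base, name: base[name] * (1 - rel)})
        high = metric({**base, name: base[name] * (1 + rel)})
        rows.append((name, low, high, abs(high - low)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


def latin_hypercube(base, n, rel=0.2, seed=0):
    """n samples; each parameter range is cut into n strata, one draw per stratum."""
    rng = random.Random(seed)
    columns = {}
    for name, value in base.items():
        low, width = value * (1 - rel), value * 2 * rel
        strata = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(strata)  # decouple the strata order across parameters
        columns[name] = [low + u * width for u in strata]
    return [{name: columns[name][i] for name in base} for i in range(n)]


for name, low, high, swing in tornado(BASE):
    print(f"{name:<9} low={low:7.2f} high={high:7.2f} swing={swing:6.2f}")

outputs = sorted(metric(s) for s in latin_hypercube(BASE, 50))
print(f"envelope over 50 samples: {outputs[0]:.2f} to {outputs[-1]:.2f}")
```

The sorted one-at-a-time swings are the data behind a tornado diagram, and the minimum and maximum over the Latin Hypercube samples give a simple behavior envelope for a single metric.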
277.8. See also
- System Dynamics overview
- Numerical Integration — integration-error test
- Monte Carlo Simulation — for sensitivity