335. Dynamic Programming
A method for solving multi-stage decision problems by breaking them into overlapping sub-problems, each solved once and reused.
335.1. When DP applies
Required ingredients:
- Stages — the problem decomposes into a sequence of decisions
- State — a summary of the past sufficient for future decisions (Markov property)
- Additive cost — total cost = sum of per-stage costs
- Known transition — deterministic, or stochastic
- Tractable state set — table fits in memory (or use approximation)
If any breaks, plain DP doesn’t apply.
335.2. Problem class
At stage in state :
- Choose action
- Pay immediate cost
- System transitions to
After the last decision: terminal cost .
Total cost of a plan :
Goal: .
335.3. Why naive enumeration fails
actions per stage, stages → plans. . DP shrinks this to via the Bellman recursion.
335.4. Core ideas
- Bellman equation — the recursion that breaks the problem into sub-problems
- Backward induction — solve from the end back
- Forward induction — dual direction (cost-to-arrive)
- Stochastic DP — random transitions, expected cost
- Value iteration / Policy iteration — infinite-horizon MDPs
335.5. Curse of dimensionality
State has components with values each → . DP cost grows exponentially in dimension, not in .
Mitigations: state aggregation, approximate DP, reinforcement learning, function approximation.
335.6. Common applications
- Shortest path on DAG —
- 0-1 Knapsack — pseudo-polynomial
- Equipment replacement
- Inventory: Wagner-Whitin lot-sizing
- Resource allocation across activities
- Optimal stopping — secretary problem, asset selling
- Revenue management — see Bellman applied to capacity-time
- Reinforcement learning — MDP foundation