337. Backward Induction
The standard algorithm for solving finite-horizon dynamic programming problems. Computes the optimal cost-to-go from the back () toward the start ().
337.1. Algorithm
For each terminal state s ∈ S:
J[T][s] ← c_T(s)
For t = T-1, T-2, …, 0:
For each state s ∈ S:
J[t][s] ← min over a of { c_t(s,a) + J[t+1][ f_t(s,a) ] }
π[t][s] ← argmin over a of the same expression
Return J[0][s₀] and policy {π[t][s]}
337.2. Why backward
At stage you need to know to evaluate the Bellman equation:
So compute stages in reverse: first (base case), then , then , …, then .
337.3. Complexity
Polynomial in , not exponential. (Naive enumeration is .)
Memory: for the full table, or if you only need and can discard each after using it (but then you lose the policy).
337.4. When backward is the right choice
- Finite horizon with a clean terminal cost
- Computing the optimal value from a known start state
- Producing the optimal action at every reachable state (policy)
- Dense state graphs: most states are reachable
For sparse reachable sets, memoization (top-down recursive DP with caching) avoids computing unreached states.
337.5. Example: Wagner-Whitin lot-sizing
Demand in each period . Setup cost , holding cost per unit per period. Choose periods to place orders to minimize total cost.
State: = the period of the last order (so demand from to is covered by that order).
The zero-inventory property says it’s optimal to order in period iff is a previous “last-order” period: = min total cost given the last order was in period .
Backward induction over states × periods → . This is the classic Wagner-Whitin algorithm — see Wagner-Whitin.
337.6. Tabulation vs memoization
Two implementations of the same logic:
| Tabulation (bottom-up) | Memoization (top-down) |
|---|---|
| Fill full table in order | Recursive call from , cache results |
| Visits every state | Only states actually reached |
| Best for dense reachable sets | Best for sparse reachable sets |
Both compute the same .
337.7. See also
- Bellman Equation
- Forward Induction — the dual approach
- Dynamic Programming