337. Backward Induction

The standard algorithm for solving finite-horizon dynamic programming problems. Computes the optimal cost-to-go from the back () toward the start ().

337.1. Algorithm

For each terminal state s ∈ S:
    J[T][s] ← c_T(s)

For t = T-1, T-2, …, 0:
    For each state s ∈ S:
        J[t][s]  ← min over a of { c_t(s,a) + J[t+1][ f_t(s,a) ] }
        π[t][s]  ← argmin over a of the same expression

Return J[0][s₀] and policy {π[t][s]}

337.2. Why backward

At stage you need to know to evaluate the Bellman equation:

So compute stages in reverse: first (base case), then , then , …, then .

337.3. Complexity

Polynomial in , not exponential. (Naive enumeration is .)

Memory: for the full table, or if you only need and can discard each after using it (but then you lose the policy).

337.4. When backward is the right choice

For sparse reachable sets, memoization (top-down recursive DP with caching) avoids computing unreached states.

337.5. Example: Wagner-Whitin lot-sizing

Demand in each period . Setup cost , holding cost per unit per period. Choose periods to place orders to minimize total cost.

State: = the period of the last order (so demand from to is covered by that order).

The zero-inventory property says it’s optimal to order in period iff is a previous “last-order” period: = min total cost given the last order was in period .

Backward induction over states × periods → . This is the classic Wagner-Whitin algorithm — see Wagner-Whitin.

337.6. Tabulation vs memoization

Two implementations of the same logic:

Tabulation (bottom-up) Memoization (top-down)
Fill full table in order Recursive call from , cache results
Visits every state Only states actually reached
Best for dense reachable sets Best for sparse reachable sets

Both compute the same .

337.7. See also