336. Bellman Equation
The recursion at the heart of dynamic programming. Defines the optimal cost-to-go — minimum cost to complete the problem from stage in state — as a function of cost-to-go from the next stage.
336.1. Principle of optimality (Bellman, 1957)
> “An optimal policy has the property that whatever the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.”
Restated: if is optimal from , then is optimal from .
Why true (contradiction): if the suffix weren’t optimal, swap it for the better one → strictly lower total cost → contradicts optimality of the full plan. ∎
336.2. Deriving the recursion
The total cost decomposes:
The bracketed remainder is the same problem starting from . By the principle of optimality, the optimal remainder is — the optimal cost-to-go from .
So at stage in state , you pay now and face the optimal-remaining problem from :
This is the Bellman equation — direct consequence of additivity + principle of optimality.
336.3. Base case
(no decisions left; the only cost is the terminal one.)
336.4. Optimal action
The optimal total cost from is .
336.5. Worked example: shortest path through a layered graph
Stages , states . Edge costs:
| from\to | A | B | C | D | G |
|---|---|---|---|---|---|
| S | 4 | 2 | |||
| A | 5 | 3 | |||
| B | 1 | 6 | |||
| C | 2 | ||||
| D | 4 |
Stage 3 (base case): .
Stage 2:
Stage 1:
- (tie)
- , action: go to
Stage 0:
- , action: go to
Optimal path: , cost .
336.6. Two variants
- Backward induction: solve .
- Forward induction: dual recursion on cost-to-arrive .
Both compute the same optimum — choose whichever fits the data flow.
336.7. Stochastic case
When transitions are random (), the Bellman equation becomes:
— expectation over the next state replaces the deterministic transition. See Stochastic DP.
336.8. See also
- Dynamic Programming — the broader framework
- Backward Induction — algorithm
- Stochastic DP — random case