335. Dynamic Programming

A method for solving multi-stage decision problems by breaking them into overlapping sub-problems, each solved once and reused.

335.1. When DP applies

Required ingredients:

  1. Stages — the problem decomposes into a sequence of decisions
  2. State — a summary of the past sufficient for future decisions (Markov property)
  3. Additive cost — total cost = sum of per-stage costs
  4. Known transition deterministic, or stochastic
  5. Tractable state set — table fits in memory (or use approximation)

If any breaks, plain DP doesn’t apply.

335.2. Problem class

At stage in state :

After the last decision: terminal cost .

Total cost of a plan :

Goal: .

335.3. Why naive enumeration fails

actions per stage, stages → plans. . DP shrinks this to via the Bellman recursion.

335.4. Core ideas

335.5. Curse of dimensionality

State has components with values each → . DP cost grows exponentially in dimension, not in .

Mitigations: state aggregation, approximate DP, reinforcement learning, function approximation.

335.6. Common applications

335.7. See also