Dynamic Programming

335. Dynamic Programming

A method for solving multi-stage decision problems by breaking them into overlapping sub-problems, each solved once and reused.
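To make "each solved once and reused" concrete, here is a minimal sketch on a made-up staircase problem: from step `i` you pay `STEP_COST[i]` and then jump one or two steps ahead. The cost data and the function names `cheapest_naive` and `cheapest_memo` are illustrative assumptions, not part of the notes above. The two functions share the same recursion; only the second caches sub-problem results.

```python
from functools import lru_cache

# Made-up toy data: cost paid when standing on step i.
STEP_COST = (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)

# Counts how many times each function body actually runs.
calls = {"naive": 0, "memo": 0}


def cheapest_naive(i):
    """Minimum cost to get past the last step, recomputing every sub-problem."""
    calls["naive"] += 1
    if i >= len(STEP_COST):
        return 0
    return STEP_COST[i] + min(cheapest_naive(i + 1), cheapest_naive(i + 2))


@lru_cache(maxsize=None)
def cheapest_memo(i):
    """Same recursion, but each sub-problem i is solved once and then reused."""
    calls["memo"] += 1
    if i >= len(STEP_COST):
        return 0
    return STEP_COST[i] + min(cheapest_memo(i + 1), cheapest_memo(i + 2))


naive_result = cheapest_naive(0)
memo_result = cheapest_memo(0)
print(naive_result, memo_result, calls)
```

Both calls return the same value, but the memoized version runs its body once per distinct argument, while the naive version revisits the same steps many times.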

Required ingredients:

Stages — the problem decomposes into a sequence of decisions
State — a summary of the past sufficient for future decisions (Markov property)
Additive cost — total cost = sum of per-stage costs
Known transition — deterministic, or stochastic
Tractable state set — table fits in memory (or use approximation)

If any of these fails, plain DP does not apply.

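At stage $k$ in state $x_k$: choose an action $u_k$, pay the stage cost $g_k(x_k, u_k)$, and move to the next state $x_{k+1} = f_k(x_k, u_k)$.

After the last decision: terminal cost $g_N(x_N)$.

Total cost of a plan $(u_0, \dots, u_{N-1})$:

$$J(u_0, \dots, u_{N-1}) = g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k)$$

Goal: $\min_{u_0, \dots, u_{N-1}} J(u_0, \dots, u_{N-1})$.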
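The total cost and the minimization goal can be illustrated on a small toy problem. Everything in the sketch below is an assumption made up for illustration: a deterministic transition `f`, a stage cost `g`, a terminal cost `g_terminal`, five states, three actions, and four stages. It evaluates a plan by rolling it forward and summing per-stage costs, then finds the best plan by enumerating every plan.

```python
from itertools import product

N = 4                  # number of stages (toy value)
STATES = range(5)      # states 0..4
ACTIONS = (-1, 0, 1)   # move left, stay, move right
TARGET = 3             # hypothetical target state


def f(k, x, u):
    """Deterministic transition: apply the move, clamped to the state range."""
    return min(max(x + u, 0), len(STATES) - 1)


def g(k, x, u):
    """Stage cost: distance from the target plus the effort of moving."""
    return abs(x - TARGET) + abs(u)


def g_terminal(x):
    """Terminal cost charged after the last decision."""
    return 2 * abs(x - TARGET)


def plan_cost(x0, plan):
    """Additive total cost: sum of stage costs along the path, plus terminal cost."""
    x, total = x0, 0
    for k, u in enumerate(plan):
        total += g(k, x, u)
        x = f(k, x, u)
    return total + g_terminal(x)


def brute_force(x0):
    """Enumerate all len(ACTIONS)**N plans and keep the cheapest one."""
    return min(product(ACTIONS, repeat=N), key=lambda plan: plan_cost(x0, plan))


best_plan = brute_force(0)
best_cost = plan_cost(0, best_plan)
print(best_plan, best_cost)
```

Enumeration is only feasible here because the problem is tiny; the number of plans multiplies by the number of actions with every added stage.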

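With $m$ actions per stage and $N$ stages there are $m^N$ possible plans, so brute-force enumeration is exponential in $N$. DP shrinks this to $O(N \cdot |X| \cdot m)$ operations, where $|X|$ is the number of states, via the Bellman recursion.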
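A minimal sketch of the Bellman recursion as backward induction, on an assumed toy problem (the transition `f`, stage cost `g`, terminal cost `g_terminal`, and all sizes are made up for illustration). Writing `V[k][x]` for the optimal cost-to-go from state `x` at stage `k`, the recursion is $V_N(x) = g_N(x)$ and $V_k(x) = \min_u \left[ g_k(x, u) + V_{k+1}(f_k(x, u)) \right]$. The sketch also counts how many (stage, state, action) triples it evaluates.

```python
N = 4                  # number of stages (toy value)
STATES = range(5)      # states 0..4
ACTIONS = (-1, 0, 1)   # move left, stay, move right
TARGET = 3             # hypothetical target state


def f(k, x, u):
    """Deterministic transition: apply the move, clamped to the state range."""
    return min(max(x + u, 0), len(STATES) - 1)


def g(k, x, u):
    """Stage cost: distance from the target plus the effort of moving."""
    return abs(x - TARGET) + abs(u)


def g_terminal(x):
    """Terminal cost charged after the last decision."""
    return 2 * abs(x - TARGET)


def backward_induction():
    """Fill the cost-to-go table from the last stage back to the first."""
    V = [[0] * len(STATES) for _ in range(N + 1)]
    policy = [[None] * len(STATES) for _ in range(N)]
    evaluations = 0
    for x in STATES:
        V[N][x] = g_terminal(x)
    for k in reversed(range(N)):
        for x in STATES:
            best_u, best_val = None, float("inf")
            for u in ACTIONS:
                evaluations += 1
                val = g(k, x, u) + V[k + 1][f(k, x, u)]
                if val < best_val:
                    best_u, best_val = u, val
            V[k][x] = best_val
            policy[k][x] = best_u
    return V, policy, evaluations


def rollout(x0, policy):
    """Recover an optimal plan by following the stored policy forward."""
    x, plan = x0, []
    for k in range(N):
        u = policy[k][x]
        plan.append(u)
        x = f(k, x, u)
    return plan


V, policy, evaluations = backward_induction()
plan = rollout(0, policy)
print(V[0][0], plan, evaluations, len(ACTIONS) ** N)
```

The table is filled with one evaluation per (stage, state, action) triple, so the work grows linearly in the number of stages, whereas the number of plans grows as a power of it.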

If the state has $n$ components with $d$ values each, there are $d^n$ distinct states. DP cost therefore grows exponentially in the state dimension $n$, not in the number of stages $N$.
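The arithmetic behind this growth is easy to check. The sketch below uses a hypothetical helper `table_entries` and assumes one table entry per (stage, state) pair; the sizes in the loop are arbitrary illustrative values.

```python
def table_entries(n_components, values_per_component, n_stages):
    """Return (number of states, number of cost-to-go table entries)."""
    states = values_per_component ** n_components
    return states, n_stages * states


# Fix 10 values per component and 20 stages, then grow the state dimension.
for n in (1, 2, 4, 8):
    states, entries = table_entries(n, 10, 20)
    print(f"n={n}: {states} states, {entries} table entries")
```

Doubling the number of stages doubles the table, while adding a single state component multiplies it by the number of values that component can take.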

Mitigations: state aggregation, approximate DP, reinforcement learning, function approximation.