Framing Practicum: Dynamic Programming
This is a framing practicum post. We’ll talk about what dynamic programming (DP) is, how to recognize DP in the wild, and what questions to ask when you find it. Then, we’ll have a challenge to apply the idea.
Today’s challenge: come up with 3 examples of DP which do not resemble any you’ve seen before. They don’t need to be good, they don’t need to be useful, they just need to be novel (to you).
Expected time: ~15-30 minutes at most, including the Bonus Exercise.
What is DP?
Suppose I am about to drive from Miami to Boston and I need to get there as fast as possible. As a first step, I check the highway map and create a list of possible routes for this trip (let’s assume the “good” old times with no Google Maps). For instance, looking at the imaginary routes in the figure below, the route at the top says I should take the “Miami → A → C → E → G → Boston” route, and the total trip distance would be 200+170+220+400+430=1420 miles. Which specific route should I take to minimize the total distance, and thus the total travel time? I can, of course, calculate the total travel distance for each possible route and pick the one with the least distance. But this could easily get very time-consuming if there were hundreds of thousands of possible routes to evaluate.
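To make the brute-force option concrete, here is a small Python sketch. The figure isn’t reproduced here, so the route map below is a hypothetical stand-in: the distances quoted in the text are kept, and the rest are made up so that the best total works out to the 1080 miles found later in the post.

```python
# Hypothetical route map. Distances quoted in the post are kept
# (Miami->A->C->E->G->Boston = 200+170+220+400+430, the F->G and F->H
# legs, Miami->B = 170); the remaining numbers are invented.
routes = {
    "Miami": {"A": 200, "B": 170},
    "A": {"C": 170, "D": 210},
    "B": {"C": 190, "D": 150},
    "C": {"E": 220, "F": 230},
    "D": {"E": 250, "F": 180},
    "E": {"G": 400, "H": 380},
    "F": {"G": 150, "H": 200},
    "G": {"Boston": 430},
    "H": {"Boston": 400},
}

def all_routes(city, goal="Boston"):
    """Yield (path, total distance) for every route from city to goal."""
    if city == goal:
        yield [goal], 0
        return
    for nxt, leg in routes[city].items():
        for path, total in all_routes(nxt, goal):
            yield [city] + path, leg + total

# Brute force: evaluate every route and keep the shortest. The number
# of routes multiplies at every stage, which is what makes this
# approach so expensive on larger maps.
best_path, best_dist = min(all_routes("Miami"), key=lambda r: r[1])
print(best_dist, " -> ".join(best_path))  # 1080 Miami -> B -> D -> F -> G -> Boston
```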
One alternative approach to identifying the route with minimum distance is to use a backward method. Suppose I drive backward through the route map from right to left. First, I start at the destination, Boston. If I am in city G or H, I have no further decision to make, since there is only one route that leads me to Boston from either city. The numbers in green summarize the total distance with one last trip, or one stage, to go.
My first decision (from right to left) occurs with two trips, or stages to go. If, for example, I am in city F, I can either drive 150 miles to city G and another 430 miles to Boston—total 580 miles, or drive 200 miles to city H and another 400 miles to Boston—total 600 miles. Therefore, the shortest possible distance, or optimal route (in green), from city F is 580 miles (F → G → Boston). The alternative route from F (F → H → Boston) is suboptimal with 600 miles to go (in red).
Let me back up one more city, or stage, and compute the least distance from cities C and D to Boston. The figure below summarizes these calculations.
Once I have computed the optimal routes from cities C and D onward to Boston, I can again move back one city and determine the optimal routes from cities A and B onward.
I continue this process and end up with an optimal route for the whole problem, with a least distance of 1080 miles (highlighted in bold arrows):
This is DP: An agent faces a multistage optimization problem (driving from Miami to Boston through multiple intermediate cities). At each stage (e.g., I have one more trip to go to Boston), the agent might be in a different state (e.g., I am currently in city A or city B). According to the current state (e.g., I am in city A), the agent takes a specific action (e.g., I will drive to city C), and as a result of that action, the system transitions to a different state in the next stage (e.g., now I am in city C). We solve the multistage optimization problem by working backwards, at each step computing the best reward we can get from that stage onward by considering each of the possible “next/child” states.
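Here is that backward pass as code, using the same hypothetical route map as in the brute-force sketch above (repeated so the snippet runs on its own). It computes, stage by stage from Boston backwards, the best distance-to-go from every city, then reads the optimal route off the stored decisions.

```python
# Same hypothetical map as in the brute-force sketch above.
routes = {
    "Miami": {"A": 200, "B": 170}, "A": {"C": 170, "D": 210},
    "B": {"C": 190, "D": 150}, "C": {"E": 220, "F": 230},
    "D": {"E": 250, "F": 180}, "E": {"G": 400, "H": 380},
    "F": {"G": 150, "H": 200}, "G": {"Boston": 430}, "H": {"Boston": 400},
}

# Stages listed from "one trip to go" back to the start.
stages = [["G", "H"], ["E", "F"], ["C", "D"], ["A", "B"], ["Miami"]]

best = {"Boston": 0}  # value function: best known distance-to-go
choice = {}           # optimal action (next city) in each state

for stage in stages:  # work backwards, one stage at a time
    for city in stage:
        # Best distance from this state = best over the child states.
        nxt = min(routes[city], key=lambda c: routes[city][c] + best[c])
        choice[city] = nxt
        best[city] = routes[city][nxt] + best[nxt]

print(best["Miami"])  # 1080

# Follow the stored choices forward to recover the route itself.
city, path = "Miami", ["Miami"]
while city != "Boston":
    city = choice[city]
    path.append(city)
print(" -> ".join(path))  # Miami -> B -> D -> F -> G -> Boston
```

Note that each city’s value is computed exactly once, no matter how many routes pass through it; that is where the savings over brute-force enumeration come from.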
What To Look For?
DP should come to mind whenever an agent faces a problem with a multi-stage nature and takes a series of actions. Another defining feature of DP is that the original complex multi-stage problem can be dismantled into a sequence of simpler, smaller problems. The action the agent takes at a particular stage depends on the current state and the reward the agent would receive by taking that specific action. In addition, the action the agent takes impacts the state of the system, causing the system to transition to a new state.
Useful Questions to Ask?
In the shortest driving time example, the ultimate goal is to minimize total driving time so that I arrive in Boston as fast as possible. At any given time in my trip, I might be in a different state—the city I am in at that time. For instance, on the second day of the trip, I might be in city C or city D. Given my state, I have a simpler problem to solve: what is the shortest travel time from city C or city D to Boston?
The system may start in different states. The agent takes a series of actions to optimize an objective across multiple stages. Each stage also has multiple possible states. The specific action an agent can take is a function of the state the agent is currently in.
In general, whenever we see problems where DP is applicable, we should ask (a sketch after this list shows how the answers fit together):
What is the objective?
What are the stages and states of the system?
What are the actions the agent can take at any given state?
How does a specific action change the state of the system?
What is the value function? How is an agent rewarded for taking a particular action at the current stage?
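One way to see how the answers to these questions fit together: each one becomes an ingredient of a generic backward-induction solver. The sketch below is my own framing, not code from the post; the function name and signature are illustrative.

```python
def backward_dp(states_by_stage, actions, transition, reward, terminal_value):
    """Generic backward induction. Each question above maps to an argument:
      objective     -> maximize total reward (negate rewards to minimize)
      stages/states -> states_by_stage[t] lists the states at stage t
      actions       -> actions(t, s): actions available in state s at stage t
      transition    -> transition(t, s, a): the resulting next-stage state
      value/reward  -> reward(t, s, a), plus terminal_value(s) at the end
    Returns the stage-0 value function and the optimal policy."""
    value = {s: terminal_value(s) for s in states_by_stage[-1]}
    policy = {}
    for t in reversed(range(len(states_by_stage) - 1)):
        new_value = {}
        for s in states_by_stage[t]:
            best = max(actions(t, s),
                       key=lambda a: reward(t, s, a) + value[transition(t, s, a)])
            policy[(t, s)] = best
            new_value[s] = reward(t, s, best) + value[transition(t, s, best)]
        value = new_value
    return value, policy
```

On the route example, each state is a city, an action is the choice of next city, the transition simply drives there, and the reward is the negative of the leg distance (so maximizing total reward minimizes total miles).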
The Challenge
Come up with 3 examples of DP which do not resemble any you’ve seen before. They don’t need to be good, they don’t need to be useful, they just need to be novel (to you).
Any answer must include at least 3 examples to count, and they must be novel to you. That’s the challenge. We’re here to challenge ourselves, not just review examples we already know.
However, they don’t have to be very good answers or even correct answers. Posting wrong things on the internet is scary, but a very fast way to learn, and I will enforce a high bar for kindness in response-comments. I will personally default to upvoting every complete answer, even if parts of it are wrong, and I encourage others to do the same.
Post your answers inside of spoiler tags. (How do I do that?)
Celebrate others’ answers. This is really important, especially for tougher questions. Sharing exercises in public is a scary experience. I don’t want people to leave this having back-chained the experience “If I go outside my comfort zone, people will look down on me”. So be generous with those upvotes. I certainly will be.
If you comment on someone else’s answers, focus on making exciting, novel ideas work — instead of tearing apart worse ideas. Yes, And is encouraged.
I will remove comments which I deem insufficiently kind, even if I believe they are valuable comments. I want people to feel encouraged to try and fail here, and that means enforcing nicer norms than usual.
If you get stuck, look for:
Problems with multistage nature.
Problems that can be dismantled into a sequence of simpler problems.
Bonus Exercise: for each of your three examples from the challenge, explain:
What are the simpler and smaller problems in the DP example?
What are the states, and how does taking a specific action alter the state?
What is the reward for taking a specific action in a given state?
This bonus exercise is great blog-post fodder!
Motivation
Using a framing tool is sort of like using a trigger-action pattern: the hard part is to notice a pattern, a place where a particular tool can apply (the “trigger”). Once we notice the pattern, it suggests certain questions or approximations (the “action”). This challenge is meant to train the trigger-step: we look for novel examples to ingrain the abstract trigger pattern (separate from examples/contexts we already know).
The Bonus Exercise is meant to train the action-step: apply whatever questions/approximations the frame suggests, in order to build the reflex of applying them when we notice DP.
Hopefully, this will make it easier to notice when a DP frame can be applied to a new problem you don’t understand in the wild, and to actually use it.
Completing a degree. How do I get a degree in X? Get an A in the classes required for X. What do I do to get an A in those classes? Get an A on the assignments for each class. What do I do to get an A on those assignments? Solve each problem on the assignment. What do I do to solve the problems? Perform each calculation correctly. How do I perform each calculation? Understand the underlying material correctly. How do I understand the underlying material correctly? Understand the individual statements that build up into the explanation correctly...
Building a fence. How do I design a fence? By designing the components: the panels, posts, and gate. How do I design the components? By designing the subcomponents: the frame, paneling, and joints. How do I design the subcomponents? By designing the sub-subcomponents: the cuts, nails/screws, dimensions, and materials.
Making a friend. How do I make a friend? By getting to know an acquaintance better. How do I make an acquaintance? By getting to know a stranger better. How do I meet a stranger? By getting to know my social surroundings better.
I think these examples show some missing requirements in the set of problems suitable for dynamic programming. Primarily, that the sub-problems must be very similar to the main problem, such that the same solution applies to them.
DP is nothing more than recursion + memoization. Recursion is solving a problem by breaking it into a trivial combination of smaller problems, and repeating that process until the smallest problem is itself trivial so you don’t need to break it down further. Memoization is just remembering the results of previous sub-problems, because you’ll often need them multiple times.
The OP misses out on the first step, defining the problem, and skips to the second, adding up the results. DP is a way to notice that you’ll have to solve it in reverse, but you still approach it forward. You write the code that knows the best way from a node to Boston is the lowest sum of “cost to next node” and “cost of THAT node to Boston”. So Miami’s best choice is the lower of 200+A->Boston or 170+B->Boston. We don’t know either of those sub-values yet, but we know how to calculate them—the exact same way, the lower of cost-of-path + cost-of-node-to-Boston. Until we’re one away from Boston; then it’s just cost-of-path-to-Boston.
In setting up the problem this way, it becomes clear that we’re performing calculations backwards from Boston, but we don’t need to figure that out specifically; it’s built into the nature of DP.
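For what it’s worth, here is that top-down formulation as code: a plain recursion that “approaches it forward” from Miami, with memoization caching each city’s distance-to-Boston so no subproblem is solved twice. The route map is the same hypothetical one as in the sketches above.

```python
from functools import lru_cache

# Same hypothetical map as in the earlier sketches.
routes = {
    "Miami": {"A": 200, "B": 170}, "A": {"C": 170, "D": 210},
    "B": {"C": 190, "D": 150}, "C": {"E": 220, "F": 230},
    "D": {"E": 250, "F": 180}, "E": {"G": 400, "H": 380},
    "F": {"G": 150, "H": 200}, "G": {"Boston": 430}, "H": {"Boston": 400},
}

@lru_cache(maxsize=None)      # memoization: remember each sub-result
def to_boston(city):
    """Best distance from city to Boston: the lowest sum of
    cost-to-next-node + that node's own best distance to Boston."""
    if city == "Boston":
        return 0
    return min(leg + to_boston(nxt) for nxt, leg in routes[city].items())

print(to_boston("Miami"))  # 1080
```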
Typo: you mention the word “incentive” three times here, rather than “dynamic programming.”
Corrected. Thank you for pointing them out.
Planning: Equivalent to route-planning on an abstract graph. You can let edges represent actions (and their costs) and nodes represent states; we want to find a path from a start node to a goal node.
Proving a mathematical theorem: You can recursively break the problem down into proving lemmas (and lemmas for those lemmas...) which together prove the theorem.
Learning a skill: You break a skill into smaller skills, then integrate them back into a whole. Examples in programming: layers of abstraction in a neural net, reward shaping.
Basically any problem with a graph, network, etc. ;)