This concept is often discussed in the subfield of AI called planning. There are a few points you hit on that were of particular interest to me and of relevance to the field:
> The key is that we can usually express the problem-space using constraints which each depend on only a few dimensions.
In Reinforcement Learning and Planning, domains which obey this property are often modeled as Factored Markov Decision Processes (MDPs), where there are known dependency relationships between different portions of the state space that can be represented compactly using a Dynamic Bayes Net (DBN). The dynamics of Factored MDPs are easier to learn from an RL perspective, and knowing that an MDP’s state space is factored has other desirable properties from a planning perspective.
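To put the "few dimensions per constraint" idea in symbols (my notation, not anything from your post): in a factored MDP, each next-state variable depends only on a small set of parent variables in the DBN, so the transition model factorizes as

$$P(s' \mid s, a) = \prod_{i=1}^{n} P\!\left(s'_i \mid \mathrm{pa}_i(s), a\right)$$

where $\mathrm{pa}_i(s)$ is the (typically small) set of current state variables that the $i$-th next-state variable actually depends on. Learning and planning then scale with the sizes of these parent sets rather than with the full joint state space.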
> I expect getting to the airport to be easy. There are many ways to get there (train, Uber/Lyft, drive & park) all of which I’ve used before and any of which would be fine. ... I want to arrive at the airport an hour before the plane takes off, that constraint only involves two dimensions: my arrival time at the airport, and the takeoff time of the flight. It does not directly depend on what time I wake up, whether I pack a toothbrush, my parents’ plans, cost of the plane tickets, etc, etc.
You are actually touching on what seem to be three kinds of independence relationships. The first is temporal, and has to do with options sharing identical goal states. The second concerns the underlying independence structure of the MDP. The third isn't technically an independence relationship; rather, it concerns the utility of abstraction. In detail:
1. It doesn’t matter which option you take (train, Uber/Lyft, drive & park) because they all have the same termination state (at the airport). This shows that we plan primarily using subgoals.
2. Certain factors of the state space (your parents’ plans, whether you pack a toothbrush, the cost of the plane tickets) are actually independent of each other: your parents’ plans have no real physical consequences for the rest of your plan at any time, much like you can walk and chew gum at the same time. This shows that we plan with a factored understanding of the state-action space.
3. The time you wake up does indeed matter in your plan, but the exact time does not. For planning purposes, waking up any time before you must leave your house (factoring in packing, etc.) is permissible and functionally equivalent. All possible states of being awake before your out-the-door time collapse to the same abstract state of being awake on time. This shows that we plan using abstract states (a similar, but subtly different, point than point 1); see the sketch just after this list.
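Here is a tiny toy sketch of points 1 and 3 in code (all names and numbers are made up for illustration): several options share one termination state, and a simple abstraction function collapses every "awake before the deadline" time into a single abstract state.

```python
# Hypothetical illustration: options with a shared subgoal, plus a simple
# state abstraction over wake-up times. All names/values are made up.

# Point 1: different options, identical termination state (the subgoal).
OPTIONS = {
    "take_train":     {"terminates_in": "at_airport"},
    "take_rideshare": {"terminates_in": "at_airport"},
    "drive_and_park": {"terminates_in": "at_airport"},
}

def reaches_subgoal(option_name, subgoal="at_airport"):
    """Any option whose termination state is the subgoal is interchangeable
    for planning purposes."""
    return OPTIONS[option_name]["terminates_in"] == subgoal

# Point 3: collapse concrete wake-up times into one abstract state.
OUT_THE_DOOR_TIME = 7.5  # hypothetical deadline (7:30 am), in hours since midnight

def abstract_wakeup_state(wakeup_time):
    """All wake-up times before the deadline map to the same abstract state."""
    return "awake_on_time" if wakeup_time <= OUT_THE_DOOR_TIME else "awake_late"

assert all(reaches_subgoal(o) for o in OPTIONS)                   # point 1
assert abstract_wakeup_state(5.0) == abstract_wakeup_state(7.0)   # point 3
```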
> More generally, how can we efficiently figure out which constraints are taut vs slack in a new domain? How do we map out the problem/solution space?
We can use the three kinds of independence relationships I mentioned above to answer these questions in the RL/Planning setting:
1. So long as you can learn to consistently reach a specific state, you can use that state as a subgoal for planning and exploration. This principle is used in some existing RL literature (I’m a student in this lab).
2. If you can figure out the underlying representation of the world and discern independence relationships between state variables, you can focus on making plans for subsets of the state space. This idea is used in some planning literature.
3. If you discover a consistent way to get from any state in a set A to a single state b, you can treat all of A as a single abstract state a, so long as b is relevant to the rest of your plan. This abstraction principle lets you derive a smaller, discrete MDP (much easier to solve) from a bigger, continuous one; a toy sketch of the idea follows below. This is actually the theme of the literature in point 1, and here is the source text (to be more specific, I am an undergrad working in George’s lab).
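Here is a minimal sketch of that last idea (a toy construction of my own, not the lab's actual algorithm): if each option can be run from a known set of abstract states and reliably terminates in a single state, planning collapses to graph search over a handful of subgoal states.

```python
# Toy construction of a small, discrete abstract model from options.
# Each option: (set of abstract states it can be run from, the single state
# it reliably terminates in). All names here are hypothetical.
from collections import deque

OPTIONS = {
    "wake_and_pack":  ({"asleep_at_home"}, "ready_to_leave"),
    "take_train":     ({"ready_to_leave"}, "at_airport"),
    "take_rideshare": ({"ready_to_leave"}, "at_airport"),
    "board_flight":   ({"at_airport"}, "on_plane"),
}

def build_abstract_graph(options):
    """Abstract states are the options' initiation/termination states; each
    option becomes an edge, since it reliably maps its initiation set to one state."""
    graph = {}
    for name, (init_states, term_state) in options.items():
        for state in init_states:
            graph.setdefault(state, []).append((name, term_state))
    return graph

def plan(graph, start, goal):
    """Breadth-first search over the abstract graph; returns a list of options."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for option, next_state in graph.get(state, []):
            if next_state not in seen:
                seen.add(next_state)
                frontier.append((next_state, path + [option]))
    return None

graph = build_abstract_graph(OPTIONS)
print(plan(graph, "asleep_at_home", "on_plane"))
# -> ['wake_and_pack', 'take_train', 'board_flight']
```

The point is just that once options have reliable termination states, planning reduces to search over a few abstract states rather than the original high-dimensional state space.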