Yup, I’m pretty sure people are aware of this :) See also the model of an agent as something with preferences, beliefs, available actions, and a search+decision algorithm that makes it take the actions it believes will best serve its preferences.
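To make that picture concrete, here is a minimal sketch of that agent model (the function names and the toy environment are mine, purely illustrative): beliefs about the world, a set of available actions, preferences over outcomes, and a depth-limited search that picks the action the agent believes serves those preferences best.

```python
def plan(belief, actions, preferences, simulate, depth=3):
    """Depth-limited search: return the first action of the best believed plan."""
    def value(state, d):
        if d == 0:
            return preferences(state)
        return max(value(simulate(state, a), d - 1) for a in actions)
    return max(actions, key=lambda a: value(simulate(belief, a), depth - 1))

# Toy usage: the agent believes it is at position 0 and prefers to be near 5.
actions = [-1, 0, +1]
simulate = lambda pos, a: pos + a          # the agent's belief about the dynamics
preferences = lambda pos: -abs(pos - 5)    # higher is better
print(plan(0, actions, preferences, simulate, depth=3))  # prints 1
```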
But future AI research will require some serious generalizations that are left un-generalized in current methods. A simple gridworld problem might treat the entire grid as a known POMDP and do search over possible sequences of actions. Obviously the real world isn’t a known POMDP, so suppose we just call it an unknown POMDP and try to learn it through observation. Now, all of a sudden, you can’t hand-specify a cost function in terms of the world model anymore, so that needs to be re-evaluated as well.
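Roughly what I mean, as a sketch (all class and function names here are hypothetical, not from any existing codebase): in the known-gridworld case the cost function is written directly over a state space we designed, but once the world model is learned from observation, the “state” is a learned representation, and there is nothing human-legible left to hand-write a cost over.

```python
# Illustrative contrast only -- all names here are hypothetical.

# Known POMDP (gridworld): the state space is hand-designed, so a cost
# function can be specified directly in terms of it.
GOAL = (4, 4)

def gridworld_cost(state):
    # state is an (x, y) cell that we, the designers, chose
    return 0.0 if state == GOAL else 1.0

# Unknown POMDP learned from observation: the "state" is whatever latent
# representation the learned model settles on.
class LearnedWorldModel:
    def __init__(self, latent_dim=32):
        self.latent_dim = latent_dim

    def encode(self, observation):
        # stand-in for a learned encoder fit from data
        return [0.0] * self.latent_dim

def learned_cost(latent_state):
    # There is no longer an obvious hand-written mapping from latent vectors
    # to "what we want" -- this is the piece that has to be re-evaluated.
    raise NotImplementedError("the cost must now be specified some other way")
```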
I’m curious about what you think people are aware of: that the idea of goal-directedness from the value learning sequence is captured by model-based RL, or that any sufficiently powerful agent (implicitly goal-directed) needs to be model-based instead of model-free?
If that’s the former, I’m really interested in links to posts and comments pointing that out, as I don’t know of any. And if that’s the latter, then it seems that it goes back to asking whether powerful agents must be goal-directed.
The former (that is, model-based RL → agent). The latter (smart agent → model-based RL), I think, would be founded on a bit of a level error. At bottom, there are only atoms and the void. Whether something is “really” an agent is a question of how well we can describe this collection of atoms in terms of an agent-shaped model. This is different from the question of what abstractions humans used in the process of programming the AI; like Rohin says, parts of the agent might be thought of as implicit in the programming, rather than explicit.
Sorry, I don’t know if I can direct you to any explicit sources. If you check out papers like Concrete Problems in AI Safety or others in that genre, though, you’ll see model-based RL used as a simplifying set of assumptions that imply agency.