First off, I’m super happy that people are thinking about goal-directed behavior :D
I think model-based RL is typically goal-directed, in that it typically performs a search, using a world model, for a trajectory that achieves high reward (the goal). However, powerful model-free RL is usually also goal-directed: AlphaZero (without the MCTS), OpenAI Five, AlphaStar, etc. are all model-free, yet still seem fairly goal-directed. More generally, model-free and model-based RL algorithms usually reach similar performance on a given environment (often model-free is less sample efficient but has higher final performance, though this isn't always the case).
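To make that distinction concrete, here is a minimal sketch (my own toy illustration; `world_model`, `reward_fn`, and the linear policy are made-up stand-ins, not anything from AlphaZero, OpenAI Five, or AlphaStar): the model-based agent searches its world model for a high-return trajectory, while the model-free agent just maps state features to an action, with no model or search anywhere.

```python
# Toy illustration of the distinction: a model-based planner searches a world
# model for a high-reward trajectory, while a model-free policy simply maps
# observations to actions with no explicit model.

def plan_with_model(state, world_model, reward_fn, actions, depth=3):
    """Model-based: exhaustively search action sequences through `world_model`
    and return the first action of the best-scoring trajectory."""
    def rollout(s, remaining):
        if remaining == 0:
            return 0.0
        return max(
            reward_fn(s, a) + rollout(world_model(s, a), remaining - 1)
            for a in actions
        )

    best_action, best_return = None, float("-inf")
    for a in actions:
        ret = reward_fn(state, a) + rollout(world_model(state, a), depth - 1)
        if ret > best_return:
            best_action, best_return = a, ret
    return best_action

def model_free_policy(state, weights):
    """Model-free: a learned linear mapping from state features to an action,
    with no world model or search anywhere."""
    scores = {a: sum(w * f for w, f in zip(ws, state)) for a, ws in weights.items()}
    return max(scores, key=scores.get)
```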
Also more broadly, I think there’s a smooth spectrum between “habitual cognition” and “goal-directed cognition”, such that you can’t cleanly carve up the space into a binary “goal-directed” or not.
Thanks for the feedback!

I am indeed thinking about your intuitions for goal-directed behavior, because it seems quite important. I currently lack a clear (and as formal as possible) idea of what you mean, and thus I have trouble weighing your arguments that it is not necessary, or that it causes most of the problems in safety. And since these arguments would have significant implications, I want to have as informed an opinion on them as possible.
Since you say that goal-directed behavior is not about whether the agent has a model, is it about the form of the model? Or about the use of the model? Would a model-based agent that did not adapt its model when the environment changed be considered not goal-directed (like the lookup-table agent in your example)?
Since you say that goal-directed behavior is not about whether the agent has a model, is it about the form of the model? Or about the use of the model?
I’m thinking that there may not be any model. Consider, for example, an agent that solves (simply connected) mazes by implementing the right-hand rule: such an agent seems at least somewhat goal-directed, but it’s hard for me to see a model anywhere in this agent.
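For concreteness, here is one possible hard-coded version of such an agent (a sketch of my own, not something from the original post; `wall_at(pos, direction)` is an assumed helper that reports whether a wall blocks movement from a cell in that direction). It only ever reacts to the walls around its current cell, so there is no map, memory, or model of the maze anywhere in it:

```python
# Illustrative hard-coded right-hand-rule maze agent (my own sketch).
# It reacts only to the walls next to its current cell -- no map, no memory,
# no learned model of the maze.

RIGHT_OF = {"N": "E", "E": "S", "S": "W", "W": "N"}
LEFT_OF  = {v: k for k, v in RIGHT_OF.items()}
STEP     = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}

def right_hand_step(pos, heading, wall_at):
    """Keep a 'hand' on the right wall: prefer turning right, then going
    straight, then turning left, then reversing. `wall_at(pos, direction)`
    is an assumed helper reporting whether a wall blocks that move."""
    for turn in (RIGHT_OF[heading], heading, LEFT_OF[heading], RIGHT_OF[RIGHT_OF[heading]]):
        if not wall_at(pos, turn):
            dx, dy = STEP[turn]
            return (pos[0] + dx, pos[1] + dy), turn
    return pos, heading  # boxed in on all four sides

def solve(start, goal, wall_at, max_steps=10_000):
    """Repeatedly apply the rule; in a simply connected maze this reaches `goal`."""
    pos, heading = start, "N"
    for _ in range(max_steps):
        if pos == goal:
            return True
        pos, heading = right_hand_step(pos, heading, wall_at)
    return False
```

In a simply connected maze, keeping a hand on the right wall eventually reaches the exit, even though the agent never represents the maze at all.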
Would a model-based agent that did not adapt its model when the environment changed be considered not goal-directed (like the lookup-table agent in your example)?
Yeah, I think that does make it less goal-directed.
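To pin down what "adapting its model" could mean here, a toy framing (mine, not the lookup-table example from the original post): both agents below choose actions from a small transition/reward table standing in for a world model, but only the second keeps that table in sync with what it actually observes after the environment changes.

```python
# Toy framing (my own example) of "adapting the model": both agents act from a
# transition/reward table standing in for a world model, but only one corrects
# that table when observations contradict it.

class FrozenModelAgent:
    """Chooses the action its table says is best; the table is fixed at
    construction time, so its choices quietly stop tracking a changed environment."""
    def __init__(self, model):
        self.model = dict(model)  # state -> {action: (next_state, reward)}

    def act(self, state):
        options = self.model.get(state, {})
        # Pick the action with the highest predicted reward
        # (a one-step stand-in for a fuller planner).
        return max(options, key=lambda a: options[a][1], default=None)


class AdaptiveModelAgent(FrozenModelAgent):
    """Same action choice, but the table is corrected whenever an observed
    transition disagrees with the model's prediction."""
    def observe(self, state, action, next_state, reward):
        self.model.setdefault(state, {})[action] = (next_state, reward)
```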
About the “right-hand rule” agent, I feel it depends on whether it is a hard-coded agent or a learning agent. If it is hard-coded, then clearly it doesn’t require a model. But if it learns such a rule, I would assume the rule was inferred from a learned model of what mazes are.
For the non-adaptive agent, you say it is less goal-directed; do you see goal-directedness as a continuous spectrum, as a set of zones on this spectrum, or as a binary threshold on this spectrum?
About the “right-hand rule” agent, I feel it depends on whether it is a hard-coded agent or a learning agent.
Yes, I meant the hard-coded one. It still seems somewhat goal-directed to me.
do you see goal-directedness as a continuous spectrum, as a set of zones on this spectrum, or as a binary threshold on this spectrum?
Oh, definitely a continuous spectrum. (Though I think several people disagree with me on this, and see it more like a binary-ish threshold. Such people often say things like “intelligence and generalization require some sort of search-like cognition”. I don’t understand their views very well.)
Do you have references to posts by the people who think goal-directedness is binary-ish? That would be very useful, thanks. :)

Uh, not really. The mesa optimizers sequence sort of comes from this viewpoint, as does this question, but I haven’t really seen any posts arguing for this position.