I think this is a really good post. You might be interested in these two posts, which explore very similar arguments about the interaction between search in the world model and more general ‘intuitive policies’, as well as the fact that we are always optimizing for our world/reward model rather than reality, and how this affects how agents act.
Thank you very much for linking these two posts, which I hadn’t read before. I’ll start using the direct vs. amortized optimization terminology, as I think it makes things clearer.
The intuition that reward models and planners have an adversarial relationship seems crucial, and it doesn’t seem as widespread as I’d like.
On a meta-level, your appreciative comment will motivate me to write more, even though the ideas are often half-baked in my mind and the expositions not always clear or eloquent.