Thank you very much for linking these two posts, which I hadn’t read before. I’ll start using the direct vs. amortized optimization terminology, as I think it makes things clearer.
The intuition that reward models and planners have an adversarial relationship seems crucial, and it doesn’t seem as widespread as I’d like.
On a meta level, your appreciative comment will motivate me to write more, even though the ideas are often half-baked in my mind and the exposition is not always clear or eloquent.