“I find this concept most useful when thinking about the problem of inner optimizers, where in the course of optimization through a rich space you stumble across a member of the space that is itself doing optimization, but for a related but still misspecified metric.”—Could you clarify what kind of algorithm you are imagining being run?
I could imagine this happening with standard deep RL over a long enough time horizon with enough compute. Again though, I want to defer to the upcoming sequence on the topic, which should have a good in-depth explanation.
“I find this concept most useful when thinking about the problem of inner optimizers, where in the course of optimization through a rich space you stumble across a member of the space that is itself doing optimization, but for a related but still misspecified metric.”—Could you clarify what kind of algorithm you are imagining being run?
I could imagine this happening with standard deep RL over a long enough time horizon with enough compute. Again though, I want to defer to the upcoming sequence on the topic, which should have a good in-depth explanation.