As written, w takes behaviors to “properties about world-trajectories that the base optimizer might care about,” as Wei Dai says here. If there is uncertainty, I think w could instead return distributions over such world-trajectories, and the argument would still go through.
Ah, I see. And just to make sure I’m not going crazy: you’ve edited the post now to reflect this?
Yes