It might well be that 1) people who already know RL shouldn’t be much surprised by this result and 2) people who don’t know much RL are justified in updating on this info (towards mesa-optimizers arising more easily).
This would be the case if RL intuition correctly implies that proto-mesa-optimizers (like the one in the paper) arise naturally, and that intuition isn't widely shared outside of RL. I'm not sure whether that's actually how things are, but it seems plausible to me.
I agree. It seems pretty bad if the participants of a forum about AI alignment don’t know RL.