I think of Utility != Reward as probably the most important core point from the Mesa-Optimizer paper, and I preferred this explanation over the one in the paper (though it leaves out many things and wouldn’t want it to be the only thing someone reads on the topic)
I think of Utility != Reward as probably the most important core point from the Mesa-Optimizer paper, and I preferred this explanation over the one in the paper (though it leaves out many things and wouldn’t want it to be the only thing someone reads on the topic)