jimrandomh answers Do mesa-optimizer risk arguments rely on the train-test paradigm?

jimrandomh 10 Sep 2020 18:28 UTC
0 points
One would certainly hope that lifelong learning would cause an AI with a proto-mesa-optimizer in it to update by down-weighting the mesa-optimizer. But the opposite could also happen: a proto-mesa-optimizer could use the influence it has over the larger AI system to steer into situations that increase its own weight, giving it more control over the system.
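To make the two outcomes concrete, here is a deliberately crude toy model. It is only a sketch, and everything in it is an assumption made for illustration: the single scalar weight `w`, the `steering` parameter, the base rate `p_base`, and the update rule are all hypothetical and not drawn from any real training setup. It is meant to show the feedback loop, not to model an actual learning system.

```python
def simulate(w0, steering, steps=200, lr=0.1, p_base=0.3):
    """Toy dynamics for the weight of a proto-mesa-optimizer.

    w0       -- initial weight of the subcomponent, in (0, 1)
    steering -- hypothetical: how strongly the subcomponent can bias
                which situations the system ends up in
    p_base   -- hypothetical: fraction of situations in which the
                subcomponent outperforms the rest, absent any steering
    """
    w = w0
    for _ in range(steps):
        # Assumption: the subcomponent's ability to reach favorable
        # situations scales with its current weight.
        p_fav = min(1.0, p_base + steering * w)
        # Expected update: the weight rises in favorable situations and
        # falls in unfavorable ones; w * (1 - w) keeps it inside (0, 1).
        w += lr * (2 * p_fav - 1) * w * (1 - w)
    return w


if __name__ == "__main__":
    print("no steering, w0=0.3:  ", simulate(0.3, steering=0.0))
    print("with steering, w0=0.3:", simulate(0.3, steering=1.0))
    print("with steering, w0=0.1:", simulate(0.1, steering=1.0))
```

With no steering, the weight decays toward zero, which is the hoped-for case. With steering, there is a tipping point where `p_base + steering * w` crosses 0.5 (at `w = 0.2` for these defaults): below it the weight still decays, above it the weight reinforces itself and grows toward one.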