Ben Amitay comments on Training for corrigability: obvious problems?

Ben Amitay Apr 22, 2023, 7:16 PM
1 point
0
Hi, sorry for commenting on ancient comment, but I just read it again and found that I’m not convinced that the mesaoptimizers problem is relevant here. My understanding is that if you switch goals often enough, every mesaiptimizer that isn’t corrigible should be trained away as it hurt the utility as defined.