I’ll admit I skimmed over it, not really engaging, because I don’t hold that part of the shutdown problem as a top concern. My top concerns on this topic are whether anyone has that much control of an emerging AI’s preferences, and whether humans have the wisdom to actually attempt shutdown.
It does look promising for the portion it addresses.
Thanks, that’s useful to know. If you have the time, can you say some more about ‘control of an emerging AI’s preferences’? I sketch out a proposed training regimen for the preferences that we want, and argue that this regimen largely circumvents the problems of reward misspecification, goal misgeneralization, and deceptive alignment. Are you not convinced by that part? Or is there some other problem I’m missing?