Such that you can technically do anything you want—you have maximal power/empowerment—but the super-majority of buttons and button combinations you are likely to push result in increasing the number of paperclips.
I think any model of a rational agent needs to incorporate the fact that it isn't arbitrarily intelligent; otherwise none of its actions make sense. So I'm not too worried about this.
If you make an empowerment calculus that works for humans who are atomic & ideal agents, it probably breaks once you get a superintelligence that can likely mind-hack you into valuing only power yourself.
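(For concreteness, one standard formalization of empowerment, roughly in the Klyubin–Polani–Nehaniv / Salge sense, is the channel capacity from an agent's $n$-step action sequences to its resulting sensor state:

$$\mathfrak{E}_n(s_t) \;=\; \max_{p(a_t^n)} \, I\!\left(A_t^n;\, S_{t+n} \mid s_t\right),$$

where $A_t^n$ is an $n$-step action sequence starting at time $t$ and $S_{t+n}$ is the state it leads to. It measures how much the agent's choices can, in principle, influence its future. The worry above is that a quantity like this, calibrated for human-scale atomic agents, may stop tracking anything we care about once another player can rewrite the human sitting inside the formula.)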
Yeah, I agree that a lot of concepts get fragile in the context of superintelligence. But while I think of corrigibility as an actively anti-natural concept, empowerment seems like it could perhaps remain robust and well-founded for longer.