Garrett Baker comments on Richard Ngo’s Shortform

Garrett Baker 1 May 2024 20:29 UTC
6 points
There’s also the problem of: what do you mean by “the human”? If you make an empowerment calculus that works for humans who are atomic & ideal agents, it probably breaks once you get a superintelligence who can likely mind-hack you into valuing only power yourself. It never forces you to abstain from giving up power, since you’re still perfectly capable of making different decisions; you just don’t.

Another problem, which I like to think of as the “control panel of the universe” problem, is where the AI gives you the control panel of the universe, but you aren’t smart enough to operate it: you have the information necessary to operate it, but not the intelligence. So you can technically do anything you want (you have maximal power/empowerment), but the super-majority of buttons and button combinations you are likely to push result in increasing the number of paperclips.
- Richard_Ngo 1 May 2024 22:21 UTC
  6 points
  So you can technically do anything you want (you have maximal power/empowerment), but the super-majority of buttons and button combinations you are likely to push result in increasing the number of paperclips.
  I think any model of a rational agent needs to incorporate the fact that they’re not arbitrarily intelligent; otherwise, none of their actions make sense. So I’m not too worried about this.
  If you make an empowerment calculus that works for humans who are atomic & ideal agents, it probably breaks once you get a superintelligence who can likely mind-hack you into valuing only power yourself.
  Yeah, I agree that a lot of concepts get fragile in the context of superintelligence. But while I think of corrigibility as an actively anti-natural concept, empowerment seems like it could perhaps remain robust and well-founded for longer.