Logan Riggs comments on Attainable Utility Preservation: Scaling to Superhuman

Logan Riggs 27 Feb 2020 18:26 UTC
1 point
Thanks for the link (and the excellent write-up of the problem)!
Regarding the setting, how would the agent initially gain the ability to create a sub-agent, roll a rock, or limit its own abilities? Throughout AUP, you normally start with a high penalty for acquiring power and then scale it down until you reach reasonable, non-catastrophic plans, but your post begins with the agent already having high power.
I don’t think AUP prevents abuse of power you currently have (?); rather, it prevents gaining that power in the first place.
- Stuart_Armstrong 28 Feb 2020 11:15 UTC
  3 points
  Parent
  The AUP is supposed to prevent the agent accumulating power. The AI initially has huge potential power (because its potential power is all the power it could ever accumulate, given its best strategy to accumulate power) and the penalty is supposed to prevent it turning that potential into actual power—as measured by AUP.
  
  So the AI always has the power to build a subagent; that post just shows that it can do this without triggering the AUP-power penalty.
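
For readers following the exchange, the "AUP-power penalty" under discussion can be sketched roughly as below. This is a simplified illustration and not code from either post: it assumes the penalty is the summed absolute change in attainable utility (auxiliary Q-values) relative to a no-op action, and it omits the normalisation used in the full formulation. The names `aup_penalty`, `aup_reward`, `make_q`, the `lam` parameter, and the toy Q-values are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# A Q-function maps (state, action) to the attainable utility for one auxiliary goal.
QFunction = Callable[[str, str], float]


def make_q(table: Dict[Tuple[str, str], float]) -> QFunction:
    """Wrap a lookup table as a Q-function (toy stand-in for a learned Q-function)."""
    return lambda state, action: table[(state, action)]


def aup_penalty(q_aux: Sequence[QFunction], state: str, action: str,
                noop: str = "noop") -> float:
    """Sum of absolute shifts in attainable utility compared with doing nothing."""
    return sum(abs(q(state, action) - q(state, noop)) for q in q_aux)


def aup_reward(primary_reward: float, q_aux: Sequence[QFunction], state: str,
               action: str, lam: float = 1.0, noop: str = "noop") -> float:
    """Primary reward minus the scaled penalty (assumed, unnormalised form)."""
    return primary_reward - lam * aup_penalty(q_aux, state, action, noop)


# Toy auxiliary Q-values: "grab" raises attainable utility, "walk" leaves it unchanged.
q1 = make_q({("s", "noop"): 1.0, ("s", "grab"): 5.0, ("s", "walk"): 1.0})
q2 = make_q({("s", "noop"): 2.0, ("s", "grab"): 3.0, ("s", "walk"): 2.0})

grab_penalty = aup_penalty([q1, q2], "s", "grab")
walk_penalty = aup_penalty([q1, q2], "s", "walk")
grab_reward = aup_reward(2.0, [q1, q2], "s", "grab", lam=0.5)
walk_reward = aup_reward(1.0, [q1, q2], "s", "walk", lam=0.5)
```

In this toy setting the penalty only registers actions that shift the auxiliary Q-values away from the no-op baseline, which is the sense in which an action that leaves them unchanged would go unpenalised.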