Thanks for the link (and the excellent write-up of the problem)!
Regarding the setting, how would the agent gain the ability to create a sub-agent, roll a rock, or limit its own abilities initially? Throughout AUP, you normally start with a high penalty for acquiring power, and then you scale it down to reach reasonable, non-catastrophic plans, but your post begins with the agent already having higher power.
I don’t think AUP prevents abuse of power you currently have (?), but rather prevents gaining that power in the first place.
AUP is supposed to prevent the agent from accumulating power. The AI initially has huge potential power (because its potential power is all the power it could ever accumulate, given its best strategy to accumulate power), and the penalty is supposed to prevent it from turning that potential into actual power, as measured by AUP.
So the AI always has the power to build a subagent; that post just shows that it can do this without triggering the AUP-power penalty.