Yes. Subagent problems are not cleanly separated from other problems (see section 3.4 of https://www.lesswrong.com/posts/mdQEraEZQLg7jtozn/subagents-and-impact-measures-full-and-fully-illustrated , where the subagent is replaced with a rock). The impact penalty encourages the agent to put restrictions on their own future possible actions. Doing this through a subagent is one way, but there are many others (see Odysseus and the sirens, or section 6.2 of the post above in this comment).
Thanks for the link (and the excellent write-up of the problem)!
Regarding the setting, how would the agent gain the ability to create a sub-agent, roll a rock, or limit it’s own abilities initially? Throughout AUP, you normally start with a high penalty for acquiring power, and then you scale it down to reach reasonable, non-catastrophic plans, but your post begins with having higher power.
I don’t think AUP prevents abuse of power you have currently have (?), but prevents gaining that power in the first place.
The AUP is supposed to prevent the agent accumulating power. The AI initially has huge potential power (because its potential power is all the power it could ever accumulate, given its best strategy to accumulate power) and the penalty is supposed to prevent it turning that potential into actual power—as measured by AUP.
So the AI always has the power to build a subagent; that post just shows that it can do this without triggering the AUP-power penalty.
Yes. Subagent problems are not cleanly separated from other problems (see section 3.4 of https://www.lesswrong.com/posts/mdQEraEZQLg7jtozn/subagents-and-impact-measures-full-and-fully-illustrated , where the subagent is replaced with a rock). The impact penalty encourages the agent to put restrictions on their own future possible actions. Doing this through a subagent is one way, but there are many others (see Odysseus and the sirens, or section 6.2 of the post above in this comment).
Thanks for the link (and the excellent write-up of the problem)!
Regarding the setting, how would the agent gain the ability to create a sub-agent, roll a rock, or limit it’s own abilities initially? Throughout AUP, you normally start with a high penalty for acquiring power, and then you scale it down to reach reasonable, non-catastrophic plans, but your post begins with having higher power.
I don’t think AUP prevents abuse of power you have currently have (?), but prevents gaining that power in the first place.
The AUP is supposed to prevent the agent accumulating power. The AI initially has huge potential power (because its potential power is all the power it could ever accumulate, given its best strategy to accumulate power) and the penalty is supposed to prevent it turning that potential into actual power—as measured by AUP.
So the AI always has the power to build a subagent; that post just shows that it can do this without triggering the AUP-power penalty.