This is a very interesting definition of impact, and seems very useful. I’m curious about a few things:
Since AUP does not operate based on culpability, creating a high-impact successor agent is basically just as impactful as being that successor agent.
How would AUP handle a plan that involves:
a) creating agents that work similarly to the way it does, or that are copies? Copies without an off switch?
b) making it take humans longer to shut it and all its copies down? (i.e., instead of preventing humans from shutting it down, might it try to slow them down?)
c) setting up a chain of events that will turn it back on?
d) shutting itself down, as opposed to taking no action?
(Supposing the agent can somehow model successors)
a) The activation action would be penalized by the new agent’s expected impact: the new agent’s impact budget might essentially be deducted all at once, plus whatever frictional costs are imposed by setup (see the sketch after this list). Also, making successors is instrumentally convergent, and instrumentally convergent actions are exactly what AUP penalizes, so this seems pretty unlikely.
Do note that it wouldn’t vicariously incur the anti-"survival incentive" incentive penalty. However, this still doesn’t seem to let it sneak in extra impact: just making a normal maximizer, for instance, is heavily penalized for resource, instrumental, and approval reasons.
b) Heavily penalized.
c) Unclear; this is one of the embedded agency questions.
d) Heavily penalized, as discussed in the anti-"survival incentive" incentive examples.
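To make (a) and (d) a bit more concrete, here is a rough sketch of the penalty I have in mind, in my own simplified notation rather than the exact AUP definition. Write $Q_u(s,a)$ for how much of auxiliary utility $u$ remains attainable after taking action $a$ in state $s$, and $\varnothing$ for the no-op:

$$\text{Penalty}(s,a) \;=\; \sum_{u \in \mathcal{U}} \bigl|\, Q_u(s,a) - Q_u(s,\varnothing) \,\bigr|$$

If $a$ is "activate a capable successor," then $Q_u(s,a)$ already bakes in everything the successor is expected to do, so the parent is charged for roughly the successor’s whole expected impact at the moment of activation, plus setup costs. If $a$ is "shut yourself down," every $Q_u(s,a)$ drops sharply relative to the no-op baseline, which is the same mechanism that makes (d) heavily penalized.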