I think it’s generally a good property, at least as a reasonable person would execute it. The problem, however, is bad ex ante clinginess plans, in which the agent has an incentive to pre-emptively constrain our reactions as hard as it can (and this could be really hard).
The problem is lessened if the agent is agnostic to the specific details of the world, but like I said, it seems like we really need IV (or an improved successor to it) to cleanly cut off these perverse incentives.
I’m not sure I understand the connection to scapegoating for the agents we’re talking about; scapegoating is only permitted if credit assignment is explicitly part of the approach and there are privileged “agents” in the provided ontology.