but their position relies on physical facts about the world, along with a narrow definition of the correction-of-value event. To get around that, we’d need to define the operator properly.
I feel like this is a very generic problem with safety.
If you can’t specify to an algorithm what part of the real world you’re talking about (at least to a reasonably good approximation), then it is very hard to make progress.
Perhaps the simplest way to specify “who the operator is” is to define the abstract notion of creation-of-subagents, and tell the AI that the “operator” is the agent that created it.
The abstract notion of creation-of-subagents seems quite robust once you have a system that can represent and reason about that notion; it certainly seems like if you created an agent which, from the get-go, understood and cared about “the agent that created it”, you would rule out entire classes of paperclipper-like AIs.
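For concreteness, here is a minimal sketch of how “the operator is the agent that created it” could be cashed out, assuming we already had a system that represents creation-of-subagents as an explicit event (all class and method names below are hypothetical, not anyone’s actual proposal): the operator is resolved by creation provenance, not by a physical description of the world.

```python
# Hypothetical sketch: identify "the operator" by creation provenance.
# Agent, spawn_subagent, operator, root_operator are illustrative names.

from __future__ import annotations
from typing import Optional


class Agent:
    def __init__(self, name: str, creator: Optional["Agent"] = None):
        self.name = name
        # Record the creating agent at construction time; this is the
        # "agent that created it" that the AI is told to treat as its operator.
        self.creator = creator

    def spawn_subagent(self, name: str) -> "Agent":
        # Creation-of-subagents as an explicit, abstract event: the new
        # agent's provenance is fixed the moment it is built.
        return Agent(name, creator=self)

    def operator(self) -> "Agent":
        # Literal reading of the proposal: the operator is simply the creator.
        return self if self.creator is None else self.creator

    def root_operator(self) -> "Agent":
        # Alternative reading: follow the creation chain back to the original
        # creator, so subagents inherit the same operator as their parent.
        agent = self
        while agent.creator is not None:
            agent = agent.creator
        return agent


human = Agent("human")
ai = human.spawn_subagent("ai")
helper = ai.spawn_subagent("helper")

print(ai.operator().name)           # human
print(helper.operator().name)       # ai
print(helper.root_operator().name)  # human
```

One design question the sketch makes visible is whether a subagent’s operator should be its immediate creator or the original creator at the root of the chain; any real formalization of creation-of-subagents would have to settle that.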
“a physical event” seems much easier to define than “an operator” or “a subagent”.