I give you an hour and tell you to maximize the probability of [something we intend to use as a reward signal]. In paranoid scenarios, you break out of the box and kill all humans to get your reward signal. But now we have penalized that sort of failure of cooperation. This is just a formalization of “stay in the box,” and I’ve only engaged in this protracted debate to argue that ‘butterfly effects’ from e.g. electron shuffling, the usual objection to such a proposal, don’t seem to be an issue.
In reality, I agree that building a ‘friendly’ AI is mostly equivalent to building an AI that can follow arbitrary goals. So proposals for U which merely might be non-disastrous under ideal social circumstances don’t seem to address the real concerns about AI risk.
Stuart’s goal is to define a notion of “minimized impact” which does allow an AI to perform tasks. I am more skeptical that this is possible.