Does Armstrong’s/your proposal reduce to “Give the AI a utility function that cares about nothing beyond the next hour, restrict its output to N bits, and blow up the rest of the computer afterward”? If not, can you give me an example of a scenario where the above fails but the more complex proposal succeeds? So far as I can tell, none of the purported “safetiness” in the example you just gave has anything to do with an impact measure.
I give you an hour and tell you to maximize the probability of [something we intend to use as a reward signal]. In paranoid scenarios, you break out of the box and kill all humans to get your reward signal. But now we have penalized that sort of failure of cooperation. This is just a formalization of “stay in the box,” and I’ve only engaged in this protracted debate to argue that ‘butterfly effects’ from e.g. electron shuffling, the usual objection to such a proposal, don’t seem to be an issue.
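To make that concrete, here is a minimal sketch of the objective being debated, in placeholder notation of my own rather than anything taken from Armstrong's write-up: the AI maximizes the expected reward signal over the hour, minus a penalty for any divergence it causes in the world outside the box.

$$U(a) \;=\; \mathbb{E}\!\left[\,R_{[t_0,\,t_0+1\text{h}]} \mid a\,\right] \;-\; \lambda\,\mathbb{E}\!\left[\,d\big(W_{\text{out}}(a),\,W_{\text{out}}(\varnothing)\big)\,\right]$$

Here $R$ is the reward signal accumulated during the hour, $W_{\text{out}}(a)$ is the state of the world outside the box given the AI's action $a$, $W_{\text{out}}(\varnothing)$ is the counterfactual in which the AI does nothing, $d$ is some divergence, and $\lambda$ weights the "stay in the box" penalty. The "butterfly effect" objection is that $d$ comes out nonzero for virtually any action (electron shuffling perturbs the outside world a little), which is exactly the point contested above.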
In reality, I agree that building a ‘friendly’ AI is mostly equivalent to building an AI that reliably pursues whatever goal it is given. So proposals for U which merely might be non-disastrous under ideal social circumstances don’t seem like they address the real concerns about AI risk.
Stuart’s goal is to define a notion of “minimized impact” which does allow an AI to perform tasks. I am more skeptical that this is possible.
As the current ultimate authority on AI safety, I am curious whether you would consider the safety profile of this oracle, as interpreted here, to be along the lines I describe there. That is, if it could actually be constructed as defined, it would be more or less safe with respect to its own operation, except for those pesky N bits and what external entities can do with them.
Unless I have missed something, the problems with attempting to implement such an AI as a practical strategy are:
It is an infinity plus one sword—you can’t just leave those lying around.
The research required to create the oracle is almost all of what it takes to create an FAI. It requires all of the research that goes into FAI except for the CEV research—and if the oracle is able to answer questions that are simple math proofs, then even a significant part of what constitutes a CEV implementation would be required.
Does Armstrong’s/your proposal reduce to “Give the AI a utility function that cares about nothing beyond the next hour, restrict its output to N bits, and blow up the rest of the computer afterward”?
The other important part that was mentioned (or, at least, implied) was that it is not allowed to influence (cares negatively about influencing) the world outside of a spatial boundary within that hour, except via those N bits, via some threshold of incidental EM radiation, and via the energy consumption it is allocated. The most obvious things this would seem to prevent it from doing are hacking a few supercomputers and a botnet to get some extra processing done in the hour or, for that matter, getting any input at all from external information sources. It is also unable to recursively self-improve (much), so that leaves us in the dark about how it managed to become an oracle in the first place.
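To pull that description together (again in my own notation, not anything from the original proposal): with $B$ the spatial boundary, the penalty term would count only influence on the world outside $B$ during the hour that is not routed through the permitted channels.

$$U(a) \;=\; \mathbb{E}\big[\,R \mid a\,\big] \;-\; \lambda\,\mathbb{E}\Big[\,d\big(S_{\bar B}(a),\,S_{\bar B}(\varnothing)\big)\,\Big]$$

where $S_{\bar B}$ is the state outside $B$ over the hour, with influence via the $N$ output bits, EM radiation below the incidental threshold, and the allocated energy consumption masked out of $d$. Commandeering outside supercomputers, or even requesting data from external sources, would register as unmasked influence across $B$, which is why it blocks the "extra processing" and "external input" moves mentioned above.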