An oracle doesn’t have to have hidden goals. But when you ask it what actions would be needed to accomplish a long-term task, it chooses the actions that would lead to that task being completed. If you phrase the question carefully enough, maybe you can get away with it. But maybe it calculates that the best output to achieve result X is an output that tricks you into rewriting it into an agent, etc.
In general, asking an oracle AI any question whose answer depends on that answer’s future effects in the real world would be very dangerous.
On the other hand, I don’t think answering important questions about solving AI alignment is a task whose output necessarily needs to depend on its future effects on the real world. So, in my view, an oracle could be used to solve AI alignment without killing everyone, as long as there are appropriate precautions against asking it careless questions.