If so, that doesn’t seem to be reflected in their papers: as far as I could find, none of e.g. Chalmers 2010, Yampolskiy 2012, Armstrong, Sandberg & Bostrom 2012 or Babcock, Kramar & Yampolskiy 2016 mentions it; they only discuss the feasibility of containment. This leaves the impression that successful containment would be sufficient for a safe outcome. E.g. the conclusion section of Armstrong et al., despite being generally pessimistic and summarizing lots of problems they identified, still seems to suggest that if only the technical problems with Oracle AI could be overcome, then we might be safe:
Analysing the different putative solutions to the OAI-control problem has been a generally discouraging exercise. The physical methods of control, which should be implemented in all cases, are not enough to ensure safe OAI. The other methods of control have been variously insufficient, problematic, or even dangerous.
But these methods are still in their infancy. Control methods used in the real world have been the subject of extensive theoretical analysis or long practical refinement. The lack of intensive study in AI safety leaves methods in this field very underdeveloped. But this is an opportunity: much progress can be expected at relatively little effort. For instance, there is no reason that a few good ideas would not be enough to put the concepts of space and time restrictions on a sufficiently firm basis for rigorous coding.
But the conclusion is not simply that more study is needed. This paper has made some progress in analysing the contours of the problem, and identifying those areas most amenable to useful study, what is important and what is dispensable, and some of the dangers and pitfalls to avoid. The danger of naively relying on confining the OAI to a virtual sub-world should be clear, while sensible boxing methods should be universally applicable. Motivational control appears potentially promising, but it requires more understanding of AI motivation systems before it can be used.
Even the negative results are of use, insofar as they inoculate us against false confidence: the problem of AI control is genuinely hard, and it is important to recognise this. A list of approaches to avoid is valuable as it can help narrow the search.
On the other hand, there are reasons to believe the oracle AI approach is safer than the general AI approach. The accuracy and containment problems are strictly simpler than the general AI safety problem, and many more tools are available to us: physical and epistemic capability control mainly rely on having the AI boxed, while many motivational control methods are enhanced by this fact. Hence there are grounds to direct high-intelligence AI research to explore the oracle AI model.
The creation of super-human artificial intelligence may turn out to be potentially survivable.
Also, in just about every informal discussion about AI safety that I recall seeing, when someone unfamiliar with existing work in the field suggests something like AI boxing, the standard response has always been “you can’t box an AI that’s smarter than you” (sometimes citing Eliezer’s AI box experiments) - which then frequently leads to digressions about whether intelligence is magic, how trustworthy the evidence from the AI box experiments is, etc.
To be clear, I am making the claim that, of the people who have made useful advances on Oracle AI safety research (Armstrong counts here; I don’t think Yampolskiy does), all of them believe that the goal of having a safe Oracle AI is to achieve a decisive strategic advantage quickly and get to an aligned future. I recognize that this is a hard claim to evaluate (e.g. because this isn’t a statement one could put in a Serious Academic Journal Article in the 2010s, it would have to be discussed on their blog or in private correspondence), but if anyone has a clear counterexample, I’d be interested in seeing it.
My only evidence for this being a neglected consideration was what I wrote above: that the only place where I recall having seen this discussed in any detail is in my own papers. (I do believe that Eliezer has briefly mentioned something similar too, but even he has mostly just used the “well you can’t contain a superintelligence” line in response to Oracle AI arguments in general.)
You’re certainly in a position to know the actual thoughts of researchers working on this better than I do, and the thing about confinement being insufficient on its own is rather obvious if you think about it at all. So if you say that “everyone worth mentioning already thinks this”, then that sounds plausible to me and I don’t see a point in trying to go look for counterexamples. But in that case I feel even more frustrated that the “obvious” thing hasn’t really filtered into public discussion, and that e.g. popular takes on the subject still seem to treat the “can’t box a superintelligence” thing as the main argument against OAI, when you could instead give arguments that were much more compelling.
That’s a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don’t want “we don’t see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage” to filter into public discourse: it pattern-matches too well to “trust us, you need to let us run the universe”.