Focus on technical solutions to x-risk rather than political or societal ones
One consideration that points against this is that focusing on technical solutions tends to make you think only about technical problems; if you don’t also look at the societal problems, you might not realize that your proposed technical solution is unworkable because of a societal one.
One good example is Oracle AI. People have debated whether we could use a pure question-answering or “tool” AI as a way to create safe agent AI. There has been a bunch of discussion about the technical challenge of creating it, with objections typically along the lines of “you can’t box in a superintelligent AI that wants to escape”, and responses then seeking ways to make the AI want to stay in the box.
But this neglects the fact that even if you manage to build an AI that wants to stay in the box, this is useless if there are others who have reasons to let their AI out of the box. (Section 5.2 of my paper “Disjunctive Scenarios of Catastrophic AI Risk” goes into detail about the various reasons that would cause people to let their AI out.) Solving the technical problem of keeping the AI contained does nothing for the societal problem of making people want to keep their AIs contained.
Similarly, Seth Baum has pointed out that the challenge of creating beneficial AI is in part a social challenge, because it involves motivating AI developers to choose beneficial AI designs. This is the general form of the specific example I gave above: it’s not enough to create an aligned technical design; one also needs to get people to implement that design.
Of course, you can try to just be the first one to build an aligned superintelligence that takes over the world… but that’s super-risky for obvious reasons: it involves a race to be the first to build the superintelligence, which means you may not have the time to make it safely aligned. To avoid that, you’ll want to try to prevent arms races… which is again a societal problem.
In order to have a good understanding of what would work for solving the AI problem, you need to understand the whole problem, and the societal dimension is a big part of it. I’m not saying that you couldn’t still focus primarily on the technical aspects (after all, a single person can only do so much and we all need to specialize), but you should keep in mind what kinds of technical solutions look feasible given the societal landscape, and properly understanding that landscape requires spending some effort on also thinking about the societal problems and their possible solutions.
I’m pretty sure that, without exception, anyone who’s made a useful contribution on Oracle AI recognizes that “let several organizations have an Oracle AI for a significant amount of time” is a world-ending failure, and that their work is instead progress on questions like “if you can have the only Oracle AI for six months, can you save the world rather than end it?” Correct me if I’m wrong.
If so, that doesn’t seem to be reflected in their papers: none of e.g. Chalmers 2010, Yampolskiy 2012, Armstrong, Sandberg & Bostrom 2012, or Babcock, Kramar & Yampolskiy 2016 mentions this, as far as I could find; they only discuss the feasibility of containment. This leaves the impression that successful containment would be sufficient for a safe outcome. E.g. the conclusion section of Armstrong et al., despite being generally pessimistic and summarizing the many problems they identified, still seems to suggest that if only the technical problems with Oracle AI could be overcome, then we might be safe:
Analysing the different putative solutions to the OAI-control problem has been a generally discouraging exercise. The physical methods of control, which should be implemented in all cases, are not enough to ensure safe OAI. The other methods of control have been variously insufficient, problematic, or even dangerous.
But these methods are still in their infancy. Control methods used in the real world have been the subject of extensive theoretical analysis or long practical refinement. The lack of intensive study in AI safety leaves methods in this field very underdeveloped. But this is an opportunity: much progress can be expected at relatively little effort. For instance, there is no reason that a few good ideas would not be enough to put the concepts of space and time restrictions on a sufficiently firm basis for rigorous coding.
But the conclusion is not simply that more study is needed. This paper has made some progress in analysing the contours of the problem, and identifying those areas most amenable to useful study, what is important and what is dispensable, and some of the dangers and pitfalls to avoid. The danger of naively relying on confining the OAI to a virtual sub-world should be clear, while sensible boxing methods should be universally applicable. Motivational control appears potentially promising, but it requires more understanding of AI motivation systems before it can be used.
Even the negative results are of use, insofar as they inoculate us against false confidence: the problem of AI control is genuinely hard, and it is important to recognise this. A list of approaches to avoid is valuable as it can help narrow the search.
On the other hand, there are reasons to believe the oracle AI approach is safer than the general AI approach. The accuracy and containment problems are strictly simpler than the general AI safety problem, and many more tools are available to us: physical and epistemic capability control mainly rely on having the AI boxed, while many motivational control methods are enhanced by this fact. Hence there are grounds to direct high-intelligence AI research to explore the oracle AI model.
The creation of super-human artificial intelligence may turn out to be potentially survivable.
Also, in just about every informal discussion about AI safety that I recall seeing, when someone unfamiliar with existing work in the field suggests something like AI boxing, the standard response has always been “you can’t box an AI that’s smarter than you” (sometimes citing Eliezer’s AI box experiments), which then frequently leads to digressions about whether intelligence is magic, how trustworthy the evidence from the AI box experiments is, etc.
To be clear, I am making the claim that, of the people who have made useful advances on Oracle AI safety research (Armstrong counts here; I don’t think Yampolskiy does), all of them believe that the goal of having a safe Oracle AI is to achieve a decisive strategic advantage quickly and get to an aligned future. I recognize that this is a hard claim to evaluate (e.g. because this isn’t a statement one could put in a Serious Academic Journal Article in the 2010s; it would have to be discussed on their blog or in private correspondence), but if anyone has a clear counterexample, I’d be interested in seeing it.
My only evidence for this being a neglected consideration was what I wrote above: that the only place where I recall having seen this discussed in any detail is in my own papers. (I do believe that Eliezer has briefly mentioned something similar too, but even he has mostly just used the “well you can’t contain a superintelligence” line in response to Oracle AI arguments in general.)
You’re certainly in a position to know the actual thoughts of researchers working on this better than I do, and the thing about confinement being insufficient on its own is rather obvious if you think about it at all. So if you say that “everyone worth mentioning already thinks this”, then that sounds plausible to me and I don’t see a point in trying to go look for counterexamples. But in that case I feel even more frustrated that the “obvious” thing hasn’t really filtered into public discussion, and that e.g. popular takes on the subject still seem to treat the “can’t box a superintelligence” thing as the main argument against OAI, when you could instead give arguments that were much more compelling.
That’s a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don’t want “we don’t see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage” to filter into public discourse: it pattern-matches too well to “trust us, you need to let us run the universe”.