Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Link post

The Stanford Encyclopedia of Philosophy, pretty much the standard reference for surveys of philosophical topics, has a brand-new (“First published Thu Apr 30, 2020”) article, “Ethics of Artificial Intelligence and Robotics”. Section 2.10 is called “Singularity”. I think it has a reasonably fair and competent summary of the superintelligence discussion:

----

2.10 Singularity

2.10.1 Singularity and Superintelligence

In some quarters, the aim of current AI is thought to be an “artificial general intelligence” (AGI), contrasted to a technical or “narrow” AI. AGI is usually distinguished from traditional notions of AI as a general purpose system, and from Searle’s notion of “strong AI”:

computers given the right programs can be literally said to understand and have other cognitive states. (Searle 1980: 417)

The idea of singularity is that if the trajectory of artificial intelligence reaches up to systems that have a human level of intelligence, then these systems would themselves have the ability to develop AI systems that surpass the human level of intelligence, i.e., they are “superintelligent” (see below). Such superintelligent AI systems would quickly self-improve or develop even more intelligent systems. This sharp turn of events after reaching superintelligent AI is the “singularity” from which the development of AI is out of human control and hard to predict (Kurzweil 2005: 487).

The fear that “the robots we created will take over the world” had captured human imagination even before there were computers (e.g., Butler 1863) and is the central theme in Čapek’s famous play that introduced the word “robot” (Čapek 1920). This fear was first formulated as a possible trajectory of existing AI into an “intelligence explosion” by Irvin Good:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. (Good 1965: 33)

The optimistic argument from acceleration to singularity is spelled out by Kurzweil (1999, 2005, 2012) who essentially points out that computing power has been increasing exponentially, i.e., doubling ca. every 2 years since 1970 in accordance with “Moore’s Law” on the number of transistors, and will continue to do so for some time in the future. He predicted (Kurzweil 1999) that by 2010 supercomputers would reach human computation capacity, by 2030 “mind uploading” would be possible, and by 2045 the “singularity” would occur. Kurzweil talks about an increase in computing power that can be purchased at a given cost, but of course in recent years the funds available to AI companies have also increased enormously: Amodei and Hernandez (2018 [OIR]) thus estimate that in the years 2012–2018 the actual computing power available to train a particular AI system doubled every 3.4 months, resulting in a 300,000x increase, not the 7x increase that doubling every two years would have created.
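The two growth rates quoted above can be compared with a few lines of arithmetic. The sketch below is only a sanity check, not the method used by Amodei and Hernandez: the function names are hypothetical, and it assumes clean exponential growth with a fixed doubling time. Under that assumption a 300,000x increase at one doubling every 3.4 months takes about 62 months, and two-year doubling over the same window gives roughly 6x, the same order as the quoted 7x (which would correspond to a slightly longer window of about 5.6 years).

```python
import math


def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth after months_elapsed, given a fixed doubling time."""
    return 2.0 ** (months_elapsed / doubling_months)


def months_for_factor(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor`, given a fixed doubling time."""
    return math.log2(factor) * doubling_months


# Window implied by a 300,000x increase at one doubling every 3.4 months.
# This is an idealised assumption, not a figure taken from the source.
months_needed = months_for_factor(300_000, 3.4)

# Growth over that same window if doubling took two years (24 months) instead.
two_year_factor = growth_factor(months_needed, 24.0)

if __name__ == "__main__":
    print(f"window: {months_needed:.1f} months")
    print(f"two-year doubling over that window: {two_year_factor:.1f}x")
```

The point of the comparison is the gap between the two results: the same stretch of time yields a single-digit multiple under Moore's-Law-style doubling and a six-digit multiple under the faster rate.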

A common version of this argument (Chalmers 2010) talks about an increase in “intelligence” of the AI system (rather than raw computing power), but the crucial point of “singularity” remains the one where further development of AI is taken over by AI systems and accelerates beyond human level. Bostrom (2014) explains in some detail what would happen at that point and what the risks for humanity are. The discussion is summarised in Eden et al. (2012); Armstrong (2014); Shanahan (2015). There are possible paths to superintelligence other than computing power increase, e.g., the complete emulation of the human brain on a computer (Kurzweil 2012; Sandberg 2013), biological paths, or networks and organisations (Bostrom 2014: 22–51).

Despite obvious weaknesses in the identification of “intelligence” with processing power, Kurzweil seems right that humans tend to underestimate the power of exponential growth. Mini-test: If you walked in steps in such a way that each step is double the previous, starting with a step of one metre, how far would you get with 30 steps? (answer: well past Earth’s only permanent natural satellite; the 30 steps sum to roughly 1.07 million km.) Indeed, most progress in AI is readily attributable to the availability of processors that are faster by orders of magnitude, larger storage, and higher investment (Müller 2018). The actual acceleration and its speeds are discussed in Müller and Bostrom (2016) and Bostrom, Dafoe, and Flynn (forthcoming); Sandberg (2019) argues that progress will continue for some time.
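The mini-test is easy to check directly. The sketch below sums 30 steps of 1, 2, 4, ... metres, which comes to 2^30 - 1 metres, or about 1.07 million km. The Earth-Moon figure of roughly 384,400 km is a mean distance supplied here as an assumption (it is not in the text), and the function name is hypothetical.

```python
# Approximate mean Earth-Moon distance in km (assumption, not from the source).
EARTH_MOON_KM = 384_400


def total_distance_m(steps: int, first_step_m: int = 1) -> int:
    """Total distance walked when each step is double the previous one."""
    return sum(first_step_m * 2**k for k in range(steps))


total_m = total_distance_m(30)          # equals 2**30 - 1 metres
total_km = total_m / 1000
moon_multiples = total_km / EARTH_MOON_KM

if __name__ == "__main__":
    print(f"{total_m} m = {total_km:,.0f} km")
    print(f"about {moon_multiples:.1f} times the Earth-Moon distance")
```

Under that assumed distance, 30 doubling steps are more than enough to reach the Moon, and the last step alone (2^29 metres) covers just over half of the total.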

The participants in this debate are united by being technophiles in the sense that they expect technology to develop rapidly and bring broadly welcome changes—but beyond that, they divide into those who focus on benefits (e.g., Kurzweil) and those who focus on risks (e.g., Bostrom). Both camps sympathise with “transhuman” views of survival for humankind in a different physical form, e.g., uploaded on a computer (Moravec 1990, 1998; Bostrom 2003a, 2003c). They also consider the prospects of “human enhancement” in various respects, including intelligence—often called “IA” (intelligence augmentation). It may be that future AI will be used for human enhancement, or will contribute further to the dissolution of the neatly defined human single person. Robin Hanson provides detailed speculation about what will happen economically in case human “brain emulation” enables truly intelligent robots or “ems” (Hanson 2016).

The argument from superintelligence to risk requires the assumption that superintelligence does not imply benevolence—contrary to Kantian traditions in ethics that have argued higher levels of rationality or intelligence would go along with a better understanding of what is moral and better ability to act morally (Gewirth 1978; Chalmers 2010: 36f). Arguments for risk from superintelligence say that rationality and morality are entirely independent dimensions—this is sometimes explicitly argued for as an “orthogonality thesis” (Bostrom 2012; Armstrong 2013; Bostrom 2014: 105–109).

Criticism of the singularity narrative has been raised from various angles. Kurzweil and Bostrom seem to assume that intelligence is a one-dimensional property and that the set of intelligent agents is totally-ordered in the mathematical sense—but neither discusses intelligence at any length in their books. Generally, it is fair to say that despite some efforts, the assumptions made in the powerful narrative of superintelligence and singularity have not been investigated in detail. One question is whether such a singularity will ever occur—it may be conceptually impossible, practically impossible or may just not happen because of contingent events, including people actively preventing it. Philosophically, the interesting question is whether singularity is just a “myth” (Floridi 2016; Ganascia 2017), and not on the trajectory of actual AI research. This is something that practitioners often assume (e.g., Brooks 2017 [OIR]). They may do so because they fear the public relations backlash, because they overestimate the practical problems, or because they have good reasons to think that superintelligence is an unlikely outcome of current AI research (Müller forthcoming-a). This discussion raises the question whether the concern about “singularity” is just a narrative about fictional AI based on human fears. But even if one does find negative reasons compelling and the singularity not likely to occur, there is still a significant possibility that one may turn out to be wrong. Philosophy is not on the “secure path of a science” (Kant 1791: B15), and maybe AI and robotics aren’t either (Müller 2020). So, it appears that discussing the very high-impact risk of singularity has justification even if one thinks the probability of such singularity ever occurring is very low.
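On the ordering assumption raised at the start of this paragraph: if intelligence is modelled as several separate capability scores rather than one number, agents are in general only partially ordered, so "more intelligent than" is undefined for some pairs. The sketch below illustrates this with Pareto dominance. The agent names and scores are made up purely for illustration, and nothing here is a proposed measure of intelligence.

```python
def dominates(a: tuple, b: tuple) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def comparable(a: tuple, b: tuple) -> bool:
    """True if the two score vectors can be ranked against each other."""
    return a == b or dominates(a, b) or dominates(b, a)


# Hypothetical agents scored on three made-up dimensions.
agents = {
    "chess_engine": (9, 1, 1),
    "translator": (1, 9, 2),
    "generalist": (5, 5, 5),
    "strong_generalist": (9, 9, 9),
}

if __name__ == "__main__":
    names = list(agents)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            status = "comparable" if comparable(agents[first], agents[second]) else "incomparable"
            print(f"{first} vs {second}: {status}")
```

With a single score every pair would be comparable. With three scores, only the agent that is best on every dimension can be ranked against all the others, which is why treating intelligence as totally ordered is a substantive assumption rather than a harmless simplification.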

2.10.2 Existential Risk from Superintelligence

Thinking about superintelligence in the long term raises the question whether superintelligence may lead to the extinction of the human species, which is called an “existential risk” (or XRisk): The superintelligent systems may well have preferences that conflict with the existence of humans on Earth, and may thus decide to end that existence—and given their superior intelligence, they will have the power to do so (or they may happen to end it because they do not really care).

Thinking in the long term is the crucial feature of this literature. Whether the singularity (or another catastrophic event) occurs in 30 or 300 or 3000 years does not really matter (Baum et al. 2019). Perhaps there is even an astronomical pattern such that an intelligent species is bound to discover AI at some point, and thus bring about its own demise. Such a “great filter” would contribute to the explanation of the “Fermi paradox” why there is no sign of life in the known universe despite the high probability of it emerging. It would be bad news if we found out that the “great filter” is ahead of us, rather than an obstacle that Earth has already passed. These issues are sometimes taken more narrowly to be about human extinction (Bostrom 2013), or more broadly as concerning any large risk for the species (Rees 2018)—of which AI is only one (Häggström 2016; Ord 2020). Bostrom also uses the category of “global catastrophic risk” for risks that are sufficiently high up the two dimensions of “scope” and “severity” (Bostrom and Ćirković 2011; Bostrom 2013).

These discussions of risk are usually not connected to the general problem of ethics under risk (e.g., Hansson 2013, 2018). The long-term view has its own methodological challenges but has produced a wide discussion: Tegmark (2017) focuses on AI and human life “3.0” after singularity, while Russell, Dewey, and Tegmark (2015) and Bostrom, Dafoe, and Flynn (forthcoming) survey longer-term policy issues in ethical AI. Several collections of papers have investigated the risks of artificial general intelligence (AGI) and the factors that might make this development more or less risk-laden (Müller 2016b; Callaghan et al. 2017; Yampolskiy 2018), including the development of non-agent AI (Drexler 2019).

2.10.3 Controlling Superintelligence?

In a narrow sense, the “control problem” is how we humans can remain in control of an AI system once it is superintelligent (Bostrom 2014: 127ff). In a wider sense, it is the problem of how we can make sure an AI system will turn out to be positive according to human perception (Russell 2019); this is sometimes called “value alignment”. How easy or hard it is to control a superintelligence depends significantly on the speed of “take-off” to a superintelligent system. This has led to particular attention to systems with self-improvement, such as AlphaZero (Silver et al. 2018).

One aspect of this problem is that we might decide a certain feature is desirable, but then find out that it has unforeseen consequences that are so negative that we would not desire that feature after all. This is the ancient problem of King Midas who wished that all he touched would turn into gold. This problem has been discussed on the occasion of various examples, such as the “paperclip maximiser” (Bostrom 2003b), or the program to optimise chess performance (Omohundro 2014).

Discussions about superintelligence include speculation about omniscient beings, the radical changes on a “latter day”, and the promise of immortality through transcendence of our current bodily form—so sometimes they have clear religious undertones (Capurro 1993; Geraci 2008, 2010; O’Connell 2017: 160ff). These issues also pose a well-known problem of epistemology: Can we know the ways of the omniscient (Danaher 2015)? The usual opponents have already shown up: A characteristic response of an atheist is

People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world. (Domingos 2015)

The new nihilists explain that a “techno-hypnosis” through information technologies has now become our main method of distraction from the loss of meaning (Gertz 2018). Both opponents would thus say we need an ethics for the “small” problems that occur with actual AI and robotics (sections 2.1 through 2.9 above), and that there is less need for the “big ethics” of existential risk from AI (section 2.10).