AGI and Friendly AI in the dominant AI textbook

AI: A Modern Approach is by far the dominant textbook in the field. It is used in 1200 universities, and is the 25th most-cited publication in computer science. If you’re going to learn AI, this is how you learn it.
Luckily, the concepts of AGI and Friendly AI get pretty good treatment in the 3rd edition, released in 2009.
The Singularity is mentioned in the first chapter on page 12. Both AGI and Friendly AI are also mentioned in the first chapter, on page 27:
[Many leaders in the field] believe AI should return to its roots of striving for, in Simon’s words, “machines that think, that learn and that create.” They call the effort human-level AI or HLAI: their first symposium was in 2004 (Minsky et al. 2004)...
A related idea is the subfield of Artificial General Intelligence or AGI (Goertzel and Pennachin, 2007), which held its first conference and organized the Journal of Artificial General Intelligence in 2008. AGI looks for a universal algorithm for learning and acting in any environment, and has its roots in the work of Ray Solomonoff (1965), one of the attendees of the original 1956 Dartmouth conference. Guaranteeing that what we create is really Friendly AI is also a concern (Yudkowsky, 2008; Omohundro, 2008), one we will return to in Chapter 26.
Chapter 26 is about the philosophy of AI, and section 26.3 is “The Ethics and Risks of Developing Artificial Intelligence.” The first five risks it covers are:
(1) People might lose their jobs to automation.
(2) People might have too much (or too little) leisure time.
(3) People might lose their sense of being unique.
(4) AI systems might be used toward undesirable ends.
(5) The use of AI systems might result in a loss of accountability.
Each of those risks gets only a paragraph or two. The final risk takes up 3.5 pages: (6) the success of AI might mean the end of the human race. Here’s a snippet:
The question is whether an AI system poses a bigger risk than traditional software. We will look at three sources of risk. First, the AI system’s state estimation may be incorrect, causing it to do the wrong thing. For example… a missile defense system might erroneously detect an attack and launch a counterattack, leading to the death of billions...
Second, specifying the right utility function for an AI system to maximize is not so easy. For example, we might propose a utility function designed to minimize human suffering, expressed as an additive reward function over time as in Chapter 17. Given the way humans are, however, we’ll always find a way to suffer even in paradise; so the optimal decision for the AI system is to terminate the human race as soon as possible—no humans, no suffering...
Third, the AI system’s learning function may cause it to evolve into a system with unintended behavior. This scenario is the most serious, and is unique to AI systems, so we will cover it in more depth. I.J. Good wrote (1965),
Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. The “intelligence explosion” has also been called the technological singularity by… Vernor Vinge...
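To spell out the second risk in the quoted passage: the “additive reward function over time” from Chapter 17 is the standard discounted sum of per-step rewards. As a minimal sketch of the argument (the notation below is the usual MDP convention, not the textbook’s own symbols), the utility of a state sequence is

U([s_0, s_1, s_2, \ldots]) = \sum_{t=0}^{\infty} \gamma^t R(s_t), \qquad 0 < \gamma \le 1.

If the designer sets R(s_t) = -\mathrm{suffering}(s_t), then every time step containing living humans contributes a negative term, while an absorbing no-humans state contributes zero forever after. Maximizing U therefore favors reaching the no-humans state as early as possible, which is exactly the “no humans, no suffering” conclusion the passage warns about.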
Then they mention Moravec, Kurzweil, and transhumanism, before returning to a more concerned tone about AI. They cover Asimov’s three laws of robotics, and then:
Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes. We can’t just give a program a static utility function, because circumstances, and our desired responses to circumstances, change over time. For example, if technology had allowed us to design a super-powerful AI agent in 1800 and endow it with the prevailing morals of the time, it would be fighting today to reestablish slavery and abolish women’s right to vote. On the other hand, if we build an AI agent today and tell it how to evolve its utility function, how can we assure that it won’t read that “Humans think it is moral to kill annoying insects, in part because insect brains are so primitive. But human brains are primitive compared to my powers, so it must be moral for me to kill humans.”
Omohundro (2008) hypothesizes that even an innocuous chess program could pose a risk to society. Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal. The moral is that even if you only want your program to play chess or prove theorems, if you give it the capability to learn and alter itself, you need safeguards.
It’s good this work is getting such mainstream coverage!