Superintelligence 14: Motivation selection methods

This is part of a weekly reading group on Nick Bostrom’s book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI’s reading guide.


Welcome. This week we discuss the fourteenth section in the reading guide: Motivation selection methods. This corresponds to the second part of Chapter Nine.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most relevant (not necessarily that the chapter is being cited for the specific claim).

Reading: “Motivation selection methods” and “Synopsis” from Chapter 9.


Summary

  1. One way to control an AI is to design its motives; that is, to choose what it wants to do (p138).

  2. Some varieties of ‘motivation selection’ for AI safety (a toy code sketch contrasting some of them appears after this summary):

    1. Direct specification: figure out what we value, and code it into the AI (p139-40)

      1. Isaac Asimov’s ‘three laws of robotics’ are a famous example

      2. Direct specification is likely to be quite hard: both figuring out what we want and coding it precisely seem difficult

      3. This could be based on rules, or something like consequentialism

    2. Domesticity: the AI’s goals limit the range of things it wants to interfere with (p140-1)

      1. This might make direct specification easier, as the world the AI interacts with (and thus which has to be thought of in specifying its behavior) is simpler.

      2. Oracles are an example

      3. This might be combined well with physical containment: the AI could be trapped, and also not want to escape.

    3. Indirect normativity: instead of specifying what we value, specify a way to specify what we value (p141-2)

      1. e.g. extrapolate our volition

      2. This means outsourcing the hard intellectual work to the AI

      3. This will mostly be discussed in Chapter 13 (weeks 23-5 here)

    4. Augmentation: begin with a creature with desirable motives, then make it smarter, instead of designing good motives from scratch (p142).

      1. e.g. brain emulations are likely to have human desires (at least at the start)

      2. Whether this method is available depends on the kind of AI that is developed, so we usually won’t have a choice about whether to use it (except inasmuch as we have a choice about, e.g., whether uploads or synthetic AI are developed first).

  3. Bostrom provides a summary of the chapter in the ‘Synopsis’ section.

  4. The question is not which control method is best, but rather which set of control methods is best given the situation (p143-4).
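
The following toy Python sketch (my own illustration, not from the book) contrasts three of the approaches above purely as different ways of supplying an agent’s objective: writing the objective down directly, restricting it to a small part of the world, and supplying a procedure that works the objective out. All names and numbers in it are invented for illustration.

```python
# Toy sketch (my own framing, not Bostrom's) contrasting three motivation
# selection methods as ways of supplying an agent's goal.
from typing import Callable, Dict

World = Dict[str, float]

# 1. Direct specification: we write the objective down ourselves.
def directly_specified_goal(world: World) -> float:
    # We simply assert what "good" means; getting this right is the hard
    # part the chapter emphasizes.
    return world.get("human_welfare", 0.0)

# 2. Domesticity: the objective refers only to a small part of the world,
#    so the agent has no incentive to interfere with the rest.
def domestic_goal(world: World) -> float:
    # Only cares about the contents of one "box"; everything else is ignored.
    return world.get("answers_in_box", 0.0)

# 3. Indirect normativity: instead of a value function, we supply a
#    procedure that produces one (here, a crude stand-in).
def extrapolate_our_volition(evidence_about_humans: Dict[str, float]) -> Callable[[World], float]:
    # Placeholder: in reality, this is the hard intellectual work being
    # outsourced to the AI (see Chapter 13).
    learned_weight = evidence_about_humans.get("observed_preference_for_welfare", 1.0)
    return lambda world: learned_weight * world.get("human_welfare", 0.0)

indirect_goal = extrapolate_our_volition({"observed_preference_for_welfare": 1.0})

example_world = {"human_welfare": 0.7, "answers_in_box": 0.9}
print(directly_specified_goal(example_world))  # 0.7
print(domestic_goal(example_world))            # 0.9
print(indirect_goal(example_world))            # 0.7
```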

Another view

Icelizarrd:

Would you say there’s any ethical issue involved with imposing limits or constraints on a superintelligence’s drives/​motivations? By analogy, I think most of us have the moral intuition that technologically interfering with an unborn human’s inherent desires and motivations would be questionable or wrong, supposing that were even possible. That is, say we could genetically modify a subset of humanity to be cheerful slaves; that seems like a pretty morally unsavory prospect. What makes engineering a superintelligence specifically to serve humanity less unsavory?

Notes

1. Bostrom tells us that it is very hard to specify human values. We have seen examples of galaxies full of paperclips or fake smiles resulting from poor specification. But these, and Isaac Asimov’s stories, seem to tell us only that a few people spending a small fraction of their time thinking does not produce a watertight specification. What if a thousand researchers spent a decade on it? Is the millionth most obvious attempt at specification nearly as bad as the most obvious twenty? How hard is it? A general argument for pessimism is the thesis that ‘value is fragile’, i.e. that if you specify what you want very nearly but get it a tiny bit wrong, the result is likely to be almost worthless, much as a phone number with one wrong digit is. The degree to which this is so (with respect to value, not phone numbers) is controversial. I encourage you to try to specify a world you would be happy with, to see how hard it is (or to produce something of value, if it isn’t that hard).
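
To make the ‘fake smiles’ worry concrete, here is a minimal Python sketch (my own toy illustration, not anything from the book): an optimizer that picks whichever candidate outcome maximizes a slightly wrong proxy objective (visible smiles) typically ends up with an outcome that scores poorly on the intended objective (genuine happiness), because the proxy leaves out what we actually care about.

```python
# Toy illustration (mine, not Bostrom's) of how optimizing a slightly
# mis-specified objective can yield a near-worthless outcome.
import random

random.seed(0)

# Each candidate "world" has two features: genuine happiness (what we
# actually care about) and visible smiles (a proxy for it), both in [0, 1]
# and drawn independently for the sake of the illustration.
worlds = [
    {"genuine_happiness": random.random(), "smiles": random.random()}
    for _ in range(10_000)
]

def true_value(w):
    # What we actually want: genuine happiness.
    return w["genuine_happiness"]

def proxy_value(w):
    # A "nearly right" specification: reward visible smiles.
    return w["smiles"]

best_by_proxy = max(worlds, key=proxy_value)
best_by_truth = max(worlds, key=true_value)

print("True value of proxy-optimal world:", round(true_value(best_by_proxy), 3))
print("True value of truly optimal world:", round(true_value(best_by_truth), 3))
# With many independent candidates, the smile-maximizing world's genuine
# happiness is effectively a random draw (around 0.5 on average), while the
# truly optimal world scores close to 1.
```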

2. If you’d like a taste of indirect normativity before the chapter on it, the LessWrong wiki page on coherent extrapolated volition links to a bunch of sources.

3. The idea of ‘indirect normativity’ (i.e. outsourcing the problem of specifying what an AI should do, by giving it good instructions for figuring out what you value) raises the general question of just what an AI needs to be given in order to figure out how to carry out our will. An obvious contender is a lot of information about human values, though some people disagree that this is needed; such people don’t buy the orthogonality thesis. Other issues sometimes suggested to need working out before outsourcing everything to AIs include decision theory, priors, anthropics, feelings about Pascal’s mugging, and attitudes to infinity. MIRI’s technical work often fits into this category.

4. Danaher’s most recent post on Superintelligence (so far) is on motivation selection. It mostly summarizes and clarifies the chapter, so it is most useful if you’d like to think about the question some more with a slightly different framing. He also previously considered the difficulty of specifying human values in The golem genie and unfriendly AI (parts one and two), which is about ‘Intelligence Explosion and Machine Ethics’.

5. Brian Clegg thinks Bostrom should have discussed Asimov’s stories at greater length:

I think it’s a shame that Bostrom doesn’t make more use of science fiction to give examples of how people have already thought about these issues – he gives only half a page to Asimov and the three laws of robotics (and how Asimov then spends most of his time showing how they’d go wrong), but that’s about it. Yet there has been a lot of thought and dare I say it, a lot more readability than you typically get in a textbook, put into the issues in science fiction than is being allowed for, and it would have been worthy of a chapter in its own right.

If you haven’t already, you might consider (sort-of) following his advice, and reading some science fiction.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser’s list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Can you think of novel methods of specifying the values of one or many humans?

  2. What are the most promising methods for ‘domesticating’ an AI (i.e. constraining it to care only about a small part of the world, and not to want to interfere with the larger world in order to optimize that smaller part)?

  3. Think more carefully about the likely motivations of drastically augmented brain emulations.

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will start to talk about a variety of more and less agent-like AIs: ‘oracles’, ‘genies’ and ‘sovereigns’. To prepare, read “Oracles” and “Genies and Sovereigns” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday, 22nd December. Sign up to be notified here.