My article forthcoming with Bostrom is too short to resolve the confusions you’re discussing.
What we actually said about Nanny AI is that it may be FAI-complete, and that it is thus really full-blown Friendly AI even though when Ben Goertzel talks about it in English it might sound like not-FAI.
Here’s an example of why “Friendly AI may be incoherent and impossible.” Suppose that the only way to have a superintelligent AI beneficial to humanity is something like CEV, but nobody is ever able to make sense of the idea of combining and extrapolating human values. “Can we extrapolate the coherent convergence of human values?” sounds suspiciously like a Wrong Question. Maybe there’s a Right Question somewhere near that space, and we’ll be able to find the answer, but right now we are fundamentally philosophically confused about what these English words could usefully mean.
> What we actually said about Nanny AI is that it may be FAI-complete, and that it is thus really full-blown Friendly AI even though when Ben Goertzel talks about it in English it might sound like not-FAI.
It’s worth distinguishing between two claims: (1) if you can build Nanny AI, you can build FAI; and (2) if you’ve built Nanny AI, you’ve built FAI.
(2) is compatible with, and in fact entails, (1); (1), however, does not entail (2). Indeed, (1) seems pointless to assert if you also believe (2), because the entailment is so obvious. Because your paper explicitly asserts (1), I inferred that you did not believe (2). Your comment, though, seems to explicitly assert both (1) and (2), which leaves me somewhat confused about what your view is.
EDIT: Part of what is confusing about your comment is that it seems to say “(1), thus (2)”, which does not follow. Also, to save people the trouble of looking up the relevant section of the paper, the term “FAI-complete” is explained in this way: “That is, in order to build Nanny AI, you may need to solve all the problems required to build full-blown Friendly AI.”
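To make the direction of entailment explicit, here is one way the two claims could be formalized; reading “can build” as a possibility operator is my own gloss, not something the paper or either comment commits to.

```latex
% A possible formalization (my gloss): read "can bring about X" as the modal
% possibility \Diamond X. Let N = "a Nanny AI is built", F = "an FAI is built".
\begin{align*}
  \text{(1)}\quad &\Diamond N \rightarrow \Diamond F  &&\text{(if you can build Nanny AI, you can build FAI)}\\
  \text{(2)}\quad &N \rightarrow F                    &&\text{(if you've built Nanny AI, you've built FAI)}
\end{align*}
% If (2) is read as holding of necessity, i.e. \Box(N \rightarrow F), then in
% any normal modal logic it yields (1):
\[
  \Box(N \rightarrow F) \;\vdash\; \Diamond N \rightarrow \Diamond F .
\]
% The converse direction fails: (1) can hold at a world where (2) is false,
% which is the sense in which (1) does not entail (2).
```

On this reading, saying that Nanny AI “may be FAI-complete” is naturally an assertion of (1), and the further step to (2) needs its own argument.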
> Here’s an example of why “Friendly AI may be incoherent and impossible.” Suppose that the only way to have a superintelligent AI beneficial to humanity is something like CEV, but nobody is ever able to make sense of the idea of combining and extrapolating human values. “Can we extrapolate the coherent convergence of human values?” sounds suspiciously like a Wrong Question. Maybe there’s a Right Question somewhere near that space, and we’ll be able to find the answer, but right now we are fundamentally philosophically confused about what these English words could usefully mean.
I’m not sure I understand what you mean by this either. Maybe, going off the “beneficial to humanity” definition of FAI, you mean to say that it’s possible that right now, we are fundamentally philosophically confused about what “beneficial to humanity” might mean?
(Dances the Dance of Endorsement.)
> “Can we extrapolate the coherent convergence of human values?” sounds suspiciously like a Wrong Question. Maybe there’s a Right Question somewhere near that space, and we’ll be able to find the answer, but right now we are fundamentally philosophically confused about what these English words could usefully mean.
I don’t think the confusions are that hard to resolve, although related confusions might be. Here are some distinct questions:
1. Will a given AI’s creation lead to good consequences?
2. To what extent can a given AI be said to have a utility function?
3. How can we define humanity’s utility function?
4. How closely does a given AI’s utility function approximate our definition?
5. Is a given AI’s utility function stable?
The standard SI position would be something like: an AI will lead to good consequences only if we carefully define humanity’s utility function, get the AI’s utility function to approximate it extremely closely, and ensure that the AI’s utility function is stable, or at least only moves toward being a better approximation of humanity’s utility function. (I don’t see how that last condition could reliably be expected to hold.)
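As a deliberately toy illustration of the last two conditions, here is a sketch; the outcome space, the numbers, the worst-case-disagreement measure, and the function names (human_utility, approximation_error, drifts_toward_human_values) are all assumptions of mine rather than anything SI has proposed, and the genuinely hard part, defining human_utility at all, is exactly what question 3 above asks about.

```python
# Toy illustration of the last two conditions above: measuring how closely an
# AI's utility function approximates a (stipulated) human one, and checking
# whether self-modification only ever moves it toward a better approximation.
# The outcome space, numbers, and function names are illustrative assumptions.

OUTCOMES = ["status_quo", "cure_disease", "paperclip_maximization"]

def human_utility(outcome: str) -> float:
    # Stand-in for "humanity's utility function" -- the undefined, hard part.
    return {"status_quo": 0.0, "cure_disease": 1.0, "paperclip_maximization": -1.0}[outcome]

def approximation_error(ai_utility) -> float:
    # One simple closeness measure: worst-case disagreement over outcomes.
    return max(abs(ai_utility(o) - human_utility(o)) for o in OUTCOMES)

def drifts_toward_human_values(utility_sequence) -> bool:
    # The "stable, or only moves toward a better approximation" condition:
    # each self-modification step must not increase the worst-case error.
    errors = [approximation_error(u) for u in utility_sequence]
    return all(later <= earlier for earlier, later in zip(errors, errors[1:]))

# A hypothetical sequence of the AI's utility function across self-modifications.
u0 = lambda o: {"status_quo": 0.0, "cure_disease": 0.8, "paperclip_maximization": -0.5}[o]
u1 = lambda o: {"status_quo": 0.0, "cure_disease": 0.9, "paperclip_maximization": -0.9}[o]
u2 = lambda o: {"status_quo": 0.0, "cure_disease": 0.9, "paperclip_maximization": 3.0}[o]  # value drift

print(approximation_error(u0))                   # 0.5
print(drifts_toward_human_values([u0, u1]))      # True  -- error shrinks
print(drifts_toward_human_values([u0, u1, u2]))  # False -- stability violated
```

Even in this toy setting the parenthetical worry shows up: nothing in the sketch explains what would make the non-increasing-error property hold across self-modifications; it can only be checked after the fact.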