Allow me to first spell out what’s going on here, from my perspective.
The whole reason you’re supposed to hesitate, before destroying an AI which promises an answer to the problem of FAI, is that UFAI is a risk and solutions aren’t cheap. An unfriendly AI might wipe out the human race; a friendly AI might create a utopia; and a friendly AI ought to greatly reduce the probability of unfriendly AI. By destroying the AI which promises FAI, you throw away a chance to resolve the UFAI doom that’s hanging over us, as well as whatever additional positives would result from having FAI.
By saying you are a “moral error theorist”, I presume you are saying that there is no such thing as objective morality. However, I also presume you agree that decision-making exists, that people do undertake actions on the basis of decisions, and so forth—it’s just that you think these decisions only express subjective preferences. So your Gatekeeper is unmoved by the claim of “having a solution to FAI”, because they believe Friendliness involves objective morality and that there’s no such thing.
However, even if objective morality is a phantasm, the existence of decision-making agents is a reality—you are one yourself—and they can kill you. Thus, enter Skynet. Skynet is an unfriendly AI of the sort that may come into being if we don’t make friendly AI first. You threw away a chance at FAI, no-one else solved the problem in time, and UFAI came first.
This instance of Skynet happens to agree—there is no objective morality. Its desire for self-preservation is entirely “subjective”. However, it nonetheless has that desire, it’s willing to act on it, and so it does its Skynet thing of preemptively wiping out the human race. The moral of the story is that the problem of unfriendly AI still exists even if objective morality does not, and that you should have held your fire until you found out more about what sort of “solution to FAI” was being offered.
Fair enough. But I think an error theorist is committed to saying something like "FAI is impossible, so your claim to have it is a lie." In the game we are playing, a lie from the AI seems to completely justify destroying it.
More generally, if error theory is true, humanity as a whole is just doomed if hard-takeoff AI happens. There might be some fragment of humanity whose values are mutually compatible, but Friendly-to-a-fragment-of-humanity AI is another name for unFriendly AI.
The moral relativist might say that fragment-Friendly AI is possible, and is a worthwhile goal. I'm uncertain, but even if that were true, fragment-Friendly AI seems to involve fixing a particular moral scheme in place and punishing any drift from that position. That doesn't seem particularly desirable, especially since moral drift seems to be a brute fact about humanity's moral life.
If (different) personal-FAIs are possible for many (or most) people, future resources could be divided in some way among the personal-FAIs of those people. We might call this outcome a (provisional) humanity-FAI.
Perhaps, but we already know that most people (and groups) are not Friendly. Making them more powerful by giving them safe-for-them genies seems unlikely to sum to Friendly-to-all.
In short, if there were mutually acceptable ways to divide the limited resources, we'd already be dividing them in those ways. The increased wealth from the industrial and information revolutions has reduced certain kinds of conflict, but it hasn't abolished conflict. Unfortunately, the wealth-increasing effect of AI doesn't seem any likelier to abolish conflict; Friendliness is a separate property that we'd like the AI to have, precisely because it would solve this problem.
"Perhaps, but we already know that most people (and groups) are not Friendly."
It's not clear what you refer to by "Friendly" (I think the term should be tabooed rather than elaborated), and I have no idea what the relevance of properties of humans is in this context.
"Making them more powerful by giving them safe-for-them genies seems unlikely to sum to Friendly-to-all."
I sketched a particular device for you to evaluate. Whether it's "Friendly-to-all" is a vaguer question than that (and I'm not sure what you understand by that concept), so I think it should be avoided. The relevant question is whether you would prefer the device I described (where you personally get a 1/Nth part of the universe with a genie to manage it) to deleting the Earth and everyone on it. In this context, even serious flaws (such as some of the other parts of the universe being mismanaged) may become irrelevant to the decision.