So, here’s my pet theory for AI that I’d love to put out of its misery: “Don’t do anything your designer wouldn’t approve of”. It’s loosely based on the “Gandhi wouldn’t take a pill that would turn him into a murderer” principle.
A possible implementation: Make an emulation of the designer and use it as an isolated component of the AI. Any plan of action has to be submitted to this component for approval before being implemented. This is nicely recursive and rejects plans such as “make a plan of action so deceptively complex that my designer will mistakenly approve it” and “modify my designer so that they approve what I want them to approve”.
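To make the recursion concrete, here is a minimal Python sketch of the approval gate, assuming hypothetical DesignerEmulation and plan objects; none of these names are a real API, and the emulation itself is left as a stub:

```python
class DesignerEmulation:
    """Isolated stand-in for the designer; the real proposal would emulate them."""
    def approves(self, plan) -> bool:
        raise NotImplementedError  # stub: the emulation's judgement goes here


class ApprovalGatedAI:
    def __init__(self, designer: DesignerEmulation):
        self._designer = designer  # isolated component the rest of the AI cannot alter

    def execute(self, plan) -> None:
        # Every plan goes through the same gate, including plans to obfuscate
        # future plans or to modify the designer emulation itself.
        if not self._designer.approves(plan):
            raise PermissionError("plan rejected by the designer emulation")
        plan.run()  # hypothetical: a plan object that knows how to carry itself out
```

The load-bearing assumption is the “isolated” part: nothing in the sketch itself prevents the rest of the AI from routing around or rewriting the emulation, which is what several of the comments below poke at.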
There could be an argument about how the designer’s emulation would feel in this situation, but… torture vs. dust specks! Also, is this a corrupted version of ?
You flick the switch, and find out that you are a component of the AI, now doomed to an unhappy eternity of answering stupid questions from the rest of the AI.
This is a problem. But if this is the only problem, then it is significantly better than a paperclip universe.
I’m sure the designer would approve of being modified to enjoy answering stupid questions. The designer might also approve of being cloned for the purpose of answering one question, and then being destroyed.
Unfortunately, it turns out that you’re Stalin. Sounds like 1-person CEV.
That is, or requires, a pretty fundamental change. How can you be sure it’s value-preserving?
I had assumed that a new copy of the designer would be spawned for each decision, and shut down afterwards.
Although thinking about it, that might just doom you to a subjective eternity of listening to the AI explain what it’s done so far, in the anticipation that it’s going to ask you a question at some point.
You’d need a good theory of ems, consciousness and subjective probability to have any idea what you’d subjectively experience.
The AI wishes to make ten thousand tiny changes to the world, individually innocuous, but some combination of which add up to catastrophe. To submit its plan to a human, it would need to distill the list of predicted consequences down to its human-comprehensible essentials. The AI that understands which details are morally salient is one that doesn’t need the oversight.
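A tiny sketch of the circularity being pointed at here, with a hypothetical is_morally_salient predicate standing in for the judgement in question:

```python
def summarize_for_approval(predicted_consequences, is_morally_salient):
    # To shrink ten thousand predicted consequences into a human-readable digest,
    # the AI must already judge which of them matter morally, and that is the
    # very judgement the human approver was supposed to supply.
    return [c for c in predicted_consequences if is_morally_salient(c)]
```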
That’s quite non-obvious to me; it seems like a rather arbitrary claim.
You’re basically saying that if an intelligent mind (A, for Alice) knows that a person (B, for Bob) will care about a certain consequence C, then A will definitely know how much B will care about it.
This isn’t the case for real human minds. If Alice is a human mechanic and tells Bob, “I can fix your car, but it’ll cost $200”, then Alice knows that Bob will care about the cost, but doesn’t know how much Bob will care, or whether Bob prefers to have a fixed car or to keep the $200.
So if your claim doesn’t even hold for human minds, why do you think it applies to non-human minds?
And even if it does hold, what about the case where Alice doesn’t know whether a detail is morally salient, but errs on the side of caution? E.g. Alice the waitress asks Bob the customer, “The chocolate ice cream you asked for also has some crushed peanuts in it. Is that okay?”, and Bob can respond “Of course, why should I care about that?” or alternatively “It’s not okay, I’m allergic to peanuts!”
In this case Alice the waitress doesn’t know if the detail is salient to Bob, but asks just to make sure.
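A minimal sketch of that “err on the side of asking” rule, with made-up salience and uncertainty estimates standing in for whatever Alice actually knows; illustrative only:

```python
def should_ask(estimated_salience: float, uncertainty: float,
               threshold: float = 0.1) -> bool:
    # Ask whenever the detail might matter, or whenever we are not confident
    # that it does not matter: the waitress errs on the side of asking.
    return estimated_salience > threshold or uncertainty > threshold
```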
This is good, and I have no valid response at this time. Will try to think more about it later.
If the AI is designed to follow the principle by the letter, it has to request approval from the designer even for the action of requesting approval, leaving the AI incapable of action. If the AI is designed to be able to make certain exemptions, it will figure out a way to modify the designer without needing approval for this modification.
How about making ‘ask for approval’ the only pre-approved action?
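A minimal sketch of that exemption, with hypothetical names: the single pre-approved action is the act of requesting approval itself, which is what terminates the regress.

```python
ASK_FOR_APPROVAL = "ask_for_approval"  # the single pre-approved action

def may_execute(action, designer_approves) -> bool:
    if action == ASK_FOR_APPROVAL:
        return True                      # exempt, so the regress terminates here
    return designer_approves(action)     # everything else still needs the gate
```

This removes the regress for the asking step, though, as the next comment notes, it leaves the content of what is asked unconstrained.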
The AI may stumble upon a plan which contains a sequence of words that hacks the approver’s mind, making him approve pretty much anything. Such plans may even be easier for the AI to generate than plans for saving the world, seeing as Eliezer has won some AI-box experiments but hasn’t yet solved world hunger.
You mean accidentally stumble upon such a sequence of words? Because purposefully building one would certainly not be approved.
Um, does the approver also have to approve each step of the computation that builds the plan to be submitted for approval? Isn’t this infinite regress?
Consider “Ask for approval” as an auto-approved action. Not sure if that solves it, will give this a little more thought.
The weak link is “plan of action.” What counts as a plan of action? How will you structure the AI so that it knows what a plan is and when to submit it for approval?
The AI accidentally does something dangerous because the plan is confusing to the designer.
Yeah, this is the plan’s weakness. But what stops such an issue occurring today?
I think the main difference is that, ideally, people would confirm the rules by which plans are made, rather than the specific details of the plan.
Hopefully the rules would be more understandable.
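A rough sketch of what rule-level approval might look like, assuming hypothetical rule objects and an up-front designer_approves_rule check; this is only an illustration of the idea, not a worked-out design:

```python
def build_planner(candidate_rules, designer_approves_rule):
    # The designer reviews the planning rules once, up front,
    # rather than the specific details of every plan.
    approved_rules = [r for r in candidate_rules if designer_approves_rule(r)]

    def plan(goal):
        # Plans are then generated only from the approved rules.
        return [rule(goal) for rule in approved_rules]

    return plan
```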
The AI doesn’t do anything.