Rob Bensinger comments on The genie knows, but doesn’t care

Rob Bensinger 6 Sep 2013 1:46 UTC
13 points
I considered these three options above:
- C. direct normativity—program the AI to value what we value.
- B. indirect normativity—program the AI to value figuring out what our values are and then valuing those things.
- A. indirect indirect normativity—program the AI to value doing whatever we tell it to, and then tell it, in English, “Value figuring out what our values are and then valuing those things.”
I can see why you might consider A superior to C. I’m having a harder time seeing how A could be superior to B. I’m not sure why you say “Doing that has many potential pitfalls. because it is a formal specification.” (Suppose we could make an artificial superintelligence that thinks ‘informally’. What specifically would this improve, safety-wise?)

Regardless, the AI thinks in math. If you tell it to interpret your phonemes, rather than coding your meaning into its brain yourself, that doesn’t mean you’ll get an informal representation. You’ll just get a formal one that’s reconstructed by the AI itself.

It’s not clear to me that programming a seed to understand our commands (and then commanding it to become Friendlier) is easier than just programming it to become Friendlier, but in any case the processes are the same after the first stage. That is, A is the same as B but with a little extra added to the beginning, and it’s not clear to me why that little extra language-use stage is supposed to add any safety. Why wouldn’t it just add one more stage at which something can go wrong?
- PhilGoetz 6 Sep 2013 4:31 UTC
  2 points
  Parent
  
  “Regardless, the AI thinks in math. If you tell it to interpret your phonemes, rather than coding your meaning into its brain yourself, that doesn’t mean you’ll get an informal representation. You’ll just get a formal one that’s reconstructed by the AI itself.”
  
  It is misleading to say that an interpreted language is formal because the C compiler is formal. Existence proof: Human language. I presume you think the hardware that runs the human mind has a formal specification. That hardware runs the interpreter of human language. You could argue that English therefore is formal, and indeed it is, in exactly the sense that biology is formal because of physics: technically true, but misleading.
  
  This will boil down to a semantic argument about what “formal” means. Now, I don’t think that human minds—or computer programs—are “formal”. A formal process is not Turing complete. Formalization means modeling a process so that you can predict or place bounds on its results without actually simulating it. That’s what we mean by formal in practice. Formal systems are systems in which you can construct proofs. Turing-complete systems are ones where some things cannot be proven. If somebody talks about “formal methods” of programming, they don’t mean programming with a language that has a formal definition. They mean programming in a way that lets you provably verify certain things about the program without running the program. The halting problem implies that for a programming language to allow you to verify even that the program will terminate, your language may no longer be Turing-complete.
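
  To make the last point concrete, here is a minimal sketch of a deliberately restricted language in which termination can be checked without running the program. The toy language (`Inc`, `Repeat`) and the helper names `step_bound` and `run` are made up for this illustration and are not taken from any real system. Because loops can only repeat a constant number of times, the language is not Turing-complete, and that is exactly what lets us compute a step count from the program text alone.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Inc:
    """Primitive operation: one unit of work."""


@dataclass
class Repeat:
    """Run `body` exactly `times` times. There is no unbounded loop."""
    times: int
    body: List["Stmt"]


Stmt = Union[Inc, Repeat]


def step_bound(program: List[Stmt]) -> int:
    """Compute the number of primitive steps by inspecting the program only."""
    total = 0
    for stmt in program:
        if isinstance(stmt, Inc):
            total += 1
        else:
            total += stmt.times * step_bound(stmt.body)
    return total


def run(program: List[Stmt]) -> int:
    """Actually execute the program and count the primitive steps taken."""
    steps = 0
    for stmt in program:
        if isinstance(stmt, Inc):
            steps += 1
        else:
            for _ in range(stmt.times):
                steps += run(stmt.body)
    return steps


# A hypothetical program: one step, then three rounds of (one step + two steps).
program = [Inc(), Repeat(3, [Inc(), Repeat(2, [Inc()])])]
static_bound = step_bound(program)
observed_steps = run(program)
```

  Here `static_bound` is obtained without executing anything, and `observed_steps` confirms it. Add a general `while` loop to this language and `step_bound` can no longer be written in general, which is the trade-off described above.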
  
  Eliezer’s approach to FAI is inherently formal in this sense, because he wants to be able to prove that an AI will or will not do certain things. That means he can’t avail himself of the full computational power of whatever language he’s programming in.
  
  But I’m digressing from the more-important distinction, which is one of degree and of connotation. The words “formal system” always go along with computational systems that are extremely brittle, and that usually collapse completely with the introduction of a single mistake, such as a resolution theorem prover that can prove any falsehood if given one false belief. You may be able to argue your way around the semantics of “formal” to say this is not necessarily the case, but as a general principle, when designing a representational or computational system, fault-tolerance and robustness to noise are at odds with the simplicity of design and small number of interactions that make proving things easy and useful.
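
  The brittleness of a resolution prover can be sketched in a few lines. This is a toy propositional prover written for this comment, not any particular system; the function names (`negate`, `resolve`, `proves`) and the example facts are assumptions chosen for illustration. It proves a query by refutation: add the negated query and search for the empty clause.

```python
from itertools import combinations


def negate(lit):
    """Flip a literal: 'p' <-> '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(c1, c2):
    """Return every clause obtainable by resolving c1 against c2."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents


def proves(kb, query):
    """Refutation proof: does kb plus the negated query yield the empty clause?"""
    clauses = {frozenset(c) for c in kb} | {frozenset([negate(query)])}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:  # empty clause: contradiction found
                    return True
                new.add(resolvent)
        if new <= clauses:  # nothing new can be derived
            return False
        clauses |= new


# Consistent beliefs: it is raining, and rain implies wet ground.
kb = [["rain"], ["~rain", "wet"]]
# The same beliefs plus one false belief that contradicts them.
bad_kb = kb + [["~rain"]]

sound_result = proves(kb, "wet")
unrelated_result = proves(kb, "moon_is_cheese")
exploded_result = proves(bad_kb, "moon_is_cheese")
```

  With the consistent beliefs the prover accepts `wet` and rejects the unrelated claim. With the single contradictory belief added, it "proves" the unrelated claim too, because the contradiction alone is enough to derive the empty clause.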
  - Rob Bensinger 6 Sep 2013 19:12 UTC
    4 points
    Parent
    That all makes sense, but I’m missing the link between the above understanding of ‘formal’ and these four claims, if they’re what you were trying to say before:
    
    (1) Indirect indirect normativity is less formal, in the relevant sense, than indirect normativity. I.e., because we’re incorporating more of human natural language into the AI’s decision-making, the reasoning system will be more tolerant of local errors, uncertainty, and noise.
    
    (2) Programming an AI to value humans’ True Preferences in general (indirect normativity) has many pitfalls that programming an AI to value humans’ instructions’ True Meanings in general (indirect indirect normativity) doesn’t, because the former is more formal.
    
    (3) “‘Tell the AI in English’ can fail, but the worst case is closer to a ‘With Folded Hands’ scenario than to paperclips.”
    
    (4) The “With Folded Hands”-style scenario I have in mind is not as terrible as the paperclips scenario.
  - Polymeron 6 Sep 2013 4:41 UTC
    2 points
    Parent
    Wouldn’t this only be correct if similar hardware ran the software the same way? Human thinking is highly associative and variable, and because language is shared among many humans, it doesn’t, as such, have a fixed formal representation.
  - [deleted] 6 Sep 2013 20:56 UTC
    −7 points
    Parent
    Phil,
    
    You are a rational and reasonable person. Why not speak up about what is happening here? Rob is making a spirited defense of his essay, over on his blog, and I have just posted a detailed critique that really nails down the core of the argument that is supposed to be happening here.
    
    And yet, if you look closely you will find that all of my comments—be they as neutral, as sensible or as rational as they can be—are receiving negative votes so fast that they are disappearing to the bottom of the stack or being suppressed completely.
    
    What a bizarre situation!! This article that RobbBB submitted to LessWrong is supposed to be ABOUT my own article on the IEET website. My article is the actual TOPIC here! And yet I, the author of that article, have been insulted here by Eliezer Yudkowsky, and my comments suppressed. Amazing, don’t you think?
    - Rob Bensinger 6 Sep 2013 22:39 UTC
      14 points
      Parent
      Richard: On LessWrong, comments are sorted by how many thumbs up and thumbs down they get, because that makes it easier to find the most popular comments quickly. If a comment gets −4 points or lower, it gets compressed to make room for more popular comments, and to discourage flame wars. (You can still un-compress it by just clicking the + in the upper right corner of the comment.) At the moment, some of Eliezer’s comments and some of yours have been down-voted and compressed in this way, presumably because people on the site thought the personal attacks weren’t useful for the conversation as a whole.
      
      People are probably also down-voting your comments because they’re histrionic and don’t reflect an understanding of this forum’s mechanics. I recommend only making points about the substance of people’s arguments; if you have personal complaints, take them to a private channel so they don’t add to the noise surrounding the arguments themselves.