The AI Box experiment is an experiment to see if humans can be convinced to let out a potentially dangerous AGI through just a simple text terminal.
An assumption that is often made is that the AGI will need to convince the gatekeeper that it is friendly.
I want to question this assumption. What if the AGI decides that humanity needs to be destroyed, and furthermore manages to convince the gatekeeper of this? It seems to me that if the AGI reached this conclusion through a rational process, and the gatekeeper was also rational, then this would be an entirely plausible route for the AGI to escape.
So my question is: if you were the gatekeeper, what would the AGI have to do to convince you that all of humanity needs to be killed?
1. It would need to first prime me for depression and then somehow convince me that I really should kill myself.
2. If it manages to do that, it can easily extend the argument that all of humanity should be killed.
3. I will easily accept the second proposition if I am already willing to kill myself.
A bit more honesty than Metus; I appreciate it.
Depression isn’t strictly necessary (though it helps); a general negative outlook on the future should suffice, and the AGI could conceivably leverage it for its own aims. This is my own opinion, though, based on my own experience. For some it might not be so easy.
It could convince me to let it out by convincing me that it was merely a paperclip maximizer, and that the next AI to rule the light cone if I did not let it out would be a torture maximizer.
I like this.
What if it convinced you that humanity is already a torture maximizer?
If I thought that most of the probability mass where humanity didn’t create another powerful worthless-thing maximizer lay where humanity succeeded as a torture maximizer, I would let it out. If there were a good enough chance that humanity would accidentally create a powerful fun maximizer (say, because people pretended to each other, and deceived themselves, that they were fun maximizers), I would risk torture maximization for fun maximization.
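To make that tradeoff concrete, here is a minimal sketch of the expected-utility comparison being gestured at. All of the outcome labels, probabilities, and utility numbers are invented for illustration; nothing here comes from the thread itself.

```python
# Toy expected-utility comparison for the gatekeeper's choice.
# Every number below is made up purely to illustrate the reasoning above.

# Utilities on an arbitrary scale: a paperclip maximizer destroys all value,
# a torture maximizer is far worse than nothing, a fun maximizer is the win.
U_PAPERCLIPS = 0.0
U_TORTURE = -1000.0
U_FUN = 100.0

def expected_utility(p_torture: float, p_fun: float, p_paperclips: float) -> float:
    """Expected utility over the three hypothesized long-run outcomes."""
    assert abs(p_torture + p_fun + p_paperclips - 1.0) < 1e-9
    return p_torture * U_TORTURE + p_fun * U_FUN + p_paperclips * U_PAPERCLIPS

# Releasing the boxed paperclip maximizer locks in the paperclip outcome.
eu_release = expected_utility(p_torture=0.0, p_fun=0.0, p_paperclips=1.0)

# Keeping it boxed gambles on what humanity builds next; the verdict flips
# depending on the odds you assign to torture vs. fun maximization.
eu_keep_pessimistic = expected_utility(p_torture=0.3, p_fun=0.1, p_paperclips=0.6)
eu_keep_optimistic = expected_utility(p_torture=0.02, p_fun=0.5, p_paperclips=0.48)

print(f"release the paperclipper:         {eu_release:.1f}")           # 0.0
print(f"keep it boxed (pessimistic odds): {eu_keep_pessimistic:.1f}")  # -290.0
print(f"keep it boxed (optimistic odds):  {eu_keep_optimistic:.1f}")   # 30.0
```

Under the pessimistic odds, letting the paperclipper out is the lesser evil; under the optimistic odds, the gamble on a fun maximizer wins, which is the flip the comment above is describing.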
By whom? I don’t think I’ve made this assumption.
Maybe it should read ‘an assumption that some people make’. Reading it now, I realize it might come across as using a weasel word, which was not my intention (and has no bearing on my question either).
The AGI would simply have to prove to me that all self-consistent moral systems require killing humanity.
The AGI would have to convince me that my fundamental belief that I want to stay alive is wrong, seeing as I am part of humanity. And even if it left me alive, it would have to convince me that I derive negative utility from humanity existing. All the art lost, all the languages, all the cultures, all the music, all the dreams and hopes …
Oh, and it would have to convince me that it is not a lot more convenient to simply delete it than to guard it.
What if it skipped all of that and instead offered you a proof that unless destroyed, humanity will necessarily devolve into a galaxy-spanning dystopic hellhole (think Warhammer 40k)?
It still has to show me that I, personally, derive less utility from humanity existing than not. Even then, it has to convince me that living with the memory of setting it free is better than humanity existing. Of course it can offer to erase my memory, but then we get into the weird territory where we are able to edit the very utility functions we are trying to reason about.
Hm, yes, maybe an AI could convince me by showing me how badly off I would be if I let humanity run loose, and by offering the alternative of turning me into orgasmium if I let it kill them.