Eliezer Yudkowsky comments on Decision Theory FAQ

Eliezer Yudkowsky 13 Mar 2013 20:59 UTC
16 points
“Aargh!” he said out loud in real life. David, are you disagreeing with me here or do you honestly not understand what I’m getting at?

The whole idea is that an agent can fully understand, model, predict, manipulate, and derive all relevant facts that could affect which actions lead to how many paperclips, regarding happiness, without having a pleasure-pain architecture. I don’t have a paperclipping architecture but this doesn’t stop me from modeling and understanding paperclipping architectures.

The paperclipper can model and predict an agent (you) that (a) operates on a pleasure-pain architecture and (b) has a self-model consisting of introspectively opaque elements which actually contain internally coded instructions for your brain to experience or want certain things (e.g. happiness). The paperclipper can fully understand how your workspace is modeling happiness and know exactly how much you would want happiness and why you write papers about the apparent ineffability of happiness, without being happy itself or at all sympathetic toward you. It will experience no future surprise on comprehending these things, because it already knows them. It doesn’t have any object-level brain circuits that can carry out the introspectively opaque instructions-to-David’s-brain that your own qualia encode, so it has never “experienced” what you “experience”. You could somewhat arbitrarily define this as a lack of knowledge, in defiance of the usual correspondence theory of truth, and despite the usual idea that knowledge is being able to narrow down possible states of the universe. In which case, symmetrically under this odd definition, you will never be said to “know” what it feels like to be a sentient paperclip maximizer or you would yourself be compelled to make paperclips above all else, for that is the internal instruction of that quale.

But if you take knowledge in the powerful-intelligence-relevant sense where to accurately represent the universe is to narrow down its possible states under some correspondence theory of truth, and to well model is to be able to efficiently predict, then I am not barred from understanding how the paperclip maximizer works by virtue of not having any internal instructions which tell me to only make paperclips, and it’s not barred by its lack of pleasure-pain architecture from fully representing and efficiently reasoning about the exact cognitive architecture which makes you want to be happy and write sentences about the ineffable compellingness of happiness. There is nothing left for it to understand. This is also the only sort of “knowledge” or “understanding” that would inevitably be implied by Bayesian updating. So inventing a more exotic definition of “knowledge” which requires having completely modified your entire cognitive architecture just so that you can natively and non-sandboxed-ly obey the introspectively-opaque brain-instructions aka qualia of another agent with completely different goals, is not the sort of predictive knowledge you get just by running a powerful self-improving agent trying to better manipulate the world. You can’t say, “But it will surely discover...”

I know that when you imagine this it feels like the paperclipper doesn’t truly know happiness, but that’s because, as an act of imagination, you’re imagining the paperclipper without that introspectively-opaque brain-instructing model-element that you model as happiness, the modeled memory of which is your model of what “knowing happiness” feels like. And because the actual content and interpretation of these brain-instructions are introspectively opaque to you, you can’t imagine anything except the quale itself that you imagine to constitute understanding of the quale, just as you can’t imagine any configuration of mere atoms that seem to add up to a quale within your mental workspace. That’s why people write papers about the hard problem of consciousness in the first place.

Even if you don’t believe my exact account of the details, someone ought to be able to imagine that something like this, as soon as you actually knew how things were made of parts and could fully diagram out exactly what was going on in your own mind when you talked about happiness, would be true—that you would be able to efficiently manipulate models of it and predict anything predictable, without having the same cognitive architecture yourself, because you could break it into pieces and model the pieces. And if you can’t fully credit that, you at least shouldn’t be confident that it doesn’t work that way, when you know you don’t know why happiness feels so ineffably compelling!
- Kawoomba 14 Mar 2013 17:29 UTC
  4 points
  Parent
  Here comes the Reasoning Inquisition! (Nobody expects the Reasoning Inquisition.)
  
  As the defendant admits, a sufficiently leveled-up paperclipper can model lower-complexity agents with a negligible margin of error.
  
  That means that we can define a subroutine within the paperclipper which is functionally isomorphic to that agent.
  
  If the agent-to-be-modelled is experiencing pain and pleasure, then by the defendant’s own rejection of the likely existence of p-zombies, so must that subroutine of the paperclipper! Hence a part of the paperclipper experiences pain and pleasure. I submit that this can be used as pars pro toto, since it is no different from only a part of the human brain generating pain and pleasure, yet us commonly referring to “the human” experiencing thus.
  
  That the aforementioned feelings of pleasure and pain are not directly used to guide the (umbrella) agent’s actions is of no consequence; the feeling exists nonetheless.
  
  The power of this revelation is strong, here come the tongues! [Vietnamese] Why are you translating this! [Japanese] This is merely for comedic effect! [Urdu] This is also a test of your browser plug-in.
  - Eliezer Yudkowsky 14 Mar 2013 17:37 UTC
    9 points
    Parent
    
    That means that we can define a subroutine within the paperclipper which is functionally isomorphic to that agent.
    
    Not necessarily. x → 0 is input-output isomorphic to Goodstein() without being causally isomorphic. There are such things as simplifications.
    
    If the agent-to-be-modelled is experiencing pain and pleasure, then by the defendant’s own rejection of the likely existence of p-zombies, so must that subroutine of the paperclipper!
    
    Quite likely. A paperclipper has no reason to avoid sentient predictive routines via a nonperson predicate; that’s only an FAI desideratum.
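
The contrast drawn above between input-output and causal isomorphism can be illustrated with a small sketch. It assumes that `Goodstein()` refers to a function that runs a Goodstein sequence to termination and returns its final value, which Goodstein's theorem says is always 0. The names `bump_base`, `goodstein_run`, and `constant_zero` are hypothetical, introduced here only for illustration.

```python
def bump_base(n: int, base: int) -> int:
    """Write n in hereditary base-`base` notation, then replace every
    occurrence of `base` (including inside exponents) with base + 1."""
    result = 0
    exponent = 0
    while n > 0:
        n, digit = divmod(n, base)
        if digit:
            # Exponents are themselves rewritten hereditarily.
            result += digit * (base + 1) ** bump_base(exponent, base)
        exponent += 1
    return result


def goodstein_run(n: int) -> tuple[int, int]:
    """Run the Goodstein sequence starting at n until it reaches 0.

    Returns (final_value, number_of_steps). Only feasible for n <= 3;
    starting from 4 the sequence needs astronomically many steps.
    """
    base = 2
    steps = 0
    while n > 0:
        n = bump_base(n, base) - 1  # bump the base, then subtract one
        base += 1
        steps += 1
    return n, steps


def constant_zero(n: int) -> int:
    """The simplification: same outputs, none of the internal process."""
    return 0


if __name__ == "__main__":
    for start in range(4):
        final, steps = goodstein_run(start)
        print(start, final, steps, constant_zero(start))
```

Both functions return 0 on every input tried, yet only `goodstein_run` actually walks through the intermediate states of the sequence. That is the sense in which a predictor can match an agent's outputs without containing a causally faithful copy of it.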
  - whowhowho 14 Mar 2013 18:46 UTC
    0 points
    Parent
    A subroutine, or any other simulation or model, isn’t a p-zombie as usually defined, since p-zombies are physical duplicates. A sim is a functional equivalent (for some value of “equivalent”) made of completely different stuff, or no particular kind of stuff.
    - Kawoomba 14 Mar 2013 18:52 UTC
      0 points
      Parent
      I wrote a lengthy comment on just that, but scrapped it because it became rambling.
      
      An outsider could indeed tell them apart by scanning for exact structural correspondence, but that seems like cheating. Peering beyond the veil / opening Clippy’s box is not allowed in a Turing test scenario; let’s define some p-zombie-ish test following the same template. If it quales like a duck (etc.), it probably is sufficiently duck-like.
      - whowhowho 14 Mar 2013 19:04 UTC
        0 points
        Parent
        I would rather maintain p-zombie in its usual meaning, and introduce a new term, e.g. c-zombie, for Turing-indistinguishable functional duplicates.
- Sarokrae 13 Mar 2013 21:07 UTC
  2 points
  Parent
  
  I don’t have a paperclipping architecture but this doesn’t stop me from imagining paperclipping architectures.
  
  So my understanding of David’s view (and please correct me if I’m wrong, David, since I don’t wish to misrepresent you!) is that he doesn’t have paperclipping architecture and this does stop him from imagining paperclipping architectures.
  - Eliezer Yudkowsky 13 Mar 2013 21:19 UTC
    4 points
    Parent
    ...well, in point of fact he does seem to be having some trouble, but I don’t think it’s fundamental trouble.
- whowhowho 14 Mar 2013 17:09 UTC
  −2 points
  Parent
  
  The whole idea is that an agent can fully understand, model, predict, manipulate, and derive all relevant facts that could affect which actions lead to how many paperclips, regarding happiness, without having a pleasure-pain architecture.
  
  Let’s say the paperclipper reaches the point where it considers making people suffer for the sake of paperclipping. DP’s point seems to be that either it fully understands suffering, in which case it realizes that inflicting suffering is wrong, or it doesn’t fully understand. He sees a conflict between superintelligence and ruthlessness, as a moral realist/cognitivist would.
  
  The paperclipper can fully understand how your workspace is modeling happiness and know exactly how much you would want happiness and why you write papers about the apparent ineffability of happiness, without being happy itself or at all sympathetic toward you
  
  Is that full understanding?
  
  But if you take knowledge in the powerful-intelligence-relevant sense where to accurately represent the universe is to narrow down its possible states under some correspondence theory of truth, and to well model is to be able to efficiently predict, then I am not barred from understanding how the paperclip maximizer works by virtue of not having any internal instructions which tell me to only make paperclips, and it’s not barred by its lack of pleasure-pain architecture from fully representing and efficiently reasoning about the exact cognitive architecture which makes you want to be happy and write sentences about the ineffable compellingness of happiness. There is nothing left for it to understand.
  
  ETA: Unless there is, e.g. what qualiaphiles are always banging on about: what it feels like. That the clipper can form conjectures that are true by correspondence, that it can narrow down possible universes, and that it can predict are all necessary criteria for full understanding. It is not clear that they are sufficient. Clippy may be able to figure out an organism’s response to pain on a basis of “stimulus A produces response B”, but is that enough to tell it that pain hurts? (We can make guesses about that sort of thing in non-human organisms, but that may be more to do with our own familiarity with pain, and less to do with acts of superintelligence.) And if Clippy can’t know that pain hurts, would Clippy be able to work out that Hurting People is Wrong?
  
  Further edit: To put it another way, what is there to be moral about in a qualia-free universe?
  - khafra 15 Mar 2013 19:19 UTC
    10 points
    Parent
    As Kawoomba colorfully pointed out, clippy’s subroutines simulating humans suffering may be fully sentient. However, unless those subroutines have privileged access to clippy’s motor outputs or planning algorithms, clippy will go on acting as if he didn’t care about suffering. He may even understand that inflicting suffering is morally wrong, but this will not make him avoid inflicting it, any more than a thrown rock with “suffering is wrong” painted on it will change direction to avoid someone’s head. Moral wrongness is simply not a consideration that has the power to move a paperclip maximizer.
    - whowhowho 16 Mar 2013 20:18 UTC
      −5 points
      Parent
      
      Moral wrongness is simply not a consideration that has the power to move a paperclip maximizer.
      
      That is construed and constructed a certain way. The counterargument makes other assumptions.