Yes, the comment system here is really not suited to the kind of conversation I’ve been trying to have… not that I’m sure what system would work for it. ;-)
As far as meta-ethics goes, the short summary is:
1. “Avoiding badness” and “seeking goodness” are not interchangeable when you experience them concretely on human hardware.
2. It is therefore a reasoning error to treat them as if they were interchangeable in your abstract moral calculations (as they will not work the same way in practice).
3. Due to the specific nature of the human hardware biases involved (i.e., the respective emotional, chemical, and neurological responses to pain vs. pleasure), badness-avoidance values are highly likely to be found irrational upon detailed examination… and thus they are always the ones worth examining first.
4. Badness-avoidance values are a disproportionately high (if not exclusive!) source of “motivated reasoning”: we don’t so much rationalize to paint pretty pictures as to hide the ugly ones. (Which makes rooting them out of critical importance to rationalists.)
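The asymmetry in the first point above is often modeled with the Kahneman–Tversky prospect-theory value function, in which losses are weighted roughly twice as heavily as equivalent gains. A minimal sketch, using the commonly cited 1992 parameter estimates (the connection to this thread’s argument is my own illustration, not anything the participants wrote):

```python
# Prospect-theory value function (Tversky & Kahneman, 1992 estimates):
# equal-sized gains and losses do NOT produce equal-magnitude responses.
ALPHA = 0.88   # diminishing sensitivity for gains
BETA = 0.88    # diminishing sensitivity for losses
LAMBDA = 2.25  # loss aversion: losses loom ~2.25x larger than gains

def subjective_value(x):
    """Felt value of an objective gain (x > 0) or loss (x < 0)."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** BETA)

gain = subjective_value(100)   # "seeking goodness"
loss = subjective_value(-100)  # "avoiding badness"
print(gain, loss)              # the loss is far larger in magnitude
print(abs(loss) / gain)        # ratio is ~2.25, not 1.0
```

If the two motivations carried equal weight, the ratio would be 1.0; it isn’t, which is the concrete sense in which treating them as interchangeable in abstract moral calculation is an error.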
This summary is more to clarify my thoughts for the eventual post, than an attempt to continue the discussion. (To me, these things are so obvious and so much a part of my day-to-day experience that I often forget the inferential distance involved for most people.)
These ideas are all capable of experimental verification; the first one has certainly been written about in the literature. None are particularly unorthodox or controversial in and of themselves, as far as I’m aware.
However, there are common arguments against some of these ideas that my own students bring up, so in my (eventual) post I’ll need to address and refute them as well.
For example, a common argument against positively-motivated goodness is that feeling good about being generous means you’re “really” being selfish… and thus bad! So, the person advancing this argument is motivated to rationalize the “virtue” of being dutiful—i.e., doing something you don’t want to, but nonetheless “should”—because it would be bad not to.
Strangely, most people have these judgments only in relation to themselves… They see no problem with someone else doing good out of generosity or kindness, with no pain or duty involved. It’s only themselves they sentence to this “virtue” of suffering to achieve goodness. (Which is sort of like “fighting for peace” or “f*ing for virginity”, but I digress.)
Whether this is something inbuilt, cultural, or selection bias of people I work with, I have no idea. But it’s damn common… and Eliezer’s making a virtue out of unhappiness (beyond the bare minimums demanded by safety, etc.) fits smack dab in the middle of this territory.
Whew. Okay, I’m going to stop writing this now… this really needs to be a post. Or several. The more I think about how to get here, starting from only the OB corpus and without recapitulating my own, the bigger I realize the inferential gap is.
You may be running into the Reversed Stupidity problem: most cases you’ve seen of people advocating negative feelings were stupid, so you assume that all such advocacy must result from the same stupidity.
I sympathize because I remember back when I would have thought that anyone arguing against the abolitionist program—that is, the total abolition of all suffering—was a Luddite.
But I eventually realized I didn’t want to eliminate my negative reinforcement hardware, and that moreover, I wouldn’t be such a bad person if I, you know, just did things the way I did want, instead of doing things the way I felt vaguely dutifully obligated to want but didn’t want.
Why am I a terrible, bad person for not wanting to modify myself in that way? What higher imperative should override: “I’d rather not do this”?
I didn’t say you’re a terrible, bad person—I said that your choice to be unhappy, in the absence of a positive benefit from it, is likely to be found irrational if you reflect on the concrete emotional reason you find the prospect abhorrent.
I also don’t recommend eliminating the negative reinforcement hardware; I merely recommend carefully vetting all the software you permit to run on it, or to be generated by it. (So don’t worry, I’m not an advance spokesperson for the Superhappies.)
This isn’t an absolute, just a VERY strong heuristic, in my experience. Sort of like, if someone’s going to commit suicide, I have more hoops for them to jump through to prove their rationality than someone who’s just going to the grocery store. ;-)
And, based on what you’ve said thus far, it doesn’t sound like you’ve thoroughly investigated what concrete (near-system) rules drove the creation of your aspiration to suffering.
(As opposed to the abstract ideation that happened afterward, since a major function of abstract ideation is to allow us to hide our near-system rules from ourselves and others… an idea I got from OB, btw, and one that significantly increased the effectiveness of my work!)
Now, were you advocating a positive justification for the use of unhappiness, rather than a desire to avoid its loss, I wouldn’t need to apply the same stringency of questioning… in the same way that I wouldn’t question a masochist finding enjoyment in the experience of pain!
And if you were giving a detailed rationale for your negative justification, I’d be at least somewhat more satisfied. However, your justifications here and on OB sound to me like vague “apologies for death”, that is, they handwave various objections as being “obvious”, without providing any specific scenario in which any given person would actually be better off by not having the option of immortality, or by lacking the ability to reject unhappiness, or to get over it with arbitrary quickness.
Also, you didn’t answer any of my questions, like “So, how long would you need to be unhappy, after some specific person died?” This kind of vagueness is (in my experience) a strong indicator of negatively-motivated rationalization. After all, if this were as well thought out as your other positions, it seems to me that you’d either already have had an answer ready, or one would have come quickly to mind.
That one question is particularly relevant, too, for determining where our positions actually differ—if they really do! I don’t mind being (briefly) unhappy, as an indicator that something is wrong. I just don’t see any point in leaving the alarm bell ringing 24/7 thereafter. Our lives and concerns don’t exist on the same timescales as our ancestors’, and a life-threatening problem 20 years from now simply doesn’t merit the same type of stress response as one that’s going to happen 20 seconds from now. But our nervous systems don’t seem to know the difference, or at least lack the dynamic range required for an adequate degree of distinction.
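The timescale mismatch described above can be made concrete with a toy comparison: a proportionate response would discount a threat by its distance in time, while the “alarm bell” rings at full volume regardless. The one-day half-life below is an arbitrary illustrative choice, not a measured value:

```python
def discounted_urgency(seconds_until_threat, half_life=86400.0):
    """Toy model of a proportionate response: urgency halves every
    `half_life` seconds (an arbitrary one-day half-life, for illustration)."""
    return 0.5 ** (seconds_until_threat / half_life)

def alarm_bell(seconds_until_threat):
    """The nervous system as described above: once triggered, it rings
    at full volume no matter how far away the threat is."""
    return 1.0

twenty_seconds = 20
twenty_years = 20 * 365 * 86400
print(discounted_urgency(twenty_seconds))  # ~1.0: full stress response
print(discounted_urgency(twenty_years))    # ~0.0: barely registers
print(alarm_bell(twenty_seconds), alarm_bell(twenty_years))  # 1.0 both times
```

The proportionate model distinguishes the two cases by many orders of magnitude; the flat alarm does not, which is the “lack of dynamic range” complaint in miniature.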
By the way, this comment gives a more detailed explanation of how the negative reinforcement mechanism leads to undesirable results besides excessive stress (like hypocrisy and inner conflict!) compared to keeping it mostly-inactive, within the region where positive reinforcement is equally suitable to create roughly-similar results.
And now, I’m going to sign off for tonight, and take a break from writing here for a while. I need to get back to work on the writing and speaking I do for my paying customers, at least for a few days anyhow. ;-) But I nonetheless look forward to your response.
Interesting thread!

I’m not sure that pjeby has fully addressed Eliezer’s concern that “eliminating my negative emotions would be changing my preferences, and changing my preferences so that they’re satisfied is against my current preferences (otherwise, I’d just go for being an orgasmium)”.
(Well, at least that’s how I’d paraphrase it; Eliezer, tell me if I’m wrong.)
To which I would answer:
Yes, it’s very possible that eliminating some negative emotions would be immoral, or at least would change one’s preferences in a way one’s previous preferences would disagree with (think: eliminating the guilt over killing people, and things like that. I wouldn’t be very happy to learn that the army or police of a dictatorship were researching emotion elimination).
Still, there is probably a wide range of negative feelings that could be removed in a way that doesn’t contradict one’s original preferences—in the sense that the pre-modification person wouldn’t find the behaviour of the modified person objectionable.
The line between which changes are OK and which are not is not that obvious to draw, and many posts on OB talk about it (the difference between the morality of the ancient Greeks and our own, and thus the risk of “freezing” our own morality and barring future moral progress; the Confessor’s objections to non-consensual sex, etc.). pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”, but I think he meant that careful examination could show that those changes stayed in the second category and wouldn’t turn one into an immoral monster.
(It feels a bit weird answering pjeby’s post in the third person, but it felt clearer to me that way :P I’m not responding to this post in particular)
(Disclaimer: I’m one of pjeby’s clients, but that’s not why I’m here, I’ve been reading OvercomingBias since nearly the beginning)
> pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”
I didn’t (explicitly) dismiss those concerns; I said that away-from reasoning has a higher rationality standard to meet, in part because it’s likely to be vague.
I wasn’t even thinking about preference-changing being dangerous, because our preferences are largely independent and mostly don’t “auto-update” when we change one—there’s a LOT of redundancy. So if a specific change isn’t compatible with your overall morality, you’ll note the dissonance, and change your preferences again to tune things better.
Science-fictional evidence of preference-changing is about as far off as science-fictional evidence of AI behavior… and for the same reasons. The built-in models our brain uses to understand minds and their preferences, are simpler than the models the brain uses to create a mind… and its preferences.
Off-topic: Shortly after you posted this, it appears that someone undertook a massive vote-down campaign, systematically searching for every comment I’ve ever posted to LW and voting it down by 1. I don’t know if, or how, these events are correlated.
But, if the person who undertook that campaign was trying to send me a message of some sort, they neglected to include any actionable information content. I only noticed because the karma number suddenly and dramatically changed when I clicked through from one page to another, reading this morning’s new comments… and that sudden large drop was weird enough to make me investigate.
Otherwise, I probably never would’ve been aware of their action, as an action, let alone as any sort of feedback! If you want to communicate something to someone, it’s probably best to be more explicit. Or, in the alternative, contribute a patch to the LW software to let you filter out posts by people you don’t like, or perhaps the entire subthreads they participate in.
Well, it wasn’t me :)

I wish this place worked like StackOverflow, where you can only downvote once you have 100 karma; that would probably reduce the background noise in the voting…
This is what I was talking about. Please do prepare the posts; it’ll help you to clarify your position to yourself. Let them lie as drafts for a while, then make a decision about whether to post them. Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences. Do you know the derivation of the expected utility formula? You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
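For readers unfamiliar with the expected utility formula mentioned here, it is just the probability-weighted sum of outcome utilities, EU(action) = Σᵢ p(outcomeᵢ) · U(outcomeᵢ). A minimal sketch (the gamble and its numbers are invented purely for illustration):

```python
def expected_utility(outcomes):
    """Expected utility of an action, given `outcomes` as a list of
    (probability, utility) pairs. Probabilities must sum to 1."""
    total_p = sum(p for p, _ in outcomes)
    assert abs(total_p - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * u for p, u in outcomes)

# Invented example: a 50/50 gamble vs. a sure thing.
gamble = [(0.5, 100.0), (0.5, -50.0)]
sure_thing = [(1.0, 20.0)]
print(expected_utility(gamble))      # 25.0
print(expected_utility(sure_thing))  # 20.0
```

Note that the formula itself is indifferent to whether a term is framed as an avoided loss or a sought gain; that indifference is precisely where it diverges from the human preference computation being discussed in this thread.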
> You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
Well, there is that too, of course, but actually the issues I’m talking about here are (somewhat) orthogonal. Negatively-motivated reasoning is less likely to be rational in large part because it’s more vague—it requires only that the source of negative motivation be dismissed or avoided, rather than a particular source of positive motivation be obtained. Even if negative and positive motivation held the same weight, this issue would still apply.
The literature I was actually referring to (about the largely asynchronous and simultaneous operation of negative and positive motivation), I linked to in another comment here, after you accused me of making unorthodox and unsupported claims. In my posts, I expect to also make reference to at least one paper on “affective synchrony”, which is the degree to which our negative and positive motivation systems activate to the same degree at the same time.
> Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences.
All I’m pointing out is that a rationalist who ignores the irrationality of the hardware on which their computations are being run, while expecting to get good answers out of it, isn’t being very rational.