I’m saying that if you’re going to be unhappy about anything—a position I do currently lean toward, albeit with strong reservations—then you should be unhappy about facts.
Sometimes the important facts about which you worry are counterfactual. Which, after all, is what happens when you decide, determining the real decision, based on its comparison to your model of its unreal alternative.
In order to be unhappy “about” a fact, the fact has to have some meaning… a meaning which can exist only in your map, not the territory, since the fact or its converse have to have some utility—and the territory doesn’t come with utility labels attached.
However, there’s another source of possible misunderstanding here: my mental model of the brain includes distinct systems for utility and disutility—what I usually refer to as the pain brain and gain brain. The gain brain governs approach to things you want, while the pain brain governs avoidance of things you don’t want.
In theory, you don’t need anything this complex—you could just have a single utility function to squeeze your futures with. But in practice, we have these systems for historical reasons: an animal works differently depending on whether it’s chasing something or being chased.
What we call “unhappiness” is not merely the absence of happiness, it’s the activation of the “pain-avoidance” system—a system that’s largely superfluous (given our now-greater reasoning capacity) unless you’re actually being chased by something.
So, from my perspective, it’s irrational to maintain any belief that has the effect of activating the pain brain in situations that don’t require an urgent, “this is a real emergency” type of response. In all other kinds of situations, pain-brain responses are less useful because they are:
more emotional
more urgent and stressful
less deeply thought through
less creative and less willing to explore options
less risk-taking
And while these characteristics could potentially be life-saving in a truly urgent emergency… they are pretty much life-destroying in all other contexts.
So, while you might have a preference that people not be religious (for example), there is no need for this preference not being met, to cause you any actual unhappiness.
In other words, you can be happy about a condition X being met in reality, without also requiring that you be unhappy when condition X is not met.
Should I not be unhappy when people die? I know that I could, by altering my thought processes, make myself less unhappy; I know that this unhappiness is not cognitively unavoidable. I choose not to avoid it. The person I aspire to be has conditions for unhappiness and will be unhappy when those conditions are met.
Our society thinks that being unhappy is terribly, terribly sinful. I disagree morally, pragmatically, and furthermore think that this belief leads to a great deal of unhappiness.
I don’t know. Is it useful for you to be unhappy when people die? For how long? How will you know when you’ve been sufficiently unhappy? What bad thing will happen if you’re not unhappy when people die? What good thing happens if you are unhappy?
And I mean these questions specifically: not “what’s good about being unhappy in general?” or “what’s good about being unhappy when people die, from an evolutionary perspective?”, but why do YOU, specifically, think it’s a good thing for YOU to be unhappy when some one specific person dies?
My hypothesis: your examination will find that the idea of not being unhappy in this situation is itself provoking unhappiness. That is, you think you should be unhappy when someone dies, because the idea of not being unhappy will make you unhappy also.
The next question to ask will then be what, specifically, you expect to happen in response to that lack of unhappiness, that will cause you to be unhappy.
And at that point, you will discover something interesting: an assumption that you weren’t aware of before.
So, if you believe that your unhappiness should match the facts, it would be a good idea to find out what facts your map is based on, because “death ⇒ unhappiness” is not labeled on the territory.
Pjeby, I’m unhappy on certain conditions as a terminal value, not because I expect any particular future consequences from it. To say that it is encoded directly into my utility function (not just that certain things are bad, but that I should be a person who feels bad about them) might be oversimplifying in this case, since we are dealing with a structurally complicated aspect of morality. But just as I don’t think music is valuable without someone to listen to it, I don’t think I’m as valuable if I don’t feel bad about people dying.
If I knew a few other things, I think, I could build an AI that would simply act to prevent the death of sentient beings, without feeling the tiniest bit bad about it; but that AI wouldn’t be what I think a sentient citizen should be, and so I would try not to make that AI sentient.
It is not my future self who would be unhappy if all his unhappiness were eliminated; it is my current self who would be unhappy on learning that my nature and goals would thus be altered.
Did you read the Fun Theory sequence and the other posts I referred you to? I’m not sure if I’m repeating myself here.
Possibly relevant: A General Theory of Love suggests that love (imprinting?) includes needing the loved one to help regulate basic body systems. It starts with the observation that humans are the only species whose babies die from isolation.
I’ve read a moderate number of books by Buddhists, and as far as I can tell, while a practice of meditation makes ordinary problems less distressing, it doesn’t take the edge off of grief at all. It may even make grief sharper.
I’m unhappy on certain conditions as a terminal value, not because I expect any particular future consequences from it.
Really? How do you know that? What evidence would convince you that your brain is expecting particular future consequences, in order to generate the unhappiness?
I ask because my experience tells me that there are only a handful of “terminal” negative values, and they are human universals; as far as I can tell, it isn’t possible for a human being to create their own terminal (negative) values. Instead, they derive intermediate negative values, and then forget how they did the derivation… following which they invent rationalizations that sound a lot like the ones they use to explain why death is a good thing.
Don’t you find it interesting that you should defend this “terminal” value so strongly, without actually asking yourself the question, “What really would happen if I were not unhappy in situation X?” (Where situation X is actually specified to a level allowing sensory detail—not some generic abstraction.)
It’s clear from what you’ve written throughout this thread that the answer to that question is something like, “I would be a bad person.” And in my experience, when you then ask something like, “And how did I learn that that would make me bad?”, you’ll discover specific, emotional memories that provide the only real justification you had for thinking this thought in the first place… and that it has little or no connection to the rationalizations you’ve attached to it.
Really? How do you know that? What evidence would convince you that your brain is expecting particular future consequences, in order to generate the unhappiness?
You could actually tell me what I fear, and I’d recognize it when I heard it?
What would it take for me to convince you that I’m repulsed by the thing-as-it-is and not its future consequence?
I ask because my experience tells me that there are only a handful of “terminal” negative values
I strongly suspect, then, that you are too good at finding psychological explanations! Conditioned dislike is not the same as conditional dislike. We can train our terminal values, and we can be moved by arguments about them. Now, there may be a humanly universal collection of negative reinforcers, although there is not any reason to expect the collection to be small; but that is not the same thing as a humanly universal collection of terminal values.
I can tell you just exactly what would happen if I weren’t unhappy: I would live happily ever afterward. I just don’t find that to be the most appealing prospect I can imagine, though one could certainly do worse.
What would it take for me to convince you that I’m repulsed by the thing-as-it-is and not its future consequence?
A source listing for the relevant code and data structures in your brain. At the moment, the closest thing I know to that is examining formative experiences, because recontextualizing those experiences is the most rapid way to produce testable change in a human being.
We can train our terminal values, and we can be moved by arguments about them.
Then we mean different things by “terminal” in this context, since I’m referring here to what comes built-in to a human, versus what is learned by a human. How did you learn that you should have that particular terminal value?
I can tell you just exactly what would happen if I weren’t unhappy: I would live happily ever afterward.
As far as I can tell, that’s a “far” answer to a “near” question—it sounds like the result of processing symbols in response to an abstraction, rather than one that comes from observing the raw output of your brain in response to a concrete question.
In effect, my question is, what reinforcer shapes/shaped you to believe that it would be bad to live happily ever after?
(Btw, I don’t claim that happily-ever-after is possible; I just claim that it’s possible and practical to reduce one’s unhappiness by pruning one’s negative values to those actually required to deal with urgent threats, rather than allowing them to be triggered by chronic conditions. I don’t even expect that I won’t grieve people important to me… but I also expect to get over it, as quickly as is practical for me to do so.)
Argh. You keep editing your comments after I’ve already started my replies. I guess I’ll need to wait longer before replying, in future.
Your detailed responses are off-point, though, except for “Serious Stories”, in which you suggest that it would be useful to get rid of unnecessary and soul-crushing pain and/or sorrow. My position is that a considerable portion of that unnecessary and soul-crushing stuff can be done away with, merely by rational examination of the emotional source of your beliefs in the relevant context.
Specifically, how do you know what “person you aspire to be”? My guess: you aspire to be that person, not because of an actual aspiration, but rather because you are repulsed by the alternative, and that the alternative is something you’re either afraid you are, or might easily become. (In other words, a 100% standard form of irrationality known as an “ideal-belief-reality conflict”.)
What’s more, when you examine how you came to believe that, you will find one or more specific emotional experiences… which, upon further consideration, you will find you gave too much weight to, due to their emotional content at the time.
Now, you might not be as eager to examine this set of beliefs as you were to squirt ice water in your ear, but I have a much higher confidence that the result will be more useful to you. ;-)
By “person I aspire to be” I mean that my present self has this property and my present self wants my future self to have this property. I originally wrote “person I define as me” but that seemed like too much of a copout.
Yes, I’m repulsed by imagining the alternative Eliezer who feels no pain when his friends, family, or a stranger in another country dies. It is not clear to me why you feel this is irrational. Nor is it based on any particular emotional experience of mine of having ever been a sociopath.
It seems to me that you are verging here on the failure mode of having psychoanalysis the way that some people have bad breath. If you don’t like my arguments, argue otherwise. Just casting strange hints of childhood trauma is… well, it’s having psychoanalysis the way some people have bad breath.
So far as I can tell, being a person who hurts when other people hurt is part of that which appears to me from the inside as shouldness.
My question is about the implementation of meta-ethics in the human brain. If I were going to write a program to simulate Eliezer Yudkowsky, what rules (other than “be unhappy when others are unhappy”) would I need to program in for you to arrive at this “obvious” conclusion?
In my personal experience, the morality that people arrive at by avoiding negative consequences is substantially different than the morality they arrive at by seeking positive ones.
In other words, a person who does good because they will otherwise be a bad person, is not the same as a person who does good because it brings good. Their actions and attitudes differ in substantive ways, besides the second person being happier. For example, the second person is far more likely to actually be generous and warm towards other people—especially living, present, individual people, rather than “people” as an abstraction.
So which of these two is really the “good” person, from your moral perspective?
(On another level, by the way, I fail to see how contagious, persistent unhappiness is a moral good, since it greatly magnifies the total amount of unhappiness in the universe. But that’s a separate issue from the implementation question.)
It seems to me that when you say ‘meta-ethics’ you simply mean ‘ethics’. I don’t know why you’d think meta-ethics would need to be implemented in the human brain. Ethics is in the world; meta-ethics doubly so. There’s a fact about what’s right, just like there’s a fact about what’s prime. You could ask why we care about what’s right, but that’s neither an ethical question nor a meta-ethical one. The ethical question is ‘what’s right?’ and the meta-ethical question is ‘what makes something a good answer to an ethical question?’. Both of those questions can be answered without reference to humans, though humans are the only reason why anyone would care.
Unless Eliezer has some supernatural entity to do his thinking for him, his ethics and meta-ethics require some physical implementation. Where else are you proposing that he store and process them, besides physical reality?
I think you’re shifting between ‘ethics’ and ‘what Eliezer thinks about ethics’. While it’s possible that ideas are not real save via some implementation, I don’t think it would therefore have to be in a particular human; systems know things too.
You seem to frequently shift the focus of conversation as it happens, hurting the potential for rational discourse in favor of making emotively positive statements that loosely correlate with the topic at hand. Would you be the same pjeby that writes those reprehensible self-help books?
That seemed a bit ad hominem. The commenter pjeby (I know nothing else about him) seems like someone who might be unfamiliar with part of the LW/OB background corpus but is reasoning pretty well under those conditions.
Actually, I’m quite familiar with a large segment of the OB corpus—it’s been highly influential on my work. However, I also see what appear to be a few holes or incoherencies within the OB corpus… some of which appear to stem from precisely the issue I’ve been asking you about in this thread. (i.e. the role of negative utilities in creating bias)
In my personal experience, negative utilities create bias because they cut off consideration of possibilities. This is useful in an emergency—but not much anywhere else. If human beings had platonically perfect minds, there would be no difference between a uniform utility scale and a dual positive/negative one… but as far as I can tell (and research strongly suggests) we do have two different systems.
So, although you’re wary of Robin’s “cynicism” and my “psychological explanations”, this is inconsistent with your own statements, such as:
There is no perfect argument that persuades the ideal philosopher of perfect emptiness to attach a perfectly abstract label of ‘good’. The notion of the perfectly abstract label is incoherent, which is why people chase it round and round in circles. What would distinguish a perfectly empty label of ‘good’ from a perfectly empty label of ‘bad’? How would you tell which was which?
See, I’m as puzzled by your ability to write something like that, and then turn around and argue an absolute utility for unhappiness, as you are puzzled by that Nobel-winning Bayesian dude who still believes in God. From my POV, it’s just as inconsistent.
There must be some psychology that creates your position, but if your position is “truly” valid (assuming there were such a thing), then the psychology wouldn’t matter. You should be able to destroy the position, and then reconstruct it from more basic principles, once the original influence is removed, no? (This idea is also part of the corpus.)
Are you familiar with Eliezer’s take on naturalistic meta-ethics in particular, or just with other large segments of the OB corpus? If the former, maybe you could take more care to spell out that you get the difference between “achieving one’s original goals” and “hacking one’s goal-system so that the goal-system thinks one has achieved one’s goals (e.g., by wireheading)”.
I like your writing, but in this particular thread, my impression is that you’re “rounding to the nearest cliche”—interpreting Eliezer and others as saying the nearest mistake that you’ve heard your students or others make, rather than making an effort to understand where people are coming from. My impression may be false, but it sounds like I’m not the only one who has it, and it’s distracting, so maybe take more care to spell out in visible terms a summary of peoples’ main points, so we know you’re disagreeing with what they’re saying and not with some other view.
More generally, you’ve joined a community that has been thinking awhile and has some unusual concepts. I’m glad you’ve joined the commenters, because we badly need the best techniques we can get for changing our own thinking habits and for teaching the same to others—we need techniques for learning and teaching rationality—and I find your website helpful here, and your actual thinking on the subject, in context, can probably become better still. But I wonder if you could maybe take a bit more care in general to hear the threads you’re responding to. I’ve felt like you were “rounding to the nearest cliche” in your thread with me as well (I wasn’t going off the Lisa Simpson happiness theory), and it might be nice if you could take the stance of a co-participant in the conversation, who is interested in both learning and teaching, instead of repeating the (good) points on your website in response to all comments, whatever the comments’ subject matter.
First, yes, I do understand the difference between goal-achievement and wireheading. I’m drawing a much finer distinction about the means by which you set up a system to achieve your goals, as well as the means by which you choose those goals in the first place.
It is possible in some cases that I’ve “rounded to the nearest cliche” as you put it. But I’m pretty confident that I’m not doing that with Eliezer’s points, precisely because I’ve read so much of his work… but also because the mistake I believe he is making (or at least, the thing he appears to not be noticing) is a perfect example of a point that I was trying to make in another thread… about why you can’t just put one new, correct belief in someone’s head, and have it magically fix every broken belief they already have.
I’m a little confused about the rest of your statement; it doesn’t seem to me that I’m repeating the same points, so much as that I’ve been struggling to deal with the fact that so many of the threads I’ve become involved in, boil down (AFAICT) to the same issues—and trying NOT to have too much duplication in my responses, while also not wanting to create a bunch of inter-comment links. (Another fine example of how avoiding negatives leads to bad decisions… ;-) )
Now, whether that’s a case of me having only a hammer, or whether it’s simply because everything really is made out of ones and zeros, I’m not sure. It has been seeming to me for a bit now, that what I really need to do is write an LW article about positive/negative utility and abstract/concrete thinking, as these are the main concepts I work with that clash with some portions of the OB corpus (and some of the more vocal LW commenters). Putting that stuff in one place would certainly help reduce duplication.
Meanwhile, it’s not my intention to reduce anyone to cliche, or to presume that I understand something I don’t. If I were, I wouldn’t spend so much time in so many of my comments, asking so many questions. They are not rhetorical; they represent genuine curiosity. And I’ve actually learned quite a lot from the process of asking and commenting in the last few days; many things I’ve written here are NOT concepts I previously had.
This is especially true for the two comments that were replies to you; they were my musings on the ideas I got from your statements, more than critique or commentary of anything you said. I can see how that might make you feel not understood, however. (Also, the “Lisa Simpson theory” part of that one comment was actually directed to the comment you were replying to, not your comment in that thread, btw. I was trying to avoid writing two replies there.)
Thanks for the thoughtful reply. It’s quite possible I misinterpreted. Also, re: the Lisa Simpson thing, I’ll be more careful to look at other nearby posts people might be replying to instead of reading comments so much from the new comments page.
It seems slightly odd to me that you say you’re “pretty confident” you’re not rounding Eliezer’s point to the nearest cliche in part because the mistake you think he’s making “is a perfect example of a point [you] were trying to make in another thread”. Isn’t that what it feels like when one rounds someone’s response to a pre-existing image of “oh, the such-and-such mistake”?
A LW article about how people think about positive/negative utility, and another about abstract/concrete thinking, sounds wonderful. Then we can sift through your concepts as a community, air confusions or objections in a coherent manner, etc.; and you can reference it and it’ll be part of our shared corpus. Both topics sound useful.
Isn’t that what it feels like when one rounds someone’s response to a pre-existing image of “oh, the such-and-such mistake”?
So, how would you distinguish that, from the case where their response is making the such-and-such mistake?
The way I’d distinguish it, is to ask questions that would have different answers, depending on whether the person is making that mistake or not. I asked Eliezer those questions, and of the ones he answered, the answers were consistent with my model of the mistake.
Of course, there’s always the possibility of confirmation bias… except that I also know what answers I’d have taken as disconfirming my hypothesis, which makes it at least a little less likely. (But I do know of more than one mechanism by which beliefs and behaviors are formed and maintained, and it would’ve been plausible—albeit less probable—that his evaluation could’ve been formed another way. And I’d have been perfectly okay with my hypothesis being wrong.)
See, I’m not pointing out what I believe to be a mistake because I think I’m smarter than Eliezer… it’s because I’m constantly making the same mistake. We all do, because it’s utterly trivial to make it, and really non-trivial to spot it. And if you haven’t gotten an intuitive grasp of why and how that mistake comes into being (for example, if you insist it doesn’t exist in the first place!), then it’s hard to see why there’s “no silver bullet” for reducing the complexity of developing “rationality” in people.
So, how would you distinguish that, from the case where their response is making the such-and-such mistake?
If my interlocutor is someone who might well have thoughts that don’t fit into my schemas, I might be suspicious enough of my impression that they were making one of my standard cached example-mistakes that I’d:
Make a serious effort at original seeing, and make sure my model of the such-and-such mistake is really the best way to organically understand the situation in front of me; and then
Describe my schema for the such-and-such mistake (in general), and see if the person agrees that such-and-such is a mistake; and then
Describe the instance of the such-and-such mistake that the person seems to be making, and ask if they agree or if there’s a kind of reasoning going into their claim that doesn’t fit into my schema.
Or maybe this is just the pain-in-the-neck method one should use if one’s original communication attempt stalls somewhere. Truth be told, I’m at this point rather confused about which aspects of meta-ethics are under dispute, and I can’t easily scan back through the conversation to find the quotes of yours that made me think you misunderstood because our conversation has overflowed LW’s conversation-display settings. And you’ve made some good points, and I’m convinced I misunderstood you in at least some cases. I’m going to bow out of this conversation for now and wait to discuss values and their origins properly, in response to your own post. (Or if you’d rather, I’d love to discuss by email; I’m annasalamon at gmail.)
Yes, the comment system here is really not suited to the kind of conversation I’ve been trying to have… not that I’m sure what system would work for it. ;-)
As far as meta-ethics goes, the short summary is:
“Avoiding badness” and “seeking goodness” are not interchangeable when you experience them concretely on human hardware,
It is therefore a reasoning error to treat them as if they were interchangeable in your abstract moral calculations (as they will not work the same way in practice),
Due to the specific nature of the human hardware biases involved (i.e., the respective emotional, chemical, and neurological responses to pain vs. pleasure), badness-avoidance values are highly likely to be found irrational upon detailed examination… and thus they are always the ones worth examining first.
Badness-avoidance values are a disproportionately high (if not exclusive!) source of “motivated reasoning”. i.e., we don’t so much rationalize to paint pretty pictures, as to hide the ugly ones. (Which makes rooting them out of critical importance to rationalists.)
This summary is more to clarify my thoughts for the eventual post, than an attempt to continue the discussion. (To me, these things are so obvious and so much a part of my day-to-day experience that I often forget the inferential distance involved for most people.)
These ideas are all capable of experimental verification; the first one has certainly been written about in the literature. None are particularly unorthodox or controversial in and of themselves, as far as I’m aware.
However, there are common arguments against some of these ideas that my own students bring up, so in my (eventual) post I’ll need to bring them up and refute them as well.
For example, a common argument against positively-motivated goodness is that feeling good about being generous means you’re “really” being selfish… and thus bad! So, the person advancing this argument is motivated to rationalize the “virtue” of being dutiful—i.e., doing something you don’t want to, but nonetheless “should”—because it would be bad not to.
Strangely, most people have these judgments only in relation to their self… They see no problem with someone else doing good out of generosity or kindness, with no pain or duty involved. It’s only themselves they sentence to this “virtue” of suffering to achieve goodness. (Which is sort of like “fighting for peace” or “f*ing for virginity”, but I digress.)
Whether this is something inbuilt, cultural, or selection bias of people I work with, I have no idea. But it’s damn common… and Eliezer’s making a virtue out of unhappiness (beyond the bare minimums demanded by safety, etc.) fits smack dab in the middle of this territory.
Whew. Okay, I’m going to stop writing this now… this really needs to be a post. Or several. The more I think about how to get here, starting from only the OB corpus and without recapitulating my own, the bigger I realize the inferential gap is.
You may be running into the Reversed Stupidity problem; most cases you’ve seen advocating negative feelings are stupid, therefore, you assume that all such advocations must result from the same stupidity.
I sympathize because I remember back when I would have thought that anyone arguing against the abolitionist program—that is, the total abolition of all suffering—was a Luddite.
But I eventually realized I didn’t want to eliminate my negative reinforcement hardware, and that moreover, I wouldn’t be such a bad person if I, you know, just did things the way I did want, instead of doing things the way I felt vaguely dutifully obligated to want but didn’t want.
Why am I a terrible, bad person for not wanting to modify myself in that way? What higher imperative should override: “I’d rather not do this”?
I didn’t say you’re a terrible bad person—I said your choice to be unhappy in the absence of a positive benefit from same, is likely to be found irrational, if you reflect on the concrete emotional reason you find the prospect abhorrent.
I also don’t recommend eliminating the negative reinforcement hardware, I merely recommend carefully vetting all the software you permit to run on it, or to be generated by it. (So don’t worry, I’m not an advance spokesperson for the Superhappies.)
This isn’t an absolute, just a VERY strong heuristic, in my experience. Sort of like, if someone’s going to commit suicide, I have more hoops for them to jump through to prove their rationality, than someone who’s just going to the grocery store. ;-)
And, based on what you’ve said thus far, it doesn’t sound like you’ve thoroughly investigated what concrete (near-system) rules drove the creation of your aspiration to suffering.
(As opposed to the abstract ideation that happened afterward, since a major function of abstract ideation is to allow us to hide our near-system rules from ourselves and others… an idea I got from OB, btw, and one that significantly increased the effectiveness of my work!)
Now, were you advocating a positive justification for the use of unhappiness, rather than a desire to avoid its loss, I wouldn’t need to apply the same stringency of questioning… in the same way that I wouldn’t question a masochist finding enjoyment in the experience of pain!
And if you were giving a detailed rationale for your negative justification, I’d be at least somewhat more satisfied. However, your justifications here and on OB sound to me like vague “apologies for death”, that is, they handwave various objections as being “obvious”, without providing any specific scenario in which any given person would actually be better off by not having the option of immortality, or by lacking the ability to reject unhappiness, or to get over it with arbitrary quickness.
Also, you didn’t answer any of my questions like, “So, how long would you need to be unhappy, after some specific person died?” This kind of vagueness is (in my experience) a strong indicator of negatively-motivated rationalization. After all, if this were as well-thought out as your other positions, it seems to me that you’d either already have had an answer ready, or one would have come quickly to mind.
That one question is particularly relevant, too, for determining where our positions actually differ, if they really do! I don’t mind being (briefly) unhappy, as an indicator that something is wrong. I just don’t see any point in leaving the alarm bell ringing, 24/7 thereafter. Our lives and concerns don’t exist on the same timescales as our ancestors, and a life-threatening problem 20 years from now, simply doesn’t merit the same type of stress response as one that’s going to happen 20 seconds from now. But our nervous systems don’t seem to know the difference, or at least lack the required dynamic range for an adequate degree of distinction.
By the way, this comment gives a more detailed explanation of how the negative reinforcement mechanism leads to undesirable results besides excessive stress (like hypocrisy and inner conflict!) compared to keeping it mostly-inactive, within the region where positive reinforcement is equally suitable to create roughly-similar results.
And now, I’m going to sign off for tonight, and take a break from writing here for a while. I need to get back to work on the writing and speaking I do for my paying customers, at least for a few days anyhow. ;-) But I nonetheless look forward to your response.
I’m not sure that pjeby has fully addressed Eliezer’s concern that “eliminating my negative emotions would be changing my preferences, and changing my preferences so that they’re satisfied is against my current preferences (otherwise, I’d just go for being an orgasmium)”.
(Well, at least that’s how I’d paraphrase it, Eliezer, tell me if I’m wrong)
To which I would answer:
Yes, it’s very possible that eliminating some negative emotions would be immoral, or at least, would change one’s preferences in a way one’s previous preferences would disagree with (think: eliminating the guilt over killing people, and things like that. I wouldn’t be very happy to learn that the army or police of a dictatorship is researching emotion elimination).
Still, there is probably a wide range of negative feelings that could be removed in a way that doesn’t contradict one’s original preferences—in the sense that the pre-modification person wouldn’t find the behaviour of the modified person objectionable.
The line between which changes are OK and which are not is not that obvious to draw, and many posts on OB talk about it (the difference between the morality of the ancient Greeks and our own, and thus the risk of “freezing” our own morality and barring future moral progress, the Confessor’s objections to non-consensual sex, etc.). pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”, but I think he meant that careful examination could show that those changes stayed in the second category and wouldn’t turn one into an immoral monster.
(It feels a bit weird answering pjeby’s post in the third person, but it felt clearer to me that way :P I’m not responding to this post in particular)
(Disclaimer: I’m one of pjeby’s clients, but that’s not why I’m here, I’ve been reading OvercomingBias since nearly the beginning)
pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”
I didn’t (explicitly) dismiss those concerns; I said that away-from reasoning has a higher rationality standard to meet, in part because it’s likely to be vague.
I wasn’t even thinking about preference-changing being dangerous, because our preferences are largely independent and mostly don’t “auto-update” when we change one—there’s a LOT of redundancy. So if a specific change isn’t compatible with your overall morality, you’ll note the dissonance, and change your preferences again to tune things better.
Science-fictional evidence of preference-changing is about as far off as science-fictional evidence of AI behavior… and for the same reasons. The built-in models our brain uses to understand minds and their preferences, are simpler than the models the brain uses to create a mind… and its preferences.
Offtopic: Shortly after you posted this, it appears that someone undertook a massive vote-down campaign, systematically searching for every comment I’ve ever posted to LW, and voting it down by 1. I don’t know if, or how these events are correlated.
But, if the person who undertook that campaign was trying to send me a message of some sort, they neglected to include any actionable information content. I only noticed because the karma number suddenly and dramatically changed when I clicked through from one page to another, reading this morning’s new comments… and that sudden large drop was weird enough to make me investigate.
Otherwise, I probably never would’ve been aware of their action, as an action, let alone as any sort of feedback! If you want to communicate something to someone, it’s probably best to be more explicit. Or, in the alternative, contribute a patch to the LW software to let you filter out posts by people you don’t like, or perhaps the entire subthreads they participate in.
I wish this place worked like StackOverflow, where you can only downvote once you have 100 karma; that would probably reduce the background noise in the voting …
This is what I was talking about. Please do prepare the posts, it’ll help you to clarify your position to yourself. Let them lie as drafts for a while, then make a decision about whether to post them. Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences. Do you know the derivation of expected utility formula? You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
Well, there is that too, of course, but actually the issues I’m talking about here are (somewhat) orthogonal. Negatively-motivated reasoning is less likely to be rational in large part because it’s more vague—it requires only that the source of negative motivation be dismissed or avoided, rather than a particular source of positive motivation be obtained. Even if negative and positive motivation held the same weight, this issue would still apply.
The literature I was actually referring to (about the largely asynchronous and simultaneous operation of negative and positive motivation), I linked to in another comment here, after you accused me of making unorthodox and unsupported claims. In my posts, I expect to also make reference to at least one paper on “affective synchrony”, which is the degree to which our negative and positive motivation systems activate to the same degree at the same time.
Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences.
All I’m pointing out is that a rationalist that ignores the irrationality of the hardware on which their computations are being run, while expecting to get good answers out of it, isn’t being very rational.
It was deliberately ad hominem, of course—just not the fallacious kind. We seriously need profile pages of some sort. Wish I had the stomach for Python.
I don’t expect anyone to be familiar with the LW/OB background corpus—I expect my education and training is quite different from yours, for example. However, I still expect one to follow rules of conduct with respect to reasonable discourse, for example avoiding equivocation and its related vices.
Or maybe I’m just viscerally angered by the winky smileys. Who knows.
I don’t see how I can separate “ethics” from “what Eliezer thinks about ethics” and still have a meaningful conversation with him on the topic.
Meanwhile, reading back through the thread, the only digressions I see in my comments are those made in response to those raised by you or Eliezer. Perhaps you could point to some specific examples of these shifted foci and emotively positive statements? I do not see them.
As for my “reprehensible” books, I trust you formed that judgment by actually reading them, yes? If so, then yes, I’m that person. But if you didn’t read them, then clearly your judgment isn’t about the books I actually wrote… and thus, I could not have been the person who wrote the (imaginary) ones you’d therefore be talking about. ;-)
Perhaps you could point to some specific examples of these shifted foci and emotively positive statements? I do not see them.
I was not referring only to this thread, but to several ongoing discussions. If you’d like clear examples, feel free to contact me via http://thomblake.com or http://thomblake.mp
As Eliezer has kind of pointed out, I’m weary enough from this discussion to be on the verge of irrationality, so I shall retire from it (if only because this forum is devoted to rationality!).
I agree with your reasoning, but I think there are plenty of reasons to be unhappy about religion that go beyond the absence of a preferred state.
In other words, I think I should be actively displeased that religion exists and is prevalent, not merely being non-happy. Neutrality is included in non-happiness, and if the word were used logically, unhappiness. But the way it’s actually used, ‘unhappy’ means active displeasure.
How is this active displeasure useful to you? Does it cause you to do something different than if you merely prefer religion to not be present? What, specifically?
I’m saying that if you’re going to be unhappy about anything—a position I do currently lean toward, albeit with strong reservations—then you should be unhappy about facts.
Sometimes the important facts of which you worry are counterfactual. Which, after all, is what happens when you decide, determining the real decision, based on its comparison to your model of its unreal alternative.
In order to be unhappy “about” a fact, the fact has to have some meaning… a meaning which can exist only in your map, not the territory, since the fact or its converse have to have some utility—and the territory doesn’t come with utility labels attached.
However, there’s another source of possible misunderstanding here: my mental model of the brain includes distinct systems for utility and disutility—what I usually refer to as the pain brain and gain brain. The gain brain governs approach to things you want, while the pain brain governs avoidance of things you don’t want.
In theory, you don’t need anything this complex—you could just have a single utility function to squeeze your futures with. But in practice, we have these systems for historical reasons: an animal works differently depending on whether it’s chasing something or being chased.
What we call “unhappiness” is not merely the absence of happiness, it’s the activation of the “pain-avoidance” system—a system that’s largely superfluous (given our now-greater reasoning capacity) unless you’re actually being chased by something.
So, from my perspective, it’s irrational to maintain any belief that has the effect of activating the pain brain in situations that don’t require an urgent, “this is a real emergency” type of response. In all other kinds of situations, pain-brain responses are less useful because they are:
more emotional
more urgent and stressful
less deep thinking
less creativity and willingness to explore options
less risk-taking
And while these characteristics could potentially be life-saving in a truly urgent emergency… they are pretty much life-destroying in all other contexts.
So, while you might have a preference that people not be religious (for example), there is no need for this preference not being met, to cause you any actual unhappiness.
In other words, you can be happy about a condition X being met in reality, without also requiring that you be unhappy when condition X is not met.
Should I not be unhappy when people die? I know that I could, by altering my thought processes, make myself less unhappy; I know that this unhappiness is not cognitively unavoidable. I choose not to avoid it. The person I aspire to be has conditions for unhappiness and will be unhappy when those conditions are met.
Our society thinks that being unhappy is terribly, terribly sinful. I disagree morally, pragmatically, and furthermore think that this belief leads to a great deal of unhappiness.
(My detailed responses being given in Feeling Rational, Not For the Sake of Happiness Alone, and Serious Stories, and furthermore illustrated in Three Worlds Collide.)
I don’t know. Is it useful for you to be unhappy when people die? For how long? How will you know when you’ve been sufficiently unhappy? What bad thing will happen if you’re not unhappy when people die? What good thing happens if you are unhappy?
And I mean these questions specifically: not “what’s good about being unhappy in general?” or “what’s good about being unhappy when people die, from an evolutionary perspective?”, but why do YOU, specifically, think it’s a good thing for YOU to be unhappy when some one specific person dies?
My hypothesis: your examination will find that the idea of not being unhappy in this situation is itself provoking unhappiness. That is, you think you should be unhappy when someone dies, because the idea of not being unhappy will make you unhappy also.
The next question to ask will then be what, specifically, you expect to happen in response to that lack of unhappiness, that will cause you to be unhappy.
And at that point, you will discover something interesting: an assumption that you weren’t aware of before.
So, if you believe that your unhappiness should match the facts, it would be a good idea to find out what facts your map is based on, because “death ⇒ unhappiness” is not labeled on the territory.
Pjeby, I’m unhappy on certain conditions as a terminal value, not because I expect any particular future consequences from it. To say that it is encoded directly into my utility function (not just that certain things are bad, but that I should be a person who feels bad about them) might be oversimplifying in this case, since we are dealing with a structurally complicated aspect of morality. But just as I don’t think music is valuable without someone to listen to it, I don’t think I’m as valuable if I don’t feel bad about people dying.
If I knew a few other things, I think, I could build an AI that would simply act to prevent the death of sentient beings, without feeling the tiniest bit bad about it; but that AI wouldn’t be what I think a sentient citizen should be, and so I would try not to make that AI sentient.
It is not my future self who would be unhappy if all his unhappiness were eliminated; it is my current self who would be unhappy on learning that my nature and goals would thus be altered.
Did you read the Fun Theory sequence and the other posts I referred you to? I’m not sure if I’m repeating myself here.
Possibly relevant: A General Theory of Love suggests that love (imprinting?) includes needing the loved one to help regulate basic body systems. It starts with the observation that humans are the only species whose babies die from isolation.
I’ve read a moderate number of books by Buddhists, and as far as I can tell, while a practice of meditation makes ordinary problems less distressing, it doesn’t take the edge off of grief at all. It may even make grief sharper.
Really? How do you know that? What evidence would convince you that your brain is expecting particular future consequences, in order to generate the unhappiness?
I ask because my experience tells me that there are only a handful of “terminal” negative values, and they are human universals; as far as I can tell, it isn’t possible for a human being to create their own terminal (negative) values. Instead, they derive intermediate negative values, and then forget how they did the derivation… following which they invent rationalizations that sound a lot like the ones they use to explain why death is a good thing.
Don’t you find it interesting that you should defend this “terminal” value so strongly, without actually asking yourself the question, “What really would happen if I were not unhappy in situation X?” (Where situation X is actually specified to a level allowing sensory detail—not some generic abstraction.)
It’s clear from what you’ve written throughout this thread that the answer to that question is something like, “I would be a bad person.” And in my experience, when you then ask something like, “And how did I learn that that would make me bad?”, you’ll discover specific, emotional memories that provide the only real justification you had for thinking this thought in the first place… and that it has little or no connection to the rationalizations you’ve attached to it.
You could actually tell me what I fear, and I’d recognize it when I heard it?
What would it take for me to convince you that I’m repulsed by the thing-as-it-is and not its future consequence?
I strongly suspect, then, that you are too good at finding psychological explanations! Conditioned dislike is not the same as conditional dislike. We can train our terminal values, and we can be moved by arguments about them. Now, there may be a humanly universal collection of negative reinforcers, although there is not any reason to expect the collection to be small; but that is not the same thing as a humanly universal collection of terminal values.
I can tell you just exactly what would happen if I weren’t unhappy: I would live happily ever afterward. I just don’t find that to be the most appealing prospect I can imagine, though one could certainly do worse.
A source listing for the relevant code and data structures in your brain. At the moment, the closest thing I know to that is examining formative experiences, because recontextualizing those experiences is the most rapid way to produce testable change in a human being.
Then we mean different things by “terminal” in this context, since I’m referring here to what comes built-in to a human, versus what is learned by a human. How did you learn that you should have that particular terminal value?
As far as I can tell, that’s a “far” answer to a “near” question—it sounds like the result of processing symbols in response to an abstraction, rather than one that comes from observing the raw output of your brain in response to a concrete question.
In effect, my question is, what reinforcer shapes/shaped you to believe that it would be bad to live happily ever after?
(Btw, I don’t claim that happily-ever-after is possible; I just claim that it’s possible and practical to reduce one’s unhappiness by pruning one’s negative values to those actually required to deal with urgent threats, rather than allowing them to be triggered by chronic conditions. I don’t even expect that I won’t grieve people important to me… but I also expect to get over it, as quickly as is practical for me to do so.)
Argh. You keep editing your comments after I’ve already started my replies. I guess I’ll need to wait longer before replying, in future.
Your detailed responses are off-point, though, except for “Serious Stories”, in which you suggest that it would be useful to get rid of unnecessary and soul-crushing pain and/or sorrow. My position is that a considerable portion of that unnecessary and soul-crushing stuff can be done away with, merely by rational examination of the emotional source of your beliefs in the relevant context.
Specifically, how do you know what “person you aspire to be”? My guess: you aspire to be that person, not because of an actual aspiration, but rather because you are repulsed by the alternative, and that the alternative is something you’re either afraid you are, or might easily become. (In other words, a 100% standard form of irrationality known as an “ideal-belief-reality conflict”.)
What’s more, when you examine how you came to believe that, you will find one or more specific emotional experiences… which, upon further consideration, you will find you gave too much weight to, due to their emotional content at the time.
Now, you might not be as eager to examine this set of beliefs as you were to squirt ice water in your ear, but I have a much higher confidence that the result will be more useful to you. ;-)
By “person I aspire to be” I mean that my present self has this property and my present self wants my future self to have this property. I originally wrote “person I define as me” but that seemed like too much of a copout.
Yes, I’m repulsed by imagining the alternative Eliezer who feels no pain when his friends, family, or a stranger in another country dies. It is not clear to me why you feel this is irrational. Nor is it based on any particular emotional experience of mine of having ever been a sociopath.
It seems to me that you are verging here on the failure mode of having psychoanalysis the way that some people have bad breath. If you don’t like my arguments, argue otherwise. Just casting strange hints of childhood trauma is… well, it’s having psychoanalysis the way some people have bad breath.
So far as I can tell, being a person who hurts when other people hurt is part of that which appears to me from the inside as shouldness.
Okay, let me rephrase. Why is it better to be a person who hurts when other people hurt, than a person who is happier when people don’t hurt?
While EY might not put it this way, this line:
answered your question
since Eliezer was making a moral observation. The answer: It is obviously so. Do you have conflicting observational data?
How is it rational to treat a “moral observation” as “obviously so”? That’s how religion works, isn’t it?
This discussion is now about
my view on which is summarized in Joy in the Merely Good.
My question is about the implementation of meta-ethics in the human brain. If I were going to write a program to simulate Eliezer Yudkowsky, what rules (other than “be unhappy when others are unhappy”) would I need to program in for you to arrive at this “obvious” conclusion?
In my personal experience, the morality that people arrive at by avoiding negative consequences is substantially different than the morality they arrive at by seeking positive ones.
In other words, a person who does good because they will otherwise be a bad person, is not the same as a person who does good because it brings good. Their actions and attitudes differ in substantive ways, besides the second person being happier. For example, the second person is far more likely to actually be generous and warm towards other people—especially living, present, individual people, rather than “people” as an abstraction.
So which of these two is really the “good” person, from your moral perspective?
(On another level, by the way, I fail to see how contagious, persistent unhappiness is a moral good, since it greatly magnifies the total amount of unhappiness in the universe. But that’s a separate issue from the implementation question.)
It seems to me that when you say ‘meta-ethics’ you simply mean ‘ethics’. I don’t know why you’d think meta-ethics would need to be implemented in the human brain. Ethics is in the world; meta-ethics doubly so. There’s a fact about what’s right, just like there’s a fact about what’s prime. You could ask why we care about what’s right, but that’s neither an ethical question nor a meta-ethical one. The ethical question is ‘what’s right?’ and the meta-ethical question is ‘what makes something a good answer to an ethical question?’. Both of those questions can be answered without reference to humans, though humans are the only reason why anyone would care.
Unless Eliezer has some supernatural entity to do his thinking for him, his ethics and meta-ethics require some physical implementation. Where else are you proposing that he store and process them, besides physical reality?
I think you’re shifting between ‘ethics’ and ‘what Eliezer thinks about ethics’. While it’s possible that ideas are not real save via some implementation, I don’t think it would therefore have to be in a particular human; systems know things too.
You seem to frequently shift the focus of conversation as it happens, hurting the potential for rational discourse in favor of making emotively positive statements that loosely correlate with the topic at hand. Would you be the same pjeby that writes those reprehensible self-help books?
That seemed a bit ad hominem. The commenter pjeby (I know nothing else about him) seems like someone who might be unfamiliar with part of the LW/OB background corpus but is reasoning pretty well under those conditions.
Actually, I’m quite familiar with a large segment of the OB corpus—it’s been highly influential on my work. However, I also see what appear to be a few holes or incoherencies within the OB corpus… some of which appear to stem from precisely the issue I’ve been asking you about in this thread. (i.e. the role of negative utilities in creating bias)
In my personal experience, negative utilities create bias because they cut off consideration of possibilities. This is useful in an emergency—but not much anywhere else. If human beings had platonically perfect minds, there would be no difference between a uniform utility scale and a dual positive/negative one… but as far as I can tell (and research strongly suggests) we do have two different systems.
So, although you’re wary of Robin’s “cynicism” and my “psychological explanations”, this is inconsistent with your own statements, such as:
See, I’m as puzzled by your ability to write something like that, and then turn around and argue an absolute utility for unhappiness, as you are puzzled by that Nobel-winning Bayesian dude who still believes in God. From my POV, it’s just as inconsistent.
There must be some psychology that creates your position, but if your position is “truly” valid (assuming there were such a thing), then the psychology wouldn’t matter. You should be able to destroy the position, and then reconstruct it from more basic principles, once the original influence is removed, no? (This idea is also part of the corpus.)
pjeby,
Are you familiar with Eliezer’s take on naturalistic meta-ethics in particular, or just with other large segments of the OB corpus? If the former, maybe you could take more care to spell out that you get the difference between “achieving one’s original goals” and “hacking one’s goal-system so that the goal-system thinks one has achieved one’s goals (e.g., by wireheading)”.
I like your writing, but in this particular thread, my impression is that you’re “rounding to the nearest cliche”: interpreting Eliezer and others as saying the nearest mistake that you’ve heard your students or others make, rather than making an effort to understand where people are coming from. My impression may be false, but it sounds like I’m not the only one who has it, and it’s distracting, so maybe take more care to spell out in visible terms a summary of people’s main points, so we know you’re disagreeing with what they’re saying and not with some other view.
More generally, you’ve joined a community that has been thinking awhile and has some unusual concepts. I’m glad you’ve joined the commenters, because we badly need the best techniques we can get for changing our own thinking habits and for teaching the same to others—we need techniques for learning and teaching rationality—and I find your website helpful here, and your actual thinking on the subject, in context, can probably become better still. But I wonder if you could maybe take a bit more care in general to hear the threads you’re responding to. I’ve felt like you were “rounding to the nearest cliche” in your thread with me as well (I wasn’t going off the Lisa Simpson happiness theory), and it might be nice if you could take the stance of a co-participant in the conversation, who is interested in both learning and teaching, instead of repeating the (good) points on your website in response to all comments, whatever the comments’ subject matter.
First, yes, I do understand the difference between goal-achievement and wireheading. I’m drawing a much finer distinction about the means by which you set up a system to achieve your goals, as well as the means by which you choose those goals in the first place.
It is possible in some cases that I’ve “rounded to the nearest cliche” as you put it. But I’m pretty confident that I’m not doing that with Eliezer’s points, precisely because I’ve read so much of his work… but also because the mistake I believe he is making (or at least, the thing he appears to not be noticing) is a perfect example of a point that I was trying to make in another thread… about why you can’t just put one new, correct belief in someone’s head, and have it magically fix every broken belief they already have.
I’m a little confused about the rest of your statement; it doesn’t seem to me that I’m repeating the same points, so much as that I’ve been struggling to deal with the fact that so many of the threads I’ve become involved in, boil down (AFAICT) to the same issues—and trying NOT to have too much duplication in my responses, while also not wanting to create a bunch of inter-comment links. (Another fine example of how avoiding negatives leads to bad decisions… ;-) )
Now, whether that’s a case of me having only a hammer, or whether it’s simply because everything really is made out of ones and zeros, I’m not sure. It has been seeming to me for a bit now, that what I really need to do is write an LW article about positive/negative utility and abstract/concrete thinking, as these are the main concepts I work with that clash with some portions of the OB corpus (and some of the more vocal LW commenters). Putting that stuff in one place would certainly help reduce duplication.
Meanwhile, it’s not my intention to reduce anyone to cliche, or to presume that I understand something I don’t. If I were, I wouldn’t spend so much time in so many of my comments, asking so many questions. They are not rhetorical; they represent genuine curiosity. And I’ve actually learned quite a lot from the process of asking and commenting in the last few days; many things I’ve written here are NOT concepts I previously had.
This is especially true for the two comments that were replies to you; they were my musings on the ideas I got from your statements, more than critique or commentary of anything you said. I can see how that might make you feel not understood, however. (Also, the “Lisa Simpson theory” part of that one comment was actually directed to the comment you were replying to, not your comment in that thread, btw. I was trying to avoid writing two replies there.)
I also get the sense that you’re trying to say something off-the-cuff in your replies that would be better done as a specific LW post.
Thanks for the thoughtful reply. It’s quite possible I misinterpreted. Also, re: the Lisa Simpson thing, I’ll be more careful to look at other nearby posts people might be replying to instead of reading comments so much from the new comments page.
It seems slightly odd to me that you say you’re “pretty confident” you’re not rounding Eliezer’s point to the nearest cliche in part because the mistake you think he’s making “is a perfect example of a point [you] were trying to make in another thread”. Isn’t that what it feels like when one rounds someone’s response to a pre-existing image of “oh, the such-and-such mistake”?
A LW article about how people think about positive/negative utility, and another about abstract/concrete thinking, sounds wonderful. Then we can sift through your concepts as a community, air confusions or objections in a coherent manner, etc.; and you can reference it and it’ll be part of our shared corpus. Both topics sound useful.
So, how would you distinguish that, from the case where their response is making the such-and-such mistake?
The way I’d distinguish it, is to ask questions that would have different answers, depending on whether the person is making that mistake or not. I asked Eliezer those questions, and of the ones he answered, the answers were consistent with my model of the mistake.
Of course, there’s always the possibility of confirmation bias… except that I also know what answers I’d have taken as disconfirming my hypothesis, which makes it at least a little less likely. (But I do know of more than one mechanism by which beliefs and behaviors are formed and maintained, and it would’ve been plausible—albeit less probable—that his evaluation could’ve been formed another way. And I’d have been perfectly okay with my hypothesis being wrong.)
See, I’m not pointing out what I believe to be a mistake because I think I’m smarter than Eliezer… it’s because I’m constantly making the same mistake. We all do, because it’s utterly trivial to make it, and really non-trivial to spot it. And if you haven’t gotten an intuitive grasp of why and how that mistake comes into being (for example, if you insist it doesn’t exist in the first place!), then it’s hard to see why there’s “no silver bullet” for reducing the complexity of developing “rationality” in people.
If my interlocutor is someone who might well have thoughts that don’t fit into my schemas, I might be suspicious enough of my impression that they were making one of my standard cached example-mistakes that I’d:
Make a serious effort at original seeing, and make sure my model of the such-and-such mistake is really the best way to organically understand the situation in front of me; and then
Describe my schema for the such-and-such mistake (in general), and see if the person agrees that such-and-such is a mistake; and then
Describe the instance of the such-and-such mistake that the person seems to be making, and ask if they agree or if there’s a kind of reasoning going into their claim that doesn’t fit into my schema.
Or maybe this is just the pain-in-the-neck method one should use if one’s original communication attempt stalls somewhere. Truth be told, I’m at this point rather confused about which aspects of meta-ethics are under dispute, and I can’t easily scan back through the conversation to find the quotes of yours that made me think you misunderstood, because our conversation has overflowed LW’s conversation-display settings. And you’ve made some good points, and I’m convinced I misunderstood you in at least some cases. I’m going to bow out of this conversation for now and wait to discuss values and their origins properly, in response to your own post. (Or if you’d rather, I’d love to discuss by email; I’m annasalamon at gmail.)
Yes, the comment system here is really not suited to the kind of conversation I’ve been trying to have… not that I’m sure what system would work for it. ;-)
As far as meta-ethics goes, the short summary is:
“Avoiding badness” and “seeking goodness” are not interchangeable when you experience them concretely on human hardware,
It is therefore a reasoning error to treat them as if they were interchangeable in your abstract moral calculations (as they will not work the same way in practice),
Due to the specific nature of the human hardware biases involved (i.e., the respective emotional, chemical, and neurological responses to pain vs. pleasure), badness-avoidance values are highly likely to be found irrational upon detailed examination… and thus they are always the ones worth examining first.
Badness-avoidance values are a disproportionately high (if not exclusive!) source of “motivated reasoning”; i.e., we don’t so much rationalize to paint pretty pictures, as to hide the ugly ones. (Which makes rooting them out of critical importance to rationalists.)
This summary is more to clarify my thoughts for the eventual post, than an attempt to continue the discussion. (To me, these things are so obvious and so much a part of my day-to-day experience that I often forget the inferential distance involved for most people.)
These ideas are all capable of experimental verification; the first one has certainly been written about in the literature. None are particularly unorthodox or controversial in and of themselves, as far as I’m aware.
However, there are common arguments against some of these ideas that my own students bring up, so in my (eventual) post I’ll need to bring them up and refute them as well.
For example, a common argument against positively-motivated goodness is that feeling good about being generous means you’re “really” being selfish… and thus bad! So, the person advancing this argument is motivated to rationalize the “virtue” of being dutiful—i.e., doing something you don’t want to, but nonetheless “should”—because it would be bad not to.
Strangely, most people have these judgments only in relation to their self… They see no problem with someone else doing good out of generosity or kindness, with no pain or duty involved. It’s only themselves they sentence to this “virtue” of suffering to achieve goodness. (Which is sort of like “fighting for peace” or “f*ing for virginity”, but I digress.)
Whether this is something inbuilt, cultural, or selection bias of people I work with, I have no idea. But it’s damn common… and Eliezer’s making a virtue out of unhappiness (beyond the bare minimums demanded by safety, etc.) fits smack dab in the middle of this territory.
Whew. Okay, I’m going to stop writing this now… this really needs to be a post. Or several. The more I think about how to get here, starting from only the OB corpus and without recapitulating my own, the bigger I realize the inferential gap is.
You may be running into the Reversed Stupidity problem: most cases you’ve seen of people advocating negative feelings are stupid; therefore, you assume that all such advocacy must result from the same stupidity.
I sympathize because I remember back when I would have thought that anyone arguing against the abolitionist program—that is, the total abolition of all suffering—was a Luddite.
But I eventually realized I didn’t want to eliminate my negative reinforcement hardware, and that moreover, I wouldn’t be such a bad person if I, you know, just did things the way I did want, instead of doing things the way I felt vaguely dutifully obligated to want but didn’t want.
Why am I a terrible, bad person for not wanting to modify myself in that way? What higher imperative should override: “I’d rather not do this”?
I didn’t say you’re a terrible bad person—I said your choice to be unhappy in the absence of a positive benefit from same, is likely to be found irrational, if you reflect on the concrete emotional reason you find the prospect abhorrent.
I also don’t recommend eliminating the negative reinforcement hardware, I merely recommend carefully vetting all the software you permit to run on it, or to be generated by it. (So don’t worry, I’m not an advance spokesperson for the Superhappies.)
This isn’t an absolute, just a VERY strong heuristic, in my experience. Sort of like, if someone’s going to commit suicide, I have more hoops for them to jump through to prove their rationality, than someone who’s just going to the grocery store. ;-)
And, based on what you’ve said thus far, it doesn’t sound like you’ve thoroughly investigated what concrete (near-system) rules drove the creation of your aspiration to suffering.
(As opposed to the abstract ideation that happened afterward, since a major function of abstract ideation is to allow us to hide our near-system rules from ourselves and others… an idea I got from OB, btw, and one that significantly increased the effectiveness of my work!)
Now, were you advocating a positive justification for the use of unhappiness, rather than a desire to avoid its loss, I wouldn’t need to apply the same stringency of questioning… in the same way that I wouldn’t question a masochist finding enjoyment in the experience of pain!
And if you were giving a detailed rationale for your negative justification, I’d be at least somewhat more satisfied. However, your justifications here and on OB sound to me like vague “apologies for death”, that is, they handwave various objections as being “obvious”, without providing any specific scenario in which any given person would actually be better off by not having the option of immortality, or by lacking the ability to reject unhappiness, or to get over it with arbitrary quickness.
Also, you didn’t answer any of my questions like, “So, how long would you need to be unhappy, after some specific person died?” This kind of vagueness is (in my experience) a strong indicator of negatively-motivated rationalization. After all, if this were as well thought out as your other positions, it seems to me that you’d either already have had an answer ready, or one would have come quickly to mind.
That one question is particularly relevant, too, for determining where our positions actually differ, if they really do! I don’t mind being (briefly) unhappy, as an indicator that something is wrong. I just don’t see any point in leaving the alarm bell ringing, 24/7 thereafter. Our lives and concerns don’t exist on the same timescales as our ancestors’, and a life-threatening problem 20 years from now simply doesn’t merit the same type of stress response as one that’s going to happen 20 seconds from now. But our nervous systems don’t seem to know the difference, or at least lack the required dynamic range for an adequate degree of distinction.
By the way, this comment gives a more detailed explanation of how the negative reinforcement mechanism leads to undesirable results besides excessive stress (like hypocrisy and inner conflict!) compared to keeping it mostly-inactive, within the region where positive reinforcement is equally suitable to create roughly-similar results.
And now, I’m going to sign off for tonight, and take a break from writing here for a while. I need to get back to work on the writing and speaking I do for my paying customers, at least for a few days anyhow. ;-) But I nonetheless look forward to your response.
Interesting thread!
I’m not sure that pjeby has fully addressed Eliezer’s concern that “eliminating my negative emotions would be changing my preferences, and changing my preferences so that they’re satisfied is against my current preferences (otherwise, I’d just go for being an orgasmium)”.
(Well, at least that’s how I’d paraphrase it, Eliezer, tell me if I’m wrong)
To which I would answer:
Yes, it’s very possible that eliminating some negative emotions would be immoral, or at least would change one’s preferences in a way one’s previous preferences would disagree with (think: eliminating the guilt over killing people, and things like that; I wouldn’t be very happy to learn that the army or police of a dictatorship is researching emotion elimination).
Still, there is probably a wide range of negative feelings that could be removed in a way that doesn’t contradict one’s original preferences—in the sense that the pre-modification person wouldn’t find the behaviour of the modified person objectionable.
The line between which changes are OK and which are not is not that obvious to draw, and many posts on OB talk about it (the difference between the morality of the ancient Greeks and our own, and thus the risk of “freezing” our own morality and barring future moral progress, the Confessor’s objections to non-consensual sex, etc.). pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”, but I think he meant that careful examination could show that those changes stayed in the second category and wouldn’t turn one into an immoral monster.
(It feels a bit weird answering pjeby’s post in the third person, but it felt clearer to me that way :P I’m not responding to this post in particular)
(Disclaimer: I’m one of pjeby’s clients, but that’s not why I’m here, I’ve been reading OvercomingBias since nearly the beginning)
I didn’t (explicitly) dismiss those concerns; I said that away-from reasoning has a higher rationality standard to meet, in part because it’s likely to be vague.
I wasn’t even thinking about preference-changing being dangerous, because our preferences are largely independent and mostly don’t “auto-update” when we change one—there’s a LOT of redundancy. So if a specific change isn’t compatible with your overall morality, you’ll note the dissonance, and change your preferences again to tune things better.
Science-fictional evidence of preference-changing is about as far off as science-fictional evidence of AI behavior… and for the same reasons. The built-in models our brain uses to understand minds and their preferences, are simpler than the models the brain uses to create a mind… and its preferences.
Offtopic: Shortly after you posted this, it appears that someone undertook a massive vote-down campaign, systematically searching for every comment I’ve ever posted to LW, and voting it down by 1. I don’t know if, or how, these events are correlated.
But, if the person who undertook that campaign was trying to send me a message of some sort, they neglected to include any actionable information content. I only noticed because the karma number suddenly and dramatically changed when I clicked through from one page to another, reading this morning’s new comments… and that sudden large drop was weird enough to make me investigate.
Otherwise, I probably never would’ve been aware of their action, as an action, let alone as any sort of feedback! If you want to communicate something to someone, it’s probably best to be more explicit. Or, in the alternative, contribute a patch to the LW software to let you filter out posts by people you don’t like, or perhaps the entire subthreads they participate in.
Well, it wasn’t me :)
I wish this place worked like StackOverflow, where you can only downvote once you have 100 karma; that would probably reduce the background noise in the voting …
This is what I was talking about. Please do prepare the posts; it’ll help you to clarify your position to yourself. Let them lie as drafts for a while, then make a decision about whether to post them. Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences. Do you know the derivation of the expected utility formula? You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
Well, there is that too, of course, but actually the issues I’m talking about here are (somewhat) orthogonal. Negatively-motivated reasoning is less likely to be rational in large part because it’s more vague—it requires only that the source of negative motivation be dismissed or avoided, rather than a particular source of positive motivation be obtained. Even if negative and positive motivation held the same weight, this issue would still apply.
The literature I was actually referring to (about the largely asynchronous and simultaneous operation of negative and positive motivation), I linked to in another comment here, after you accused me of making unorthodox and unsupported claims. In my posts, I expect to also make reference to at least one paper on “affective synchrony”, which is the degree to which our negative and positive motivation systems activate to the same degree at the same time.
All I’m pointing out is that a rationalist that ignores the irrationality of the hardware on which their computations are being run, while expecting to get good answers out of it, isn’t being very rational.
It was deliberately ad hominem, of course—just not the fallacious kind. We seriously need profile pages of some sort. Wish I had the stomach for Python.
I don’t expect anyone to be familiar with the LW/OB background corpus—I expect my education and training is quite different from yours, for example. However, I still expect one to follow rules of conduct with respect to reasonable discourse, for example avoiding equivocation and its related vices.
Or maybe I’m just viscerally angered by the winky smileys. Who knows.
I don’t see how I can separate “ethics” from “what Eliezer thinks about ethics” and still have a meaningful conversation with him on the topic.
Meanwhile, reading back through the thread, the only digressions I see in my comments are those made in response to those raised by you or Eliezer. Perhaps you could point to some specific examples of these shifted foci and emotively positive statements? I do not see them.
As for my “reprehensible” books, I trust you formed that judgment by actually reading them, yes? If so, then yes, I’m that person. But if you didn’t read them, then clearly your judgment isn’t about the books I actually wrote… and thus, I could not have been the person who wrote the (imaginary) ones you’d therefore be talking about. ;-)
I was not referring only to this thread, but to several ongoing discussions. If you’d like clear examples, feel free to contact me via http://thomblake.com or http://thomblake.mp
As Eliezer has kind of pointed out, I’m weary enough from this discussion to be on the verge of irrationality, so I shall retire from it (if only because this forum is devoted to rationality!).
I’m not aware of religions that work that way.
However, that’s how observation works.
How is it rational to treat an observation as not obviously so? I’m pretty sure that’s inconsistent, if not contradictory.
I agree with your reasoning, but I think there are plenty of reasons to be unhappy about religion that go beyond the absence of a preferred state.
In other words, I think I should be actively displeased that religion exists and is prevalent, not merely non-happy. Neutrality is included in non-happiness and, if the word were used logically, in unhappiness too. But the way it’s actually used, ‘unhappy’ means active displeasure.
How is this active displeasure useful to you? Does it cause you to do something different than if you merely prefer religion to not be present? What, specifically?