My question is about the implementation of meta-ethics in the human brain. If I were going to write a program to simulate Eliezer Yudkowsky, what rules (other than “be unhappy when others are unhappy”) would I need to program in for you to arrive at this “obvious” conclusion?
In my personal experience, the morality that people arrive at by avoiding negative consequences is substantially different than the morality they arrive at by seeking positive ones.
In other words, a person who does good because they will otherwise be a bad person, is not the same as a person who does good because it brings good. Their actions and attitudes differ in substantive ways, besides the second person being happier. For example, the second person is far more likely to actually be generous and warm towards other people—especially living, present, individual people, rather than “people” as an abstraction.
So which of these two is really the “good” person, from your moral perspective?
(On another level, by the way, I fail to see how contagious, persistent unhappiness is a moral good, since it greatly magnifies the total amount of unhappiness in the universe. But that’s a separate issue from the implementation question.)
It seems to me that when you say ‘meta-ethics’ you simply mean ‘ethics’. I don’t know why you’d think meta-ethics would need to be implemented in the human brain. Ethics is in the world; meta-ethics doubly so. There’s a fact about what’s right, just like there’s a fact about what’s prime. You could ask why we care about what’s right, but that’s neither an ethical question nor a meta-ethical one. The ethical question is ‘what’s right?’ and the meta-ethical question is ‘what makes something a good answer to an ethical question?’. Both of those questions can be answered without reference to humans, though humans are the only reason why anyone would care.
Unless Eliezer has some supernatural entity to do his thinking for him, his ethics and meta-ethics require some physical implementation. Where else are you proposing that he store and process them, besides physical reality?
I think you’re shifting between ‘ethics’ and ‘what Eliezer thinks about ethics’. While it’s possible that ideas are not real save via some implementation, I don’t think it would therefore have to be in a particular human; systems know things too.
You seem to frequently shift the focus of conversation as it happens, hurting the potential for rational discourse in favor of making emotively positive statements that loosely correlate with the topic at hand. Would you be the same pjeby that writes those reprehensible self-help books?
That seemed a bit ad hominem. The commenter pjeby (I know nothing else about him) seems like someone who might be unfamiliar with part of the LW/OB background corpus but is reasoning pretty well under those conditions.
Actually, I’m quite familiar with a large segment of the OB corpus—it’s been highly influential on my work. However, I also see what appear to be a few holes or incoherencies within the OB corpus… some of which appear to stem from precisely the issue I’ve been asking you about in this thread. (i.e. the role of negative utilities in creating bias)
In my personal experience, negative utilities create bias because they cut off consideration of possibilities. This is useful in an emergency—but not much anywhere else. If human beings had platonically perfect minds, there would be no difference between a uniform utility scale and a dual positive/negative one… but as far as I can tell (and research strongly suggests) we do have two different systems.
So, although you’re wary of Robin’s “cynicism” and my “psychological explanations”, this is inconsistent with your own statements, such as:
There is no perfect argument that persuades the ideal philosopher of perfect emptiness to attach a perfectly abstract label of ‘good’. The notion of the perfectly abstract label is incoherent, which is why people chase it round and round in circles. What would distinguish a perfectly empty label of ‘good’ from a perfectly empty label of ‘bad’? How would you tell which was which?
See, I’m as puzzled by your ability to write something like that, and then turn around and argue an absolute utility for unhappiness, as you are puzzled by that Nobel-winning Bayesian dude who still believes in God. From my POV, it’s just as inconsistent.
There must be some psychology that creates your position, but if your position is “truly” valid (assuming there were such a thing), then the psychology wouldn’t matter. You should be able to destroy the position, and then reconstruct it from more basic principles, once the original influence is removed, no? (This idea is also part of the corpus.)
pjeby, are you familiar with Eliezer’s take on naturalistic meta-ethics in particular, or just with other large segments of the OB corpus? If the former, maybe you could take more care to spell out that you get the difference between “achieving one’s original goals” and “hacking one’s goal-system so that the goal-system thinks one has achieved one’s goals (e.g., by wireheading)”.
I like your writing, but in this particular thread, my impression is that you’re “rounding to the nearest cliche”—interpreting Eliezer and others as saying the nearest mistake that you’ve heard your students or others make, rather than making an effort to understand where people are coming from. My impression may be false, but it sounds like I’m not the only one who has it, and it’s distracting, so maybe take more care to spell out in visible terms a summary of peoples’ main points, so we know you’re disagreeing with what they’re saying and not with some other view.
More generally, you’ve joined a community that has been thinking awhile and has some unusual concepts. I’m glad you’ve joined the commenters, because we badly need the best techniques we can get for changing our own thinking habits and for teaching the same to others—we need techniques for learning and teaching rationality—and I find your website helpful here, and your actual thinking on the subject, in context, can probably become better still. But I wonder if you could maybe take a bit more care in general to hear the threads you’re responding to. I’ve felt like you were “rounding to the nearest cliche” in your thread with me as well (I wasn’t going off the Lisa Simpson happiness theory), and it might be nice if you could take the stance of a co-participant in the conversation, who is interested in both learning and teaching, instead of repeating the (good) points on your website in response to all comments, whatever the comments’ subject matter.
First, yes, I do understand the difference between goal-achievement and wireheading. I’m drawing a much finer distinction about the means by which you set up a system to achieve your goals, as well as the means by which you choose those goals in the first place.
It is possible in some cases that I’ve “rounded to the nearest cliche” as you put it. But I’m pretty confident that I’m not doing that with Eliezer’s points, precisely because I’ve read so much of his work… but also because the mistake I believe he is making (or at least, the thing he appears to not be noticing) is a perfect example of a point that I was trying to make in another thread… about why you can’t just put one new, correct belief in someone’s head, and have it magically fix every broken belief they already have.
I’m a little confused about the rest of your statement; it doesn’t seem to me that I’m repeating the same points, so much as that I’ve been struggling to deal with the fact that so many of the threads I’ve become involved in, boil down (AFAICT) to the same issues—and trying NOT to have too much duplication in my responses, while also not wanting to create a bunch of inter-comment links. (Another fine example of how avoiding negatives leads to bad decisions… ;-) )
Now, whether that’s a case of me having only a hammer, or whether it’s simply because everything really is made out of ones and zeros, I’m not sure. It has been seeming to me for a bit now, that what I really need to do is write an LW article about positive/negative utility and abstract/concrete thinking, as these are the main concepts I work with that clash with some portions of the OB corpus (and some of the more vocal LW commenters). Putting that stuff in one place would certainly help reduce duplication.
Meanwhile, it’s not my intention to reduce anyone to cliche, or to presume that I understand something I don’t. If I were, I wouldn’t spend so much time in so many of my comments, asking so many questions. They are not rhetorical; they represent genuine curiosity. And I’ve actually learned quite a lot from the process of asking and commenting in the last few days; many things I’ve written here are NOT concepts I previously had.
This is especially true for the two comments that were replies to you; they were my musings on the ideas I got from your statements, more than critique or commentary of anything you said. I can see how that might make you feel not understood, however. (Also, the “Lisa Simpson theory” part of that one comment was actually directed to the comment you were replying to, not your comment in that thread, btw. I was trying to avoid writing two replies there.)
Thanks for the thoughtful reply. It’s quite possible I misinterpreted. Also, re: the Lisa Simpson thing, I’ll be more careful to look at other nearby posts people might be replying to instead of reading comments so much from the new comments page.
It seems slightly odd to me that you say you’re “pretty confident” you’re not rounding Eliezer’s point to the nearest cliche in part because the mistake you think he’s making “is a perfect example of a point [you] were trying to make in another thread”. Isn’t that what it feels like when one rounds someone’s response to a pre-existing image of “oh, the such-and-such mistake”?
A LW article about how people think about positive/negative utility, and another about abstract/concrete thinking, sounds wonderful. Then we can sift through your concepts as a community, air confusions or objections in a coherent manner, etc.; and you can reference it and it’ll be part of our shared corpus. Both topics sound useful.
Isn’t that what it feels like when one rounds someone’s response to a pre-existing image of “oh, the such-and-such mistake”?
So, how would you distinguish that, from the case where their response is making the such-and-such mistake?
The way I’d distinguish it, is to ask questions that would have different answers, depending on whether the person is making that mistake or not. I asked Eliezer those questions, and of the ones he answered, the answers were consistent with my model of the mistake.
Of course, there’s always the possibility of confirmation bias… except that I also know what answers I’d have taken as disconfirming my hypothesis, which makes it at least a little less likely. (But I do know of more than one mechanism by which beliefs and behaviors are formed and maintained, and it would’ve been plausible—albeit less probable—that his evaluation could’ve been formed another way. And I’d have been perfectly okay with my hypothesis being wrong.)
See, I’m not pointing out what I believe to be a mistake because I think I’m smarter than Eliezer… it’s because I’m constantly making the same mistake. We all do, because it’s utterly trivial to make it, and really non-trivial to spot it. And if you haven’t gotten an intuitive grasp of why and how that mistake comes into being (for example, if you insist it doesn’t exist in the first place!), then it’s hard to see why there’s “no silver bullet” for reducing the complexity of developing “rationality” in people.
So, how would you distinguish that, from the case where their response is making the such-and-such mistake?
If my interlocutor is someone who might well have thoughts that don’t fit into my schemas, I might be suspicious enough of my impression that they were making one of my standard cached example-mistakes that I’d:
Make a serious effort at original seeing, and make sure my model of the such-and-such mistake is really the best way to organically understand the situation in front of me; and then
Describe my schema for the such-and-such mistake (in general), and see if the person agrees that such-and-such is a mistake; and then
Describe the instance of the such-and-such mistake that the person seems to be making, and ask if they agree or if there’s a kind of reasoning going into their claim that doesn’t fit into my schema.
Or maybe this is just the pain-in-the-neck method one should use if one’s original communication attempt stalls somewhere. Truth be told, I’m at this point rather confused about which aspects of meta-ethics are under dispute, and I can’t easily scan back through the conversation to find the quotes of yours that made me think you misunderstood, because our conversation has overflowed LW’s conversation-display settings. And you’ve made some good points, and I’m convinced I misunderstood you in at least some cases. I’m going to bow out of this conversation for now and wait to discuss values and their origins properly, in response to your own post. (Or if you’d rather, I’d love to discuss by email; I’m annasalamon at gmail.)
Yes, the comment system here is really not suited to the kind of conversation I’ve been trying to have… not that I’m sure what system would work for it. ;-)
As far as meta-ethics goes, the short summary is:
“Avoiding badness” and “seeking goodness” are not interchangeable when you experience them concretely on human hardware,
It is therefore a reasoning error to treat them as if they were interchangeable in your abstract moral calculations (as they will not work the same way in practice),
Due to the specific nature of the human hardware biases involved (i.e., the respective emotional, chemical, and neurological responses to pain vs. pleasure), badness-avoidance values are highly likely to be found irrational upon detailed examination… and thus they are always the ones worth examining first.
Badness-avoidance values are a disproportionately high (if not exclusive!) source of “motivated reasoning”. i.e., we don’t so much rationalize to paint pretty pictures, as to hide the ugly ones. (Which makes rooting them out of critical importance to rationalists.)
This summary is more to clarify my thoughts for the eventual post, than an attempt to continue the discussion. (To me, these things are so obvious and so much a part of my day-to-day experience that I often forget the inferential distance involved for most people.)
These ideas are all capable of experimental verification; the first one has certainly been written about in the literature. None are particularly unorthodox or controversial in and of themselves, as far as I’m aware.
However, there are common arguments against some of these ideas that my own students bring up, so in my (eventual) post I’ll need to bring them up and refute them as well.
For example, a common argument against positively-motivated goodness is that feeling good about being generous means you’re “really” being selfish… and thus bad! So, the person advancing this argument is motivated to rationalize the “virtue” of being dutiful—i.e., doing something you don’t want to, but nonetheless “should”—because it would be bad not to.
Strangely, most people have these judgments only in relation to their self… They see no problem with someone else doing good out of generosity or kindness, with no pain or duty involved. It’s only themselves they sentence to this “virtue” of suffering to achieve goodness. (Which is sort of like “fighting for peace” or “f*ing for virginity”, but I digress.)
Whether this is something inbuilt, cultural, or selection bias of people I work with, I have no idea. But it’s damn common… and Eliezer’s making a virtue out of unhappiness (beyond the bare minimums demanded by safety, etc.) fits smack dab in the middle of this territory.
Whew. Okay, I’m going to stop writing this now… this really needs to be a post. Or several. The more I think about how to get here, starting from only the OB corpus and without recapitulating my own, the bigger I realize the inferential gap is.
You may be running into the Reversed Stupidity problem; most cases you’ve seen advocating negative feelings are stupid, therefore, you assume that all such advocations must result from the same stupidity.
I sympathize because I remember back when I would have thought that anyone arguing against the abolitionist program—that is, the total abolition of all suffering—was a Luddite.
But I eventually realized I didn’t want to eliminate my negative reinforcement hardware, and that moreover, I wouldn’t be such a bad person if I, you know, just did things the way I did want, instead of doing things the way I felt vaguely dutifully obligated to want but didn’t want.
Why am I a terrible, bad person for not wanting to modify myself in that way? What higher imperative should override: “I’d rather not do this”?
I didn’t say you’re a terrible bad person—I said your choice to be unhappy in the absence of a positive benefit from same, is likely to be found irrational, if you reflect on the concrete emotional reason you find the prospect abhorrent.
I also don’t recommend eliminating the negative reinforcement hardware, I merely recommend carefully vetting all the software you permit to run on it, or to be generated by it. (So don’t worry, I’m not an advance spokesperson for the Superhappies.)
This isn’t an absolute, just a VERY strong heuristic, in my experience. Sort of like, if someone’s going to commit suicide, I have more hoops for them to jump through to prove their rationality, than someone who’s just going to the grocery store. ;-)
And, based on what you’ve said thus far, it doesn’t sound like you’ve thoroughly investigated what concrete (near-system) rules drove the creation of your aspiration to suffering.
(As opposed to the abstract ideation that happened afterward, since a major function of abstract ideation is to allow us to hide our near-system rules from ourselves and others… an idea I got from OB, btw, and one that significantly increased the effectiveness of my work!)
Now, were you advocating a positive justification for the use of unhappiness, rather than a desire to avoid its loss, I wouldn’t need to apply the same stringency of questioning… in the same way that I wouldn’t question a masochist finding enjoyment in the experience of pain!
And if you were giving a detailed rationale for your negative justification, I’d be at least somewhat more satisfied. However, your justifications here and on OB sound to me like vague “apologies for death”, that is, they handwave various objections as being “obvious”, without providing any specific scenario in which any given person would actually be better off by not having the option of immortality, or by lacking the ability to reject unhappiness, or to get over it with arbitrary quickness.
Also, you didn’t answer any of my questions like, “So, how long would you need to be unhappy, after some specific person died?” This kind of vagueness is (in my experience) a strong indicator of negatively-motivated rationalization. After all, if this were as well thought out as your other positions, it seems to me that you’d either already have had an answer ready, or one would have come quickly to mind.
That one question is particularly relevant, too, for determining where our positions actually differ, if they really do! I don’t mind being (briefly) unhappy, as an indicator that something is wrong. I just don’t see any point in leaving the alarm bell ringing, 24/7 thereafter. Our lives and concerns don’t exist on the same timescales as our ancestors, and a life-threatening problem 20 years from now, simply doesn’t merit the same type of stress response as one that’s going to happen 20 seconds from now. But our nervous systems don’t seem to know the difference, or at least lack the required dynamic range for an adequate degree of distinction.
By the way, this comment gives a more detailed explanation of how the negative reinforcement mechanism leads to undesirable results besides excessive stress (like hypocrisy and inner conflict!) compared to keeping it mostly-inactive, within the region where positive reinforcement is equally suitable to create roughly-similar results.
And now, I’m going to sign off for tonight, and take a break from writing here for a while. I need to get back to work on the writing and speaking I do for my paying customers, at least for a few days anyhow. ;-) But I nonetheless look forward to your response.
I’m not sure that pjeby has fully addressed Eliezer’s concern that “eliminating my negative emotions would be changing my preferences, and changing my preferences so that they’re satisfied is against my current preferences (otherwise, I’d just go for being an orgasmium)”.
(Well, at least that’s how I’d paraphrase it, Eliezer, tell me if I’m wrong)
To which I would answer:
Yes, it’s very possible that eliminating some negative emotions would be immoral, or at least, would change one’s preferences in a way my previous preferences would disagree with (think: eliminating the guilt over killing people, and things like that. I wouldn’t be very happy to learn that the army or police of a dictatorship is researching emotion elimination)
Still, there is probably a wide range of negative feelings that could be removed in a way that doesn’t contradict one’s original preferences—in the sense that the pre-modification person wouldn’t find the behaviour of the modified person objectionable.
The line between which changes are OK and which are not is not that obvious to draw, and many posts on OB talk about it (the difference between the morality of the ancient Greeks and our own, and thus the risk of “freezing” our own morality and barring future moral progress, the Confessor’s objections to non-consensual sex, etc.). pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”, but I think he meant that careful examination could show that those changes stayed in the second category and wouldn’t turn one into an immoral monster.
(It feels a bit weird answering pjeby’s post in the third person, but it felt clearer to me that way :P I’m not responding to this post in particular)
(Disclaimer: I’m one of pjeby’s clients, but that’s not why I’m here, I’ve been reading OvercomingBias since nearly the beginning)
pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”
I didn’t (explicitly) dismiss those concerns; I said that away-from reasoning has a higher rationality standard to meet, in part because it’s likely to be vague.
I wasn’t even thinking about preference-changing being dangerous, because our preferences are largely independent and mostly don’t “auto-update” when we change one—there’s a LOT of redundancy. So if a specific change isn’t compatible with your overall morality, you’ll note the dissonance, and change your preferences again to tune things better.
Science-fictional evidence of preference-changing is about as far off as science-fictional evidence of AI behavior… and for the same reasons. The built-in models our brain uses to understand minds and their preferences, are simpler than the models the brain uses to create a mind… and its preferences.
Offtopic: Shortly after you posted this, it appears that someone undertook a massive vote-down campaign, systematically searching for every comment I’ve ever posted to LW, and voting it down by 1. I don’t know if, or how these events are correlated.
But, if the person who undertook that campaign was trying to send me a message of some sort, they neglected to include any actionable information content. I only noticed because the karma number suddenly and dramatically changed when I clicked through from one page to another, reading this morning’s new comments… and that sudden large drop was weird enough to make me investigate.
Otherwise, I probably never would’ve been aware of their action, as an action, let alone as any sort of feedback! If you want to communicate something to someone, it’s probably best to be more explicit. Or, in the alternative, contribute a patch to the LW software to let you filter out posts by people you don’t like, or perhaps the entire subthreads they participate in.
I wish this place worked like StackOverflow, where you can only downvote once you have 100 karma; that would probably reduce the background noise in the voting …
This is what I was talking about. Please do prepare the posts, it’ll help you to clarify your position to yourself. Let them lie as drafts for a while, then make a decision about whether to post them. Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences. Do you know the derivation of expected utility formula? You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
Well, there is that too, of course, but actually the issues I’m talking about here are (somewhat) orthogonal. Negatively-motivated reasoning is less likely to be rational in large part because it’s more vague—it requires only that the source of negative motivation be dismissed or avoided, rather than a particular source of positive motivation be obtained. Even if negative and positive motivation held the same weight, this issue would still apply.
The literature I was actually referring to (about the largely asynchronous and simultaneous operation of negative and positive motivation), I linked to in another comment here, after you accused me of making unorthodox and unsupported claims. In my posts, I expect to also make reference to at least one paper on “affective synchrony”, which is the degree to which our negative and positive motivation systems activate to the same degree at the same time.
Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences.
All I’m pointing out is that a rationalist that ignores the irrationality of the hardware on which their computations are being run, while expecting to get good answers out of it, isn’t being very rational.
It was deliberately ad hominem, of course—just not the fallacious kind. We seriously need profile pages of some sort. Wish I had the stomach for Python.
I don’t expect anyone to be familiar with the LW/OB background corpus—I expect my education and training is quite different from yours, for example. However, I still expect one to follow rules of conduct with respect to reasonable discourse, for example avoiding equivocation and its related vices.
Or maybe I’m just viscerally angered by the winky smileys. Who knows.
I don’t see how I can separate “ethics” from “what Eliezer thinks about ethics” and still have a meaningful conversation with him on the topic.
Meanwhile, reading back through the thread, the only digressions I see in my comments are those made in response to those raised by you or Eliezer. Perhaps you could point to some specific examples of these shifted foci and emotively positive statements? I do not see them.
As for my “reprehensible” books, I trust you formed that judgment by actually reading them, yes? If so, then yes, I’m that person. But if you didn’t read them, then clearly your judgment isn’t about the books I actually wrote… and thus, I could not have been the person who wrote the (imaginary) ones you’d therefore be talking about. ;-)
Perhaps you could point to some specific examples of these shifted foci and emotively positive statements? I do not see them.
I was not referring only to this thread, but to several ongoing discussions. If you’d like clear examples, feel free to contact me via http://thomblake.com or http://thomblake.mp
As Eliezer has kind of pointed out, I’m weary enough from this discussion to be on the verge of irrationality, so I shall retire from it (if only because this forum is devoted to rationality!).
How is it rational to treat a “moral observation” as “obviously so”? That’s how religion works, isn’t it?
This discussion is now about
my view on which is summarized in Joy in the Merely Good.
My question is about the implementation of meta-ethics in the human brain. If I were going to write a program to simulate Eliezer Yudkowsky, what rules (other than “be unhappy when others are unhappy”) would I need to program in for you to arrive at this “obvious” conclusion?
In my personal experience, the morality that people arrive at by avoiding negative consequences is substantially different than the morality they arrive at by seeking positive ones.
In other words, a person who does good because they will otherwise be a bad person, is not the same as a person who does good because it brings good. Their actions and attitudes differ in substantive ways, besides the second person being happier. For example, the second person is far more likely to actually be generous and warm towards other people—especially living, present, individual people, rather than “people” as an abstraction.
So which of these two is really the “good” person, from your moral perspective?
(On another level, by the way, I fail to see how contagious, persistent unhappiness is a moral good, since it greatly magnifies the total amount of unhappiness in the universe. But that’s a separate issue from the implementation question.)
It seems to me that when you say ‘meta-ethics’ you simply mean ‘ethics’. I don’t know why you’d think meta-ethics would need to be implemented in the human brain. Ethics is in the world; meta-ethics doubly so. There’s a fact about what’s right, just like there’s a fact about what’s prime. You could ask why we care about what’s right, but that’s neither an ethical question nor a meta-ethical one. The ethical question is ‘what’s right?’ and the meta-ethical question is ‘what makes something a good answer to an ethical question?’. Both of those questions can be answered without reference to humans, though humans are the only reason why anyone would care.
Unless Eliezer has some supernatural entity to do his thinking for him, his ethics and meta-ethics require some physical implementation. Where else are you proposing that he store and process them, besides physical reality?
I think you’re shifting between ‘ethics’ and ‘what Eliezer thinks about ethics’. While it’s possible that ideas are not real save via some implementation, I don’t think it would therefore have to be in a particular human; systems know things too.
You seem to frequently shift the focus of conversation as it happens, hurting the potential for rational discourse in favor of making emotively positive statements that loosely correlate with the topic at hand. Would you be the same pjeby that writes those reprehensible self-help books?
That seemed a bit ad hominem. The commenter pjeby (I know nothing else about him) seems like someone who might be unfamiliar with part of the LW/OB background corpus but is reasoning pretty well under those conditions.
Actually, I’m quite familiar with a large segment of the OB corpus—it’s been highly influential on my work. However, I also see what appear to be a few holes or incoherencies within the OB corpus… some of which appear to stem from precisely the issue I’ve been asking you about in this thread. (i.e. the role of negative utilities in creating bias)
In my personal experience, negative utilities create bias because they cut off consideration of possibilities. This is useful in an emergency—but not much anywhere else. If human beings had platonically perfect minds, there would be no difference between a uniform utility scale and a dual positive/negative one… but as far as I can tell (and research strongly suggests) we do have two different systems.
So, although you’re wary of Robin’s “cynicism” and my “psychological explanations”, this is inconsistent with your own statements, such as:
See, I’m as puzzled by your ability to write something like that, and then turn around and argue an absolute utility for unhappiness, as you are puzzled by that Nobel-winning Bayesian dude who still believes in God. From my POV, it’s just as inconsistent.
There must be some psychology that creates your position, but if your position is “truly” valid (assuming there were such a thing), then the psychology wouldn’t matter. You should be able to destroy the position, and then reconstruct it from more basic principles, once the original influence is removed, no? (This idea is also part of the corpus.)
pjeby,
Are you familiar with Eliezer’s take on naturalistic meta-ethics in particular, or just with other large segments of the OB corpus? If the former, maybe you could take more care to spell out that you get the difference between “achieving one’s original goals” and “hacking one’s goal-system so that the goal-system thinks one has achieved one’s goals (e.g., by wireheading)”.
I like your writing, but in this particular thread, my impression is that you’re “rounding to the nearest cliche”: interpreting Eliezer and others as saying the nearest mistake that you’ve heard your students or others make, rather than making an effort to understand where people are coming from. My impression may be false, but it sounds like I’m not the only one who has it, and it’s distracting, so maybe take more care to spell out in visible terms a summary of people’s main points, so we know you’re disagreeing with what they’re saying and not with some other view.
More generally, you’ve joined a community that has been thinking awhile and has some unusual concepts. I’m glad you’ve joined the commenters, because we badly need the best techniques we can get for changing our own thinking habits and for teaching the same to others—we need techniques for learning and teaching rationality—and I find your website helpful here, and your actual thinking on the subject, in context, can probably become better still. But I wonder if you could maybe take a bit more care in general to hear the threads you’re responding to. I’ve felt like you were “rounding to the nearest cliche” in your thread with me as well (I wasn’t going off the Lisa Simpson happiness theory), and it might be nice if you could take the stance of a co-participant in the conversation, who is interested in both learning and teaching, instead of repeating the (good) points on your website in response to all comments, whatever the comments’ subject matter.
First, yes, I do understand the difference between goal-achievement and wireheading. I’m drawing a much finer distinction about the means by which you set up a system to achieve your goals, as well as the means by which you choose those goals in the first place.
It is possible in some cases that I’ve “rounded to the nearest cliche” as you put it. But I’m pretty confident that I’m not doing that with Eliezer’s points, precisely because I’ve read so much of his work… but also because the mistake I believe he is making (or at least, the thing he appears to not be noticing) is a perfect example of a point that I was trying to make in another thread… about why you can’t just put one new, correct belief in someone’s head, and have it magically fix every broken belief they already have.
I’m a little confused about the rest of your statement; it doesn’t seem to me that I’m repeating the same points, so much as that I’ve been struggling to deal with the fact that so many of the threads I’ve become involved in, boil down (AFAICT) to the same issues—and trying NOT to have too much duplication in my responses, while also not wanting to create a bunch of inter-comment links. (Another fine example of how avoiding negatives leads to bad decisions… ;-) )
Now, whether that’s a case of me having only a hammer, or whether it’s simply because everything really is made out of ones and zeros, I’m not sure. It has been seeming to me for a bit now, that what I really need to do is write an LW article about positive/negative utility and abstract/concrete thinking, as these are the main concepts I work with that clash with some portions of the OB corpus (and some of the more vocal LW commenters). Putting that stuff in one place would certainly help reduce duplication.
Meanwhile, it’s not my intention to reduce anyone to cliche, or to presume that I understand something I don’t. If I were, I wouldn’t spend so much time in so many of my comments, asking so many questions. They are not rhetorical; they represent genuine curiosity. And I’ve actually learned quite a lot from the process of asking and commenting in the last few days; many things I’ve written here are NOT concepts I previously had.
This is especially true for the two comments that were replies to you; they were my musings on the ideas I got from your statements, more than critique or commentary of anything you said. I can see how that might make you feel not understood, however. (Also, the “Lisa Simpson theory” part of that one comment was actually directed to the comment you were replying to, not your comment in that thread, btw. I was trying to avoid writing two replies there.)
I also get the sense that you’re trying to say something off-the-cuff in your replies that would be better done as a specific LW post.
Thanks for the thoughtful reply. It’s quite possible I misinterpreted. Also, re: the Lisa Simpson thing, I’ll be more careful to look at other nearby posts people might be replying to instead of reading comments so much from the new comments page.
It seems slightly odd to me that you say you’re “pretty confident” you’re not rounding Eliezer’s point to the nearest cliche in part because the mistake you think he’s making “is a perfect example of a point [you] were trying to make in another thread”. Isn’t that what it feels like when one rounds someone’s response to a pre-existing image of “oh, the such-and-such mistake”?
A LW article about how people think about positive/negative utility, and another about abstract/concrete thinking, sounds wonderful. Then we can sift through your concepts as a community, air confusions or objections in a coherent manner, etc.; and you can reference it and it’ll be part of our shared corpus. Both topics sound useful.
So, how would you distinguish that, from the case where their response is making the such-and-such mistake?
The way I’d distinguish it, is to ask questions that would have different answers, depending on whether the person is making that mistake or not. I asked Eliezer those questions, and of the ones he answered, the answers were consistent with my model of the mistake.
Of course, there’s always the possibility of confirmation bias… except that I also know what answers I’d have taken as disconfirming my hypothesis, which makes it at least a little less likely. (But I do know of more than one mechanism by which beliefs and behaviors are formed and maintained, and it would’ve been plausible—albeit less probable—that his evaluation could’ve been formed another way. And I’d have been perfectly okay with my hypothesis being wrong.)
See, I’m not pointing out what I believe to be a mistake because I think I’m smarter than Eliezer… it’s because I’m constantly making the same mistake. We all do, because it’s utterly trivial to make it, and really non-trivial to spot it. And if you haven’t gotten an intuitive grasp of why and how that mistake comes into being (for example, if you insist it doesn’t exist in the first place!), then it’s hard to see why there’s “no silver bullet” for reducing the complexity of developing “rationality” in people.
If my interlocutor is someone who might well have thoughts that don’t fit into my schemas, I might be suspicious enough of my impression that they were making one of my standard cached example-mistakes that I’d:
Make a serious effort at original seeing, and make sure my model of the such-and-such mistake is really the best way to organically understand the situation in front of me; and then
Describe my schema for the such-and-such mistake (in general), and see if the person agrees that such-and-such is a mistake; and then
Describe the instance of the such-and-such mistake that the person seems to be making, and ask if they agree or if there’s a kind of reasoning going into their claim that doesn’t fit into my schema.
Or maybe this is just the pain-in-the-neck method one should use if one’s original communication attempt stalls somewhere. Truth be told, I’m at this point rather confused about which aspects of meta-ethics are under dispute, and I can’t easily scan back through the conversation to find the quotes of yours that made me think you misunderstood, because our conversation has overflowed LW’s conversation-display settings. And you’ve made some good points, and I’m convinced I misunderstood you in at least some cases. I’m going to bow out of this conversation for now and wait to discuss values and their origins properly, in response to your own post. (Or if you’d rather, I’d love to discuss by email; I’m annasalamon at gmail.)
Yes, the comment system here is really not suited to the kind of conversation I’ve been trying to have… not that I’m sure what system would work for it. ;-)
As far as meta-ethics goes, the short summary is:
“Avoiding badness” and “seeking goodness” are not interchangeable when you experience them concretely on human hardware,
It is therefore a reasoning error to treat them as if they were interchangeable in your abstract moral calculations (as they will not work the same way in practice),
Due to the specific nature of the human hardware biases involved (i.e., the respective emotional, chemical, and neurological responses to pain vs. pleasure), badness-avoidance values are highly likely to be found irrational upon detailed examination… and thus they are always the ones worth examining first.
Badness-avoidance values are a disproportionately high (if not exclusive!) source of “motivated reasoning”. i.e., we don’t so much rationalize to paint pretty pictures, as to hide the ugly ones. (Which makes rooting them out of critical importance to rationalists.)
This summary is more to clarify my thoughts for the eventual post, than an attempt to continue the discussion. (To me, these things are so obvious and so much a part of my day-to-day experience that I often forget the inferential distance involved for most people.)
These ideas are all capable of experimental verification; the first one has certainly been written about in the literature. None are particularly unorthodox or controversial in and of themselves, as far as I’m aware.
However, there are common arguments against some of these ideas that my own students bring up, so in my (eventual) post I’ll need to bring them up and refute them as well.
For example, a common argument against positively-motivated goodness is that feeling good about being generous means you’re “really” being selfish… and thus bad! So, the person advancing this argument is motivated to rationalize the “virtue” of being dutiful—i.e., doing something you don’t want to, but nonetheless “should”—because it would be bad not to.
Strangely, most people have these judgments only in relation to their self… They see no problem with someone else doing good out of generosity or kindness, with no pain or duty involved. It’s only themselves they sentence to this “virtue” of suffering to achieve goodness. (Which is sort of like “fighting for peace” or “f*ing for virginity”, but I digress.)
Whether this is something inbuilt, cultural, or selection bias of people I work with, I have no idea. But it’s damn common… and Eliezer’s making a virtue out of unhappiness (beyond the bare minimums demanded by safety, etc.) fits smack dab in the middle of this territory.
Whew. Okay, I’m going to stop writing this now… this really needs to be a post. Or several. The more I think about how to get here, starting from only the OB corpus and without recapitulating my own, the bigger I realize the inferential gap is.
You may be running into the Reversed Stupidity problem; most cases you’ve seen advocating negative feelings are stupid, therefore, you assume that all such advocations must result from the same stupidity.
I sympathize because I remember back when I would have thought that anyone arguing against the abolitionist program—that is, the total abolition of all suffering—was a Luddite.
But I eventually realized I didn’t want to eliminate my negative reinforcement hardware, and that moreover, I wouldn’t be such a bad person if I, you know, just did things the way I did want, instead of doing things the way I felt vaguely dutifully obligated to want but didn’t want.
Why am I a terrible, bad person for not wanting to modify myself in that way? What higher imperative should override: “I’d rather not do this”?
I didn’t say you’re a terrible bad person—I said your choice to be unhappy in the absence of a positive benefit from same, is likely to be found irrational, if you reflect on the concrete emotional reason you find the prospect abhorrent.
I also don’t recommend eliminating the negative reinforcement hardware, I merely recommend carefully vetting all the software you permit to run on it, or to be generated by it. (So don’t worry, I’m not an advance spokesperson for the Superhappies.)
This isn’t an absolute, just a VERY strong heuristic, in my experience. Sort of like, if someone’s going to commit suicide, I have more hoops for them to jump through to prove their rationality, than someone who’s just going to the grocery store. ;-)
And, based on what you’ve said thus far, it doesn’t sound like you’ve thoroughly investigated what concrete (near-system) rules drove the creation of your aspiration to suffering.
(As opposed to the abstract ideation that happened afterward, since a major function of abstract ideation is to allow us to hide our near-system rules from ourselves and others… an idea I got from OB, btw, and one that significantly increased the effectiveness of my work!)
Now, were you advocating a positive justification for the use of unhappiness, rather than a desire to avoid its loss, I wouldn’t need to apply the same stringency of questioning… in the same way that I wouldn’t question a masochist finding enjoyment in the experience of pain!
And if you were giving a detailed rationale for your negative justification, I’d be at least somewhat more satisfied. However, your justifications here and on OB sound to me like vague “apologies for death”, that is, they handwave various objections as being “obvious”, without providing any specific scenario in which any given person would actually be better off by not having the option of immortality, or by lacking the ability to reject unhappiness, or to get over it with arbitrary quickness.
Also, you didn’t answer any of my questions like, “So, how long would you need to be unhappy, after some specific person died?” This kind of vagueness is (in my experience) a strong indicator of negatively-motivated rationalization. After all, if this were as well-thought out as your other positions, it seems to me that you’d either already have had an answer ready, or one would have come quickly to mind.
That one question is particularly relevant, too, for determining where our positions actually differ, if they really do! I don’t mind being (briefly) unhappy, as an indicator that something is wrong. I just don’t see any point in leaving the alarm bell ringing 24/7 thereafter. Our lives and concerns don’t exist on the same timescales as our ancestors’, and a life-threatening problem 20 years from now simply doesn’t merit the same type of stress response as one that’s going to happen 20 seconds from now. But our nervous systems don’t seem to know the difference, or at least lack the required dynamic range for an adequate degree of distinction.
By the way, this comment gives a more detailed explanation of how the negative reinforcement mechanism leads to undesirable results besides excessive stress (like hypocrisy and inner conflict!) compared to keeping it mostly-inactive, within the region where positive reinforcement is equally suitable to create roughly-similar results.
And now, I’m going to sign off for tonight, and take a break from writing here for a while. I need to get back to work on the writing and speaking I do for my paying customers, at least for a few days anyhow. ;-) But I nonetheless look forward to your response.
Interesting thread!
I’m not sure that pjeby has fully addressed Eliezer’s concern that “eliminating my negative emotions would be changing my preferences, and changing my preferences so that they’re satisfied is against my current preferences (otherwise, I’d just go for being an orgasmium)”.
(Well, at least that’s how I’d paraphrase it, Eliezer, tell me if I’m wrong)
To which I would answer:
Yes, it’s very possible that eliminating some negative emotions would be immoral, or at least, would change one’s preferences in a way my previous preferences would disagree with (think: eliminating the guilt over killing people, and things like that. I wouldn’t be very happy to learn that the army or police of a dictatorship is researching emotion elimination)
Still, there is probably a wide range of negative feelings that could be removed in a way that doesn’t contradict one’s original preferences—in the sense that the pre-modification person wouldn’t find the behaviour of the modified person objectionable.
The line between which changes are OK and which are not is not that obvious to draw, and many posts on OB talk about it (the difference between the morality of the ancient Greeks and our own, and thus the risk of “freezing” our own morality and barring future moral progress, the Confessor’s objections to non-consensual sex, etc.). pjeby might be being a bit light-handed when he dismisses concerns over changing preferences as “irrational”, but I think he meant that careful examination could show that those changes stayed in the second category and wouldn’t turn one into an immoral monster.
(It feels a bit weird answering pjeby’s post in the third person, but it felt clearer to me that way :P I’m not responding to this post in particular)
(Disclaimer: I’m one of pjeby’s clients, but that’s not why I’m here, I’ve been reading OvercomingBias since nearly the beginning)
I didn’t (explicitly) dismiss those concerns; I said that away-from reasoning has a higher rationality standard to meet, in part because it’s likely to be vague.
I wasn’t even thinking about preference-changing being dangerous, because our preferences are largely independent and mostly don’t “auto-update” when we change one—there’s a LOT of redundancy. So if a specific change isn’t compatible with your overall morality, you’ll note the dissonance, and change your preferences again to tune things better.
Science-fictional evidence of preference-changing is about as far off as science-fictional evidence of AI behavior… and for the same reasons. The built-in models our brain uses to understand minds and their preferences, are simpler than the models the brain uses to create a mind… and its preferences.
Offtopic: Shortly after you posted this, it appears that someone undertook a massive vote-down campaign, systematically searching for every comment I’ve ever posted to LW, and voting it down by 1. I don’t know if, or how these events are correlated.
But, if the person who undertook that campaign was trying to send me a message of some sort, they neglected to include any actionable information content. I only noticed because the karma number suddenly and dramatically changed when I clicked through from one page to another, reading this morning’s new comments… and that sudden large drop was weird enough to make me investigate.
Otherwise, I probably never would’ve been aware of their action, as an action, let alone as any sort of feedback! If you want to communicate something to someone, it’s probably best to be more explicit. Or, in the alternative, contribute a patch to the LW software to let you filter out posts by people you don’t like, or perhaps the entire subthreads they participate in.
Well, it wasn’t me :)
I wish this place worked like StackOverflow, where you can only downvote once you have 100 karma; that would probably reduce the background noise in the voting …
This is what I was talking about. Please do prepare the posts; it’ll help you to clarify your position to yourself. Let them lie as drafts for a while, then make a decision about whether to post them. Note that your statements are about the form of human preference computation, not about the utility that computes the “should” following from human preferences. Do you know the derivation of the expected utility formula? You refer to a well-known finding that people avoid negative reward more than they seek positive reward.
Well, there is that too, of course, but actually the issues I’m talking about here are (somewhat) orthogonal. Negatively-motivated reasoning is less likely to be rational in large part because it’s more vague—it requires only that the source of negative motivation be dismissed or avoided, rather than a particular source of positive motivation be obtained. Even if negative and positive motivation held the same weight, this issue would still apply.
The literature I was actually referring to (about the largely asynchronous and simultaneous operation of negative and positive motivation), I linked to in another comment here, after you accused me of making unorthodox and unsupported claims. In my posts, I expect to also make reference to at least one paper on “affective synchrony”, which is the degree to which our negative and positive motivation systems activate to the same degree at the same time.
All I’m pointing out is that a rationalist that ignores the irrationality of the hardware on which their computations are being run, while expecting to get good answers out of it, isn’t being very rational.
It was deliberately ad hominem, of course—just not the fallacious kind. We seriously need profile pages of some sort. Wish I had the stomach for Python.
I don’t expect anyone to be familiar with the LW/OB background corpus—I expect my education and training is quite different from yours, for example. However, I still expect one to follow rules of conduct with respect to reasonable discourse, for example avoiding equivocation and its related vices.
Or maybe I’m just viscerally angered by the winky smileys. Who knows.
I don’t see how I can separate “ethics” from “what Eliezer thinks about ethics” and still have a meaningful conversation with him on the topic.
Meanwhile, reading back through the thread, the only digressions I see in my comments are those made in response to those raised by you or Eliezer. Perhaps you could point to some specific examples of these shifted foci and emotively positive statements? I do not see them.
As for my “reprehensible” books, I trust you formed that judgment by actually reading them, yes? If so, then yes, I’m that person. But if you didn’t read them, then clearly your judgment isn’t about the books I actually wrote… and thus, I could not have been the person who wrote the (imaginary) ones you’d therefore be talking about. ;-)
I was not referring only to this thread, but to several ongoing discussions. If you’d like clear examples, feel free to contact me via http://thomblake.com or http://thomblake.mp
As Eliezer has kind of pointed out, I’m weary enough from this discussion to be on the verge of irrationality, so I shall retire from it (if only because this forum is devoted to rationality!).
I’m not aware of religions that work that way.
However, that’s how observation works.
How is it rational to treat an observation as not obviously so? I’m pretty sure that’s inconsistent, if not contradictory.