I remain confused by Eliezer’s metaethics sequence.
Both there and in By Which It May Be Judged, I see Eliezer successfully arguing that (something like) moral realism is possible in a reductionist universe (I agree), but he also seems to want to say that in fact (something like) moral realism actually obtains, and I don’t understand what the argument for that is. In particular, one way (the way?) his metaethics might spit up something that looks a lot like moral realism is if there is strong convergence of values upon (human-ish?) agents receiving better information, time enough to work out contradictions in their values, etc. But the “strong convergence of values” thesis hasn’t really been argued, so I remain unclear as to why Eliezer finds it plausible.
Basically, I read the metaethics sequence as asserting both things but arguing only for the first.
But I’m not sure about this. Perhaps because I was already familiar with the professional metaethics vocabulary when I read the sequence, I found Eliezer’s vocabulary for talking about positions in metaethics confusing.
I meant to explore these issues in a vocabulary I find more clear, in my own metaethics sequence, but I still haven’t got around to it. :(
(I’m putting this as a reply to your comment because your comment is what made me think of it.)
In my view, Eliezer’s “metaethics” sequence, despite its name, argues for his ethical theory, roughly
(1) morality[humans] = CEV[humans]
(N.B.: this is my terminology; Eliezer would write “morality” where I write “morality[humans]”) without ever arguing for his (implied) metaethical theory, which is something like
(2) for all X, morality[X] = CEV[X].
Worse, much of his effort is spent arguing against propositions like
(3) (1) ⇒ for all X, morality[X] = CEV[humans] (The Bedrock of Morality: Arbitrary?)
and
(4) (1) ⇒ morality[humans] = CEV[“humans”] (No License To Be Human)
which, I feel, are beside the point.
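Spelling these out side by side, in my own notation (not Eliezer’s), with CEV[“humans”] standing for the non-rigid reading discussed further down the thread, i.e. whatever the (possibly altered) humans end up valuing:

```latex
% My paraphrase of (1)-(4): morality[X] and CEV[X] are schematic functions
% from a class of agents X to a value system.
\begin{align*}
(1)\quad & \mathrm{morality}[\mathrm{humans}] = \mathrm{CEV}[\mathrm{humans}] \\
(2)\quad & \forall X.\ \mathrm{morality}[X] = \mathrm{CEV}[X] \\
(3)\quad & (1) \Rightarrow \forall X.\ \mathrm{morality}[X] = \mathrm{CEV}[\mathrm{humans}] \\
(4)\quad & (1) \Rightarrow \mathrm{morality}[\mathrm{humans}] = \mathrm{CEV}[\text{``humans''}]
\end{align*}
```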
Yes; what else would you do in metaethics?
Isn’t its job to point to ethical theories, while the job of ethics is to assume you have agreed on a theory (an often false assumption)?
Ethics is the subject in which you argue about which ethical theory is correct. In meta-ethics, you argue about how you would know if an ethical theory were correct, and/or what it would mean for an ethical theory to be correct, etc.
See here for a previous comment of mine on this.
First, is ethics only about decision procedures? The existence of the concept of moral luck suggests not. Sure, you can say lots of people are wrong, but to banish them from the field of ethics is ridiculous. Virtue ethics is another example, less clearly a counterexample, but much more central.
The three level hierarchy at your link does nothing to tell what belongs in meta-ethics and what belongs in ethics. I don’t think your comment here is consistent with your comment there and I don’t think either comment has much to do with the three level hierarchy.
Meta-ethics is about issues that are logically prior to ethics. I reject your list. If there are disagreements about the logical priority of issues, then there should be disagreements about what constitutes meta-ethics. You could have a convention that meta-ethics is defined as a certain list of topics by tradition, but that’s stupid. In particular, I think consequentialism vs deontology has high logical priority. Maybe you disagree with me, but to say that I am wrong by definition is not helpful.
Going back to Eliezer, I think that he does only cover meta-ethical claims and that they do pin down an ethical theory. Maybe other meta-ethical stances would not uniquely do so (contrary to my previous comment), but his do.
It may not surprise you to learn that I am of the school that rejects the concept of moral luck. (In this I think I align with Eliezer.)
This is unobjectionable provided that one agrees about what ethics consists of. As far as I am aware, standard philosophical terminology labels utilitarianism (for example) as an ethical theory; yet I have seen people on LW refer to “utilitarian meta-ethics”. This is the kind of usage I mean to disapprove of, and I hold Eliezer under suspicion of encouraging it by blurring the distinction in his sequence.
I should be clear about the fact that this is a terminological issue; my interest here is mainly in preserving the integrity of the prefix “meta”, which I think has suffered excessive abuse both here and elsewhere. For whatever reason, Eliezer’s use of the term felt abusive to me.
Part of the problem may be that Eliezer seemed to think the concept of rigid designation was the important issue, as opposed to e.g. the orthogonality thesis, and I found this perplexing (and uncharacteristic of him). Discomfort about this may have contributed to my perception that meta-ethics wasn’t really the topic of his sequence, so that his calling it that was “off”. But this is admittedly distinct from my claim that his thesis is ethical rather than meta-ethical.
This is again a terminological point, but I think a sequence should be named after the conclusion rather than the premises. If his meta-ethical stance pins down an ethical theory, he should have called the sequence explaining it his “ethics” sequence; just as if I use my theory of art history to derive my theory of physics, then my sequence explaining it should be my “physics” sequence rather than my “art history” sequence.
You demand that everyone accept your definition of ethics, excluding moral luck from the subject, but you simultaneously demand that meta-ethics be defined by convention.
I made both of those points (but not their conjunction) in my previous comment, after explicitly anticipating what you say here, and I’m rather annoyed that you ignored it. I guess the lesson is to say as little as possible.
Now just hold on a second. You are arguing by uncharitable formulation, implying that there is tension between two claims when, logically, there is none. (Forgive me for not assuming you were doing that, and thereby, according to you, “ignoring” your previous comment.) There is nothing contradictory about holding that (1) ethical theories that include moral luck are wrong; and (2) utilitarianism is an ethical theory and not a meta-ethical theory.
(1) is an ethical claim. (2) is the conjunction of a meta-ethical claim (“utilitarianism is an ethical theory”) and a meta-meta-ethical claim (“utilitarianism is not a meta-ethical theory”).
(I hereby declare this comment to supersede all of my previous comments on the subject of the distinction between ethics and meta-ethics, insofar as there is any inconsistency; and in the event there is any inconsistency, I pre-emptively cede you dialectical victory except insofar as doing so would contradict anything else I have said in this comment.)
OK, if you’ve abandoned your claim that “consequentialism is not a meta-ethical attribute” is true by convention, then that’s fine. I’ll just disagree with it and keep including consequentialism vs deontology in meta-ethics, just as I’ll keep including moral luck in ethics.
“In philosophy, meta-ethics is the branch of ethics that seeks to understand the nature of ethical properties, statements, attitudes, and judgments. Meta-ethics is one of the three branches of ethics generally recognized by philosophers, the others being normative ethics and applied ethics.
While normative ethics addresses such questions as “What should one do?”, thus endorsing some ethical evaluations and rejecting others, meta-ethics addresses questions such as “What is goodness?” and “How can we tell what is good from what is bad?”, seeking to understand the nature of ethical properties and evaluations.”
I would be surprised if Eliezer believed (1) or (2), as distinct from believing that CEV[X] is the most viably actionable approximation of morality[X] (using your terminology) we’ve come up with thus far.
This reminds me somewhat of the difference between believing that 2013 cryonics technology reliably preserves the information content of a brain on the one hand, and on the other believing that 2013 cryonics technology has a higher chance of preserving the information than burial or cremation.
I agree that he devotes a lot of time to arguing against (3), though I’ve always understood that as a reaction to the “but a superintelligent system would be smart enough to just figure out how to behave ethically and then do it!” crowd.
I’m not really sure what you mean by (4).
I didn’t intend to distinguish that finely.
(4) is intended to mean that if we alter humans to have a different value system tomorrow, we would also be changing what we mean (today) by “morality”. It’s the negation of the assertion that moral terms are rigid designators, and is what Eliezer is arguing against in No License To Be Human.
Ah, gotcha. OK, thanks for clarifying.
I don’t think you’re “confused” about what was meant. I think you understood exactly what was meant, and have identified a real (and, I believe, acknowledged?) problem with the moral realist definition of Good.
The assumption is that “if we knew more, thought faster, were more the people we wished we were, had grown up farther together” then a very large number of humans would converge onto moral agreement.
The assumption is that if you take a culture that practiced, say, human torture and sacrifice, into our economy, and give them the resources to live at a level of luxury similar to what we experience today and all of our knowledge, they would grow more intelligent, more globally aware, and their morality would slowly shift to become more like ours even in the absence of outside pressure. Our morality, however, would not shift to become more like theirs. It seems like an empirical question.
Alternatively, we could bite the bullet and just say that some humans simply end up with alien values that are not “good”,
It’s not the assumption that is good or bad, but the quality of argument provided for it.
Seeing as about 1% of the population are estimated to be psychopaths, not to mention pathological narcissists, megalomaniacs, etc., it seems hard to argue that there isn’t a large (if proportionally small) portion of the population who are natural ethical egoists rather than altruists. You could try to weasel around it like Mr. Yudkowsky does, saying that they are not “neurologically intact,” except that there is evidence that psychopathy, at least, is an evolutionarily stable strategy rather than a malfunction of normal systems.
I’m usually not one to play the “evil psychopaths” card online, mainly because it’s crass and diminishes the meaning of a useful medical term, but it’s pretty applicable here. What exactly happens to all the psychopaths and people with psychopathic traits when you start extrapolating human values?
Why even stop at psychopaths? There are perfectly neurotypical people with strong desires for revenge-based justice, purity norms that I strongly dislike, etc. I’m not extremely confident that extrapolation will dissolve these values into deeper-order values, although my perception that intelligence in humans does at least seem to be correlated to values similar to mine is comforting in this respect.
Although really, I think this is reaching the point where we have to stop talking in terms of idealized agents with values and start thinking about how these models can be mapped to actual meat brains.
Well, under the shaky assumption that we have the ability to extrapolate in the first place, in practice what happens is that whoever controls the extrapolation sets which values are to be extrapolated, and they have a very strong incentive to put in only their own values.
By definition, no one wants to implement the CEV of humanity more than they want to implement their own CEV. But I would hope that most of the worlds impacted by the various humans’ CEVs would be pretty nice places to live.
That depends. The more interconnected our lives become, the harder it gets to enhance my own life or my loved ones’ lives through highly localized improvements. Once you get up to a sufficiently high level (vaccination programs are an obvious example), helping myself and my loved ones is easiest to accomplish by helping everyone all together, because the ripple effects reach my loved ones’ loved ones and thus my loved ones, whom I value in themselves.
Favoring individual volition versus a group volition could be a matter of social-graph connectedness and weighting: it could be that for a sufficiently connected individual with sufficiently strong value-weight placed on social ties, that individual will feel better about sacrificing some personal preferences to admit their connections’ values rather than simply subjecting their own close social connections to their personal volition.
Then they have an altruistic EV. That’s allowed.
But as far as your preferences go, your EV >= any other CEV. It has to be that way, tautologically. Extrapolated Volition is defined as what you would choose to do in the counterfactual scenario where you have more intelligence, knowledge, etc. than you do now.
If you’re totally altruistic, it might be that your EV is the CEV of humanity, but that means you have no preference between the two, not that you prefer humanity’s CEV over your own. Remember, all your preferences, including the moral and altruistic ones, are included in your EV.
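A minimal sketch of why this is tautological, under the purely illustrative assumption that an agent’s EV can be modeled as picking the outcome that maximizes that agent’s own idealized utility (the outcome labels and utility numbers below are invented):

```python
# Toy model: an agent's Extrapolated Volition (EV) is the outcome that
# maximizes that agent's own idealized utility. Purely illustrative;
# the outcome labels and utility numbers are made up.

def extrapolated_volition(outcomes, my_utility):
    """Return the outcome I would choose with more intelligence/knowledge,
    modeled here simply as maximizing my own (idealized) utility."""
    return max(outcomes, key=my_utility)

outcomes = ["my_ev_world", "humanitys_cev_world", "status_quo"]
my_utility = {"my_ev_world": 10.0, "humanitys_cev_world": 9.5, "status_quo": 3.0}.get

my_ev = extrapolated_volition(outcomes, my_utility)

# By construction, nothing (including humanity's CEV) scores higher than
# my EV under *my own* utility function:
assert all(my_utility(o) <= my_utility(my_ev) for o in outcomes)
print(my_ev)  # -> "my_ev_world"
```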
Sorry, I don’t think I’m being clear.
The notion I’m trying to express is not an entirely altruistic EV, or even a deliberately altruistic EV. Simply, this person has friends and family and such, and thus has a partially social EV; this person is at least altruistic towards close associates when it costs them nothing.
My claim, then, is that if we let n denote the number of hops from any one person to any other in the social graph of such agents:
lim_{n->0} Social Component of Personal EV = species-wide CEV
Now, there may be special cases, such as people who don’t give a shit about anyone but themselves, but the idea is that as social connectedness grows, benefitting only myself and my loved ones becomes more and more expensive and unwieldy (for instance, income inequality and guard labor already have sizable, well-studied economic costs, and that’s before we’re talking about potential improvements to the human condition from AI!) compared to just doing things that are good for everyone without regard to people’s connection to myself (they’re bound to connect through a mutual friend or relative with some low degree, after all) or social status (because again, status enforcement is expensive).
So while the total degree to which I care about other people is limited (Social Component of Personal EV ≤ Personal EV), eventually that component should approximate the CEV of everyone reachable from me in the social graph.
The question, then, becomes whether that Social Component of my Personal EV is large enough to overwhelm some of my own personal preferences (I participate in a broader society voluntarily) or whether my personal values overwhelm my consideration of other people’s feelings (I conquer the world and crush you beneath my feet).
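A toy numerical sketch of the claim above, with an invented three-person graph and made-up welfare numbers: model the social component of my EV as other people’s welfare discounted by social-graph distance, and the component flattens toward an even aggregate over everyone as the graph becomes more connected.

```python
# Toy illustration (invented graph and numbers): the "social component" of my
# EV weights each person's welfare by a discount that decays with social-graph
# distance. As the graph becomes more connected (hops shrink), the weights
# flatten out and the social component approaches an even aggregate over
# everyone -- a crude stand-in for a species-wide CEV.
from collections import deque

def hops_from(graph, start):
    """Breadth-first search: number of hops from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def social_component(graph, me, welfare, discount=0.5):
    """Sum of others' welfare, discounted by social distance from `me`."""
    dist = hops_from(graph, me)
    return sum(welfare[p] * discount ** dist[p] for p in dist if p != me)

welfare = {"me": 1.0, "friend": 1.0, "stranger": 1.0}

sparse = {"me": ["friend"], "friend": ["me", "stranger"], "stranger": ["friend"]}
dense  = {"me": ["friend", "stranger"], "friend": ["me", "stranger"],
          "stranger": ["me", "friend"]}

print(social_component(sparse, "me", welfare))  # 0.75: stranger heavily discounted
print(social_component(dense, "me", welfare))   # 1.0: everyone weighted nearly equally
```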
Seems to me that to a significant degree psychopaths are successful because people around them have problems communicating. Information about what a specific psychopath did to whom is usually not shared. If it were easily accessible to people before they interacted with the psychopath, a lot of the psychopath’s power would be lost.
Despite being introverted by nature, these days my heuristic for dealing with problematic people is to establish good communication lines among the non-problematic people. Then people often realize that what seemed like their specific problem is in fact almost everyone’s problem with the same person, following the same pattern. When a former mystery becomes an obvious algorithm, it is easier to think about a counter-strategy.
Sometimes the mentally different person beats you not by using a strategy so complex you wouldn’t understand it, but by using a relatively simple strategy that is so weird to you that you just don’t notice it in the hypothesis space (and instead you imagine something more complex and powerful). But once you have enough data to understand the strategy, sometimes you can find and exploit its flaws.
A specific example of a powerful yet vulnerable strategy is lying strategically to everyone around you and establishing yourself as the only channel of information between different groups of people. Then you can make group A believe that group B are idiots and vice versa, and make both groups see you as their secret ally. Your strategy can be stable for a long time, because when the groups believe each other to be idiots, they naturally avoid communicating with each other; and when they do, they realize the other side has completely wrong information, which they attribute to the other side’s stupidity, not your strategic lying. -- Yet, if there is a person on each side who becomes suspicious of the manipulator, and if these two people can trust each other enough to meet and share their info (what each of them heard about the other side, and what actually happened), and if they make the result known to their respective groups, then… well, I don’t actually know what happens, because right now I am exactly at this point in my specific undisclosed project… but I hope it can seriously backfire on the manipulator.
Of course, this is just speculation. If we made communication among non-psychopaths easier, the psychopaths would also make their next move in the arms race—they could misuse the channels for more powerful attacks, or make people provide incorrect information about them by manipulation or threats. So it’s not obvious that better communication would mean less power for psychopaths. But it seems to me that a lack of communication always helps them, so more communication should generally work against them. Even having the concept of a psychopath is helpful, although it can be abused. Investigating the specific weaknesses of psychopaths and making them widely known (just like the weaknesses of average people are generally known) could also reduce their advantage.
However, I imagine that the values of psychopaths are not so different from values of average people. They are probably a subset, and the missing parts (such as empathy) are those that cause problems. Let’s say they give extreme priority to feeling superior and watching their enemies crushed and pretty much ignore everything else (a huge simplification). There is a chance their values are so different they could be satisfied in a manner we would consider unfriendly, but they wouldn’t—for example if reality is not valuable for them, why not give them an illusion of maximum superiority, and a happy life to everyone else, so everyone will have their utility function maximized? Maybe they would agree with this solution even if they had perfect intelligence and knowledge.
The wirehead solution applies to a lot more than psychopaths. Why would you consider it unfriendly?
When you say “agents” here, did you mean to say “psychologically normal humans”? Because I think Eliezer would reject the general claim, based on what he says in No Universally Compelling Arguments. But I do think he would accept the narrower claim about psychologically normal humans, or as he sometimes says, “neurologically intact humans.” And the argument for that is found in places like The Psychological Unity of Humankind, though I think there’s an even better link for it somewhere—I seem to distinctly remember a post where he says something about how you should be very careful about attributing moral disagreements to fundamentally different values.
EDIT: Here is the other highly relevant post I was thinking of.
Yeah, I meant to remain ambiguous about how wide Eliezer means to cast the net around agents. Maybe it’s psychologically normal humans, maybe it’s wider or narrower than that.
I suppose ‘The psychological unity of humankind’ is sort of an argument that value convergence is likely at least among humans, though it’s more like a hand-wave. In response, I’d hand-wave toward Sobel (1999); Prinz (2007); Doring & Steinhoff (2009); Doring & Andersen (2009); Robinson (2009); Sotala (2010); Plunkett (2010); Plakias (2011); Egan (2012), all of which argue for pessimism about value convergence. Smith (1994) is the only philosophical work I know of that argues for optimism about value convergence, but there are probably others I just don’t know about.
Some of the sources you are hand-waving towards are (quite rightly) pointing out that rational agents need not converge, but they aren’t looking at the empirical question of whether humans, specifically, converge. Only a subset of those sources are actually talking about humans specifically.
(^This isn’t disagreement. I agree with your main suggestion that humans probably don’t converge, although I do think they are at least describable by mono-modal distributions)
I’m not sure it’s even appropriate to use philosophy to answer this question. The philosophical problem here is “how do we apply idealized constructs like extrapolated preference and terminal values to flesh-and-blood animals?” Things like “should values which are not biologically ingrained count as terminal values?” and similar questions.
...and then, once we’ve developed constructs to the point that we’re ready to talk about the extent to which humans specifically converge, if at all, it becomes an empirical question.
No Universally Compelling Arguments has been put to me as a decisive refutation of Moral Realism, by somebody who thought the LW line was anti-realist. It isn’t a decisive refutation, because no (non-strawman) realist thinks there are arguments that could compel an irrational person, an insane person, a very unintelligent person, and so on. Moral realists only need to argue that moral truths are independently discoverable by suitably motivated and equipped people, like mathematical truths (etc).
Well, “No Universally Compelling Arguments” also applies to physics, but it is generally believed that all sufficiently intelligent agents would agree on the laws of physics.
True, but physics is discoverable via the scientific method, and ultimately, in the nastiest possible limit, via war. If we disagree on physics, all we have to do is highlight the disagreement and go to war over it: whichever one of us is closer to right will succeed in killing the other guy (and potentially a hell of a lot of other stuff).
Whereas if you try going to war over morality, everyone winds up dead and you’ve learned nothing, except possibly that almost everyone considers a Hobbesian war-of-all-against-all to be undesirable when it happens to him.
I think what he is talking about there is lack of disagreement in the sense of incommensurability, or orthogonality as it is locally known. Lack of disagreement in the sense of convergence or consensus is a very different thing.
But the “strong convergence of values” thesis hasn’t really been argued, so I remain unclear as to why Eliezer finds it plausible.
Hasn’t been argued and seems quite implausible to me.
I find moral realism meaningful for each individual (you can evaluate choices according to my values applied with infinite information and infinite resources to think), but I don’t find it meaningful when applied to groups of people, all with their own values.
EY finesses the point by talking about an abstract algorithm, and not clearly specifying what that algorithm actually implements, whether my values, yours, or some unspecified amalgamation of the values of different people. So the point of moral subjectivism vs. moral universalism is left unspecified, to be filled in by the imagination of the reader. To my ear, sometimes it seems one way, and sometimes the other. My guess was that this was intentional, as clarifying the point wouldn’t take much effort. The discussions of EY’s metaethics always strike me as peculiar, as he’s wandering about here somewhere while people discuss how they’re unclear just what conclusion he has drawn.
I can see how that could be implemented. However, I don’t see how that would count as morality. It amounts to Anything Goes, or Do What Thou Wilt. I don’t see how a world in which that kind of “moral realism” holds would differ from one where moral subjectivism holds, or nihilism for that matter.
Where meaningful means implementable? Moral realism is not many things, and one of the things it is not is the claim that everyone gets to keep all their values and behaviour unaltered.
See my previous comment on “Real Magic”: http://lesswrong.com/lw/tv/excluding_the_supernatural/79ng
If you choose not to count the actual moralities that people have as morality, that’s up to you.
Not “anything goes, do what you will”, so much as “all X go, X is such that we want X before we do it, we value doing X while we are doing it, and we retrospectively approve of X after doing it”.
We humans have future-focused, hypothetical-focused, present-focused, and past-focused motivations that don’t always agree. CEV (and, to a great extent, moral rationality as a broader field) is about finding moral reasoning strategies and taking actions such that all those motivational systems will agree that we Did a Good Job.
That said, being able to demonstrate that the set of Coherently Extrapolated Volitions exists is not a construction showing how to find members of that set.
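One way to make the “want it beforehand, value it during, approve of it afterward” criterion concrete is as a conjunction of three evaluations; the predicates below are placeholders I made up for illustration, not anything from the CEV write-ups.

```python
# Illustrative only: an action "goes" when the prospective, concurrent, and
# retrospective evaluations all endorse it. The three predicates stand in for
# whatever the motivational systems actually compute.

def endorsed(action, want_beforehand, value_during, approve_afterward):
    """True iff all three motivational perspectives agree on the action."""
    return (want_beforehand(action)
            and value_during(action)
            and approve_afterward(action))

# Hypothetical example: binge-watching is wanted beforehand and enjoyed during,
# but not endorsed in retrospect, so it gets filtered out.
actions = ["exercise", "binge_watch"]
want_beforehand   = lambda a: a in {"exercise", "binge_watch"}
value_during      = lambda a: a in {"exercise", "binge_watch"}
approve_afterward = lambda a: a in {"exercise"}

print([a for a in actions
       if endorsed(a, want_beforehand, value_during, approve_afterward)])
# -> ['exercise']
```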
As with a number of previous responses, that is ambiguous between the individual and the collective. If I could get some utility by killing you, then should I kill you? If the “we” above is interpreted individually, I should; if it is interpreted collectively, I shouldn’t.
Yes, that is generally considered the core open problem of ethics, once you get past things like “how do we define value” and blah blah blah like that. How do I weigh one person’s utility against another person’s? Unless it’s been solved and nobody told me, that’s a Big Question.
So... what’s the point of CEV, then?
It’s a hell of a lot better than nothing, and it’s entirely possible to solve those individual-weighting problems, possibly by looking at the social graph and at how humans affect each other. There ought to be some treatment of the issue that yields a reasonable collective outcome without totally suppressing or overriding individual volitions.
Certainly, the first thing that comes to mind is that some human interactions are positive sum, some negative sum, some zero-sum. If you configure collective volition to always prefer mutually positive-sum outcomes over zero-sum over negative, then it’s possible to start looking for (or creating) situations where sinister choices don’t have to be made.
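A very rough illustration of that ordering, with invented per-person utility deltas rather than anything a real collective-volition system would compute: classify each candidate outcome by the structure of its utility changes, preferring mutually positive outcomes, then positive-sum, then zero-sum, then negative-sum.

```python
# Rough illustration with invented numbers: rank candidate outcomes by the
# structure of their per-person utility changes, preferring mutually positive
# outcomes, then positive-sum, then zero-sum, then negative-sum.

def category(deltas):
    """Lower rank = more preferred."""
    total = sum(deltas)
    if all(d > 0 for d in deltas):
        return 0            # mutually positive-sum: everyone gains
    if total > 0:
        return 1            # positive-sum overall, but with losers
    if total == 0:
        return 2            # zero-sum
    return 3                # negative-sum

outcomes = {
    "trade":        [ 2,  1,  1],   # everyone gains
    "redistribute": [ 3, -1,  1],   # net gain, one loser
    "gamble":       [ 2, -2,  0],   # zero-sum
    "war":          [-1, -3, -2],   # everyone loses
}

for name in sorted(outcomes, key=lambda n: category(outcomes[n])):
    print(name, outcomes[name])
# prints: trade, redistribute, gamble, war -- in that order of preference
```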
Who said the alternative is nothing? There’s any number of theories of morality, and a further number of theories of Friendly AI.
Requesting lukeprog get round to this. LessWrong metaethics, given that it rejects a large amount of rubbish (coherentism being the main part), is the best in the field today and needs further advancing.
Requesting people upvote this post if they agree with me that getting round to metaethics is the best thing Lukeprog could be doing with his time, and downvote if they disagree.
Getting round to metaethics should rank on Lukeprog’s priorities: [pollid:573]
I would love to see Luke (the other Luke, but maybe you, too) and hopefully others (like Yvain) explicate their views on meta-ethics, given how Eliezer’s sequence is at best unclear (though quite illuminating). It seems essential because a clear meta-ethics seems necessary to achieve MIRI’s stated purpose: averting AGI x-risk by developing FAI.
Creating a “balance Karma” post. Asking people to use this one for their conventional Karma votes on my post above, or to balance out upvotes/downvotes. This way my Karma will remain fair.