The goal should be to cause the future to be great on its own terms
What the heck is this supposed to mean? Great according to the Inherent Essence Of Goodness that lives inside futures, rather than as part of human evaluations? Because I’ve got bad news for that plan.
Honestly, I’m disappointed by this post.
You say you’ve found yourself making this argument a lot recently. That’s fair. I think it’s totally reasonable that there are some situations where this argument could move people in the right direction—maybe the audience is considering defecting about aligning AI with humanity but would respond to orders from authority. Or maybe they’re outsiders who think you are going to defect, and you want to signal to them how you’re going to cooperate not just because it’s a good idea, but because it’s an important moral principle to you (as evolution intended).
But this is not an argument that you should just throw out scattershot. Because it’s literally false. There is no single attractor that all human values can be expected to fall into upon reflection. The primary advantage of AI alignment over typical philosophy is that when alignment researchers realize some part of what they were previously calling “alignment” is impossible, they can back up and change how they’re cashing out “alignment” so that it’s actually possible—philosophers have to keep caring about the impossible thing. This advantage goes away if we don’t use it.
Yes, plenty of people liked this post. But I’m holding you to a high standard. Somewhere people should be expected to not keep talking about the impossible thing. Somewhere, there is a version of this post that talks about or directly references:
Game-theoretic arguments for cooperation.
Why game-theoretic arguments are insufficient for egalitarianism (e.g. overly weighting the preferences of the powerful) but still mean that AI should be designed with more than just you in mind, even before accounting for a human preference for an egalitarian future.
Why egalitarianism is a beautiful moral principle that you endorse.
“Wait, wasn’t that this post?” you might say. Kind of! Making a plain ethical/aesthetic argument is like a magic trick where the magician tells you up front that it’s an illusion. This post is much the same magic trick, but the magician is telling you it’s real magic.
Realistic expectations for what egalitarianism can look like in the real world.
It cannot look like finding the one attractor that all human values converge to upon reflection because there is no one attractor that all human values converge to upon reflection.
Perhaps an analysis of how big the “fingerprints” of the creators of the AI are in such situations—e.g. by setting the meta-level standards for what counts as a “human value”.
There is a non-zero chance that the meta-preferences that end up in charge of the preferences that end up in charge of the galaxy will come from Mechanical Turkers.
“The goal should be to cause the future to be great on its own terms”
What the heck is this supposed to mean? Great according to the Inherent Essence Of Goodness that lives inside futures, rather than as part of human evaluations?
The rest of the quote explains what this means:
The goal should be to cause the future to be great on its own terms, without locking in the particular moral opinions of humanity today — and without locking in the moral opinions of any subset of humans, whether that’s a corporation, a government, or a nation.
(If you can’t see why a single modern society locking in their current values would be a tragedy of enormous proportions, imagine an ancient civilization such as the Romans locking in their specific morals 2000 years ago. Moral progress is real, and important.)
The present is “good on its own terms”, rather than “good on Ancient Romans’ terms”, because the Ancient Romans weren’t able to lock in their values. If you think this makes sense (and is a good thing) in the absence of an Inherent Essence Of Goodness, then there’s no reason to posit an Inherent Essence Of Goodness when we switch from discussing “moral progress after Ancient Rome” to “moral progress after circa-2022 civilization”.
Could you be explicit about what argument you’re making here? Is it something like:
Even when two variables are strongly correlated, the most extreme value of one will rarely be the most extreme value of the other; therefore it’s <50% likely that different individuals’ CEVs will yield remotely similar results? (E.g., similar enough that one individual will consider the output of most other individuals’ CEVs morally acceptable?)
Or?:
The optimal world-state according to Catholicism is totally different from the optimal world-state according to hedonic utilitarianism; therefore it’s <50% likely that the CEV of a random Catholic will consider the output of a hedonic utilitarian’s CEV morally acceptable. (And vice versa.)
Regarding the second argument: I don’t think that Catholicism is stable under reflection (because it’s false, and a mind needs to avoid thinking various low-complexity true thoughts in order to continue believing Catholicism), so I don’t think the Catholic and hedonic utilitarian’s CEVs will end up disagreeing, even though the optimum for Catholicism and for hedonic utilitarianism disagree.
(I’d bet against hedonic utilitarianism being true as well, but this is obviously a much more open question. And fortunately, CEV-ish buck-passing processes make it less necessary for anyone to take risky bets like that; we can just investigate what’s true and base our decisions on what we learn.)
Catholicism is a relatively easy case, and I expect plenty of disagreement about exactly how much moral disagreement looks like the Catholicism/secularism debate. I expect a lot of convergence on questions like “involuntarily enslaving people: good or bad?”, on the whole, and less on questions like “which do you want more of: chocolate ice cream, or vanilla ice cream?”. But it’s the former questions that matter more for CEV; the latter sorts of questions are ones where we can just let individuals choose different lives for themselves.
“Correlations tend to break when you push things to extremes” is a factor that should increase our expectation of how many things people are likely to morally disagree about. Factors pushing in the other direction include ‘not all correlations work that way’ and evidence that human morality doesn’t work that way.
E.g., ‘human brains are very similar’, ‘empirically, people have converged a lot on morality even though we’ve been pushed toward extremes relative to our EEA’, ‘we can use negotiation and trade to build value systems that are good compromises between two conflicting value systems’, etc.
Also ‘the universe is big, and people’s “amoral” preferences tend to be about how their own life goes, not about the overall distribution of matter in the universe’; so values conflicts tend to be concentrated in cases where we can just let different present-day stakeholders live different sorts of lives, given the universe’s absurd abundance of resources.
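To make the "most extreme value of one will rarely be the most extreme value of the other" claim concrete, here is a small simulation sketch. It assumes two standard-normal variables with a chosen correlation `rho`, which is an illustrative modeling choice and not anything claimed in the thread; the function name `top_match_rate` and its parameters are hypothetical. It only illustrates the statistical premise and says nothing about whether human values behave this way.

```python
import math
import random


def top_match_rate(rho, n_people=1000, n_trials=200, seed=0):
    """Fraction of trials in which the same individual tops both variables."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - rho * rho)
    matches = 0
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
        # By construction, y has correlation rho with x (assumed Gaussian model).
        ys = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
        best_x = max(range(n_people), key=xs.__getitem__)
        best_y = max(range(n_people), key=ys.__getitem__)
        matches += best_x == best_y
    return matches / n_trials


if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99, 1.0):
        rate = top_match_rate(rho)
        print(f"rho={rho}: same top individual in {rate:.0%} of trials")
```

Running it for several values of `rho` shows how often the two maxima coincide under these assumptions; only at `rho = 1.0` are they guaranteed to.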
Nate said “it shouldn’t matter who does the job, because there should be a path-independent attractor-well that isn’t about making one person dictator-for-life or tiling a particular flag across the universe forever”, and you said this is “literally false”. I don’t see what’s false about it, so if the above doesn’t clarify anything, maybe you can point to the parts of the Arbital article on CEV you disagree with (https://arbital.com/p/cev/)? E.g., I don’t see Nate or Eliezer claiming that people will agree about vanilla vs. chocolate.
Game-theoretic arguments for cooperation [...] mean that AI should be designed with more than just you in mind, even before accounting for a human preference for an egalitarian future
Footnote 2 says that Nate isn’t “stupid enough to toss coordination out the window in the middle of a catastrophic emergency with human existence at stake”. If that isn’t an argument ‘cooperation is useful, therefore we should take others’ preferences into account’, then what sort of argument do you have in mind?
Why egalitarianism is a beautiful moral principle that you endorse.
I don’t know what you mean by “egalitarianism”, or for that matter what you mean by “why”. Are you asking for an ode to egalitarianism? Or an argument for it, in terms of more basic values?
The present is “good on its own terms”, rather than “good on Ancient Romans’ terms”, because the Ancient Romans weren’t able to lock in their values. If you think this makes sense (and is a good thing) in the absence of an Inherent Essence Of Goodness, then there’s no reason to posit an Inherent Essence Of Goodness when we switch from discussing “moral progress after Ancient Rome” to “moral progress after circa-2022 civilization”.
The present is certainly good on my terms (relative to ancient Rome). But the present itself doesn’t care. It’s not the type of thing that can care. So what are you trying to pack inside that phrase, “its own terms”?
If you mean it to sum up a meta-preference you hold about how moral evolution should proceed, then that’s fine. But is that really all? Or are you going to go reason as if there’s some objective essence of what the present’s “own terms” are—e.g. by trying to apply standards of epistemic uncertainty to the state of this essence?
Could you be explicit about what argument you’re making here? Is it something like:
Even when two variables are strongly correlated, the most extreme value of one will rarely be the most extreme value of the other; therefore it’s <50% likely that different individuals’ CEVs will yield remotely similar results? (E.g., similar enough that one individual will consider the output of most other individuals’ CEVs morally acceptable?)
Or?:
The optimal world-state according to Catholicism is totally different from the optimal world-state according to hedonic utilitarianism; therefore it’s <50% likely that the CEV of a random Catholic will consider the output of a hedonic utilitarian’s CEV morally acceptable. (And vice versa.)
I’ll start by quoting the part of Scott’s essay that I was particularly thinking of, to clarify:
Our innate moral classifier has been trained on the Balboa Park – West Oakland route. Some of us think morality means “follow the Red Line”, and others think “follow the Green Line”, but it doesn’t matter, because we all agree on the same route.
When people talk about how we should arrange the world after the Singularity when we’re all omnipotent, suddenly we’re way past West Oakland, and everyone’s moral intuitions hopelessly diverge.
But it’s even worse than that, because even within myself, my moral intuitions are something like “Do the thing which follows the Red Line, and the Green Line, and the Yellow Line…you know, that thing!” And so when I’m faced with something that perfectly follows the Red Line, but goes the opposite directions as the Green Line, it seems repugnant even to me, as does the opposite tactic of following the Green Line. As long as creating and destroying people is hard, utilitarianism works fine, but make it easier, and suddenly your Standard Utilitarian Path diverges into Pronatal Total Utilitarianism vs. Antinatalist Utilitarianism and they both seem awful. If our degree of moral repugnance is the degree to which we’re violating our moral principles, and my moral principle is “Follow both the Red Line and the Green Line”, then after passing West Oakland I either have to end up in Richmond (and feel awful because of how distant I am from Green), or in Warm Springs (and feel awful because of how distant I am from Red).
Okay, so.
What’s the claim I’m projecting onto Nate, that I’m saying is false? It’s something like: “The goal should be to avoid locking in any particular morals. We can do this by passing control to some neutral procedure that allows values to evolve.”
And what I am saying is something like: There is no neutral procedure. There is no way to avoid privileging some morals. This is not a big problem, it’s just how it is, and we can be okay with it.
Related and repetitive statements:
When extrapolating the shared train line past West Oakland, there are multiple ways to continue, but none of them are “the neutral way to do the extrapolation.”
The self-reflection function has many attractors for almost all humans, groups, societies, and AGI architectures. Different starting points might land us in different attractors, and there is no unique “neutral starting point.”
There are many procedures for allowing values to evolve, most of them suck, and picking a good one is an action that will bear the fingerprints of our own values. And that’s fine!
Human meta-preferences, the standards by which we judge what preference extrapolation schemes are good, are preferences. We do not have any mysterious non-preference standards for doing value aggregation and extrapolation.
There is not just one CEV that is the neutral way to do preference aggregation and extrapolation. There are lots of choices that we have to / get to make.
So as you can see, I wasn’t really thinking about differences between “the CEV” of different people; my focus was more on differences between ways of implementing CEV of the same people. A lot of these ways are going to be more or less as good, like comparing your favorite beef stew vs. a 30-course modernist meal. But not all possible implementations of CEV are good. For example, you could screw up by simulating people being exposed to extreme or highly-optimized stimuli when extrapolating them, leading to the AI causing large changes in the human condition that we wouldn’t presently endorse.
I don’t know what you mean by “egalitarianism”, or for that matter what you mean by “why”. Are you asking for an ode to egalitarianism? Or an argument for it, in terms of more basic values?
By egalitarianism I mean building an AI that tries to help all people, and be responsive to the perspectives of all people, not just a select few. And yes, definitely an ode :D
e.g. by trying to apply standards of epistemic uncertainty to the state of this essence?
I would say that there’s a logical object that a large chunk of human moral discourse is trying to point at: something like “the rules of the logical game Morality”, analogous to “the rules of the logical game Chess”. Two people can both be discussing the same logical object “the rules of Chess”, but have different beliefs about what that logical object’s properties are. And just as someone can be mistaken or uncertain about the rules of chess (or about their interaction in a specific case), someone can be uncertain about morality.
Do you disagree with any of that?
And what I am saying is something like: There is no neutral procedure. There is no way to avoid privileging some morals. This is not a big problem, it’s just how it is, and we can be okay with it.
“Even the terms in CEV, like ‘know more’ or ‘extrapolate a human’, seem complicated and value-laden.”
If the thing you’re saying is that CEV is itself a complicated idea, and it seems hard for humanity to implement such a thing without already having a pretty deep understanding of human values, then I agree. This seems like an important practical challenge for pulling off CEV: you need to somehow start the bootstrapping process, even though our current understanding of human values is insufficient for formally specifying the best way to do CEV.
If instead you just mean to say “there’s no reason to favor human values over termite values unless you already care about humans”, then yeah, that seems even more obvious to me. If you think Nate is trying to argue for human morality from a humanity-indifferent, View-From-Nowhere perspective, then you’re definitely misunderstanding Nate’s perspective.
When extrapolating the shared train line past West Oakland, there are multiple ways to continue, but none of them are “the neutral way to do the extrapolation.”
If “neutral” here means “non-value-laden”, then sure. If “neutral” here means “non-arbitrary, from a human POV”, then it seems like an open empirical question how many arbitrary decisions like this are required in order to do CEV.
I’d guess that there are few or no arbitrary decisions involved in using CEV to answer high-stakes moral questions.
There are many procedures for allowing values to evolve, most of them suck, and picking a good one is an action that will bear the fingerprints of our own values.
This makes me think that you misunderstood Nate’s essay entirely. The idea of “don’t leave your fingerprints on the future” isn’t “try to produce a future that has no basis in human values”. The idea is “try to produce a future that doesn’t privilege the AGI operator’s current values at the expense of other humans’ values, the values humans would develop in the future if their moral understanding improved, etc.”.
If you deploy AGI and execute a pivotal act, don’t leave your personal fingerprints all over the long-term future of humanity, in a way that distinguishes you from other humans.
I would say that there’s a logical object that a large chunk of human moral discourse is trying to point at: something like “the rules of the logical game Morality”, analogous to “the rules of the logical game Chess”. Two people can both be discussing the same logical object “the rules of Chess”, but have different beliefs about what that logical object’s properties are. And just as someone can be mistaken or uncertain about the rules of chess (or about their interaction in a specific case), someone can be uncertain about morality.
When I think about the rules of chess, I basically treat them as having some external essence that I have epistemic uncertainty about. What this means mechanistically is:
When I’m unsure about the rules of chess, this raises the value of certain information-gathering actions, like checking the FIDE website, asking a friend, reading a book.
If I knew the outcomes of all those actions, that would resolve my uncertainty.
I have probabilities associated with my uncertainty, and updates to those probabilities based on evidence should follow Bayesian logic.
Decision-making under uncertainty should linearly aggregate the different possibilities that I’m uncertain over, weighted by their probability.
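As a rough sketch of the last two bullets, the following shows a Bayesian update over two candidate rules followed by probability-weighted (linear) aggregation of payoffs. All hypotheses, probabilities, payoffs, and function names here are made up for illustration; none of them come from the thread.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(h | e) given P(h) and P(e | h) for each h."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


def expected_utility(beliefs, utilities):
    """Linearly aggregate utilities over hypotheses, weighted by probability."""
    return sum(beliefs[h] * utilities[h] for h in beliefs)


# Hypothetical uncertainty: is stalemate a draw or a loss for the stalemated side?
prior = {"draw": 0.5, "loss": 0.5}
# Assumed chance that a friend says "it's a draw" under each hypothesis.
likelihood = {"draw": 0.9, "loss": 0.1}
posterior = bayes_update(prior, likelihood)

# Assumed payoff of each action under each hypothesis.
actions = {
    "force_stalemate": {"draw": 0.5, "loss": 0.0},
    "play_on": {"draw": 0.2, "loss": 0.2},
}
best = max(actions, key=lambda a: expected_utility(posterior, actions[a]))

if __name__ == "__main__":
    print(posterior, best)
```

The point of the sketch is only that, for chess-like uncertainty, evidence feeds a single posterior and the decision is a weighted sum over it.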
So the rules of chess are basically just a pattern out in the world that I can go look at. When I say I’m uncertain about the rules of chess, this is epistemic uncertainty that I manage the same as if I’m uncertain about anything else out there in the world.
The “rules of Morality” are not like this.
When I’m unsure about whether I care about fish suffering, this does raise the value of certain information-gathering actions like learning more about fish.
But if I knew the outcomes of all those actions, this wouldn’t resolve all my uncertainty.
I can put probabilities to various possibilities, and can update them on evidence using Bayesian logic—that part still works.
Decision-making under the remaining-after-evidence part of the uncertainty doesn’t have to look like linear aggregation. In fact it shouldn’t—I have meta-preferences like “conservatism,” which says that I should trust models differently depending on whether they seem to be inside their domain of validity or not.
So there’s a lot of my uncertainty about morality that doesn’t stem from being unaware about facts. Where does it come from? One source is self-modeling uncertainty—how do I take the empirical facts about me and the world, and use that to construct a model of myself in which I have preferences, so that I can reflect on my own preferences? There are multiple ways to do this.
So if, and I’m really not sure, but if you were thinking of everything as like uncertainty about the rules of chess, then I would expect two main mistakes: expecting there to be some procedure that takes in evidence and spits out the one right answer, and expecting aggregating over models for decision-making to look like linear aggregation.
“There are many procedures for allowing values to evolve, most of them suck, and picking a good one is an action that will bear the fingerprints of our own values.”
This makes me think that you misunderstood Nate’s essay entirely. The idea of “don’t leave your fingerprints on the future” isn’t “try to produce a future that has no basis in human values”. The idea is “try to produce a future that doesn’t privilege the AGI operator’s current values at the expense of other humans’ values, the values humans would develop in the future if their moral understanding improved, etc.”.
If you deploy AGI and execute a pivotal act, don’t leave your personal fingerprints all over the long-term future of humanity, in a way that distinguishes you from other humans.
Well, maybe I misunderstood. But I’m not really accusing y’all of saying “try to produce a future that has no basis in human values.” I am accusing this post of saying “there’s some neutral procedure for figuring out human values, we should use that rather than a non-neutral procedure.”
So the rules of chess are basically just a pattern out in the world that I can go look at. When I say I’m uncertain about the rules of chess, this is epistemic uncertainty that I manage the same as if I’m uncertain about anything else out there in the world.
The “rules of Morality” are not like this.
This and earlier comments are bald rejections of moral realism (including, maybe especially, naturalist realism). Can I get some evidence for this confident rejection?
I’m not sure what linking Yudkowsky’s (sketch of a) semantics for moral terms is meant to tell us. Case in point, Cornell Realists adopt a similar relativism in their semantics (“good” like “healthy” can only be judged relative to the type of creature you are), but (some of them anyway) will still argue that we can simply discover what is good through a more or less standard scientific process. In other words, they do believe there is a basin of attraction for human values and there is a neutral process for finding it. (It’s only a basin of attraction of course insofar as this process will find it and should we trust that process, we will gravitate toward that basin). To be clear, few if any claim there is one exact lowest point in this basin – there will be many constellations of goods in a life that are equally worthwhile in some sense (first gloss: in the sense that we would be indifferent to a choice between those lives, from behind a veil of ignorance that only assumes we are homo sapiens).
In any case, every major view in meta-ethics has a developed semantics for moral terms: you’ll have to say more about why e.g. your favored semantics is a point in favor of your meta-ethical view. You don’t need to start from scratch of course: philosophers have been working on this for decades (and continue to). Ayer’s classic emotivism (a.k.a. expressivism) ran into the problem of embedding (if moral statement P doesn’t have a truth value, how am I supposed to evaluate statements with P embedded in them, like “P → Q”? Our nice truth tables get thrown out the window...). In response, several anti-realists have made proposals, e.g. Blackburn’s quasi-realism. More recently, those responses have come under fire for struggling to hold onto their distinct semantics (or metaphysics or epistemology) while also holding onto their having a distinct view from realism. There is always Error Theory of course but then you’re committed to saying things like “It is false that arbitrary torture is bad. It is also false that arbitrary torture is good.”
If none of this discussion on meta-ethics is your thing, that’s fine, but then you might want to dampen your certainty? Consider deferring to those who have thought longer on this – update on the distribution of philosophers’ views on meta-ethics, modulo whatever selection effects you think are biasing that distribution in a particular direction?
To all appearances, we’re an evolved species on an otherwise fairly unremarkable planet in a universe that doesn’t have any special rules for us.
The causal history of us talking about morality as a species runs through evolution and culture.
We learn to build models of the world, and can use language to communicate about parts of these models. Sometimes it is relevant that the map is not the territory, and the elements of our discourse are things on maps.
In terms of semantics of moral language, I think the people who have to argue about whether they’re realists or anti-realists are doing a fine job. Having fancy semantics that differentiate you from everyone else was a mistake. Good models of moral language should be able to reproduce the semantics that normal people use every day.
E.g. “It’s true that in baseball, you’re out after three strikes.” is not a sentence that needs deep revision after considering that baseball is an invented, contingent game.
In terms of epistemology of morality, the average philosopher has completely dropped the ball. But since, on average, they think that as well, surely I’m only deferring to those who have thought longer on this when I say that.
Good models of moral language should be able to reproduce the semantics that normal people use every day.
Agreed. So much the worse for classic emotivism and error theory.
But semantics seems secondary to you (along with many meta-ethicists frankly – semantic ascent is often just used as a technique for avoiding talking past one another, allowing e.g. anti-realist views to be voiced without begging the question. I think many are happy to grab whatever machinery from symbolic logic they need to make the semantics fit the metaphysical/epistemological views they hold more dearly.) I’d like to get clear just what it is you have strong/weak credence in. How would you distribute your credences over the following (very non-exhaustive and simplified) list?
1. Classic Cultural Relativism: moral rules/rightness are to be identified with cultural codes (and for simplicity, say that goodness is derivative). Implication for moral epistemology: like other invented social games, to determine what is morally right (according to the morality game) we just need to probe the rulemakers/keepers (perhaps society at large or a specific moral authority).
2. Boyd’s view (example of naturalist realism): moral goodness is to be identified with the homeostatic clusters of natural (read regular, empirically observable) properties that govern the (moral) use of the term “good” in basically the same way that tigerness is to be identified with homeostatic clusters of natural properties that govern the (zoological) use of the term “tiger.” To score highly on tigerness is to score highly on various traits e.g. having orange fur with black stripes, being quadrupedal, being a carnivore, having retractable claws… We’ve learned more about tigers (tigerness) as we encountered more examples (and counterexamples) of them and refined our observation methods/tools; the same goes (will continue to go) for goodness and good people. Implication for moral epistemology: “goodness” has a certain causal profile – investigate what regulates that causal profile, the same way we investigate anything else in science. No doubt mind-dependent things like your own preferences or cultural codes will figure among the things that regulate the term “good” but these will rarely have the final say in determining what is good or not. Cultural codes and preferences will likely just figure as one homeostatic mechanism among many.
3. Blackburn’s Projectivism or Gibbard’s Norm-Expressivism (sophisticated versions of expressivism, examples of naturalist anti-realism): morality is reduced to attitudes/preferences/plans.
a. According to Blackburn we talk as if moral properties are out there to be investigated the way Boyd suggests we can, but strictly speaking this is false: his view is a form of moral fictionalism. He believes there is no general causal profile to moral terms: nothing besides our preferences/attitudes regulates our usage of these terms. The only thing to “discover” is what our deepest preferences/attitudes are (and if we don’t care about having coherent preferences/attitudes, we can also note our incoherencies). Implication for moral epistemology: learn about the world while also looking deep inside yourself to see how you are moved by that new knowledge (or something to this effect).
b. According to Gibbard normative statements are expressions of plans – “what to do.” The logical structure of these expressions helps us express, probe and revise our plans for their consistency within a system of plans, but ultimately, no one/nothing outside of yourself can tell you what system of plans to adopt. Implication for moral epistemology: determine what your ultimate plans are and do moral reasoning with others to work out any inconsistencies in your system of plans.
If I had to guess you’re in the vicinity of Blackburn (3.a). Can you confirm? But now, how does your preferred view fit your three bullet points of data better than the others? Your 4th data point, matching normal moral discourse (more like a dataset), is another story. E.g. I think (1) pretty clearly scores worse on this one compared to the others. But the others are debatable, which is part of my point – it’s not obvious which theory to prefer. And there is clearly disagreement between these views – we can’t hold them all at once without some kind of incoherence: there is a choice to be made. How are you making that choice?
As for this:
In terms of epistemology of morality, the average philosopher has completely dropped the ball. But since, on average, they think that as well, surely I’m only deferring to those who have thought longer on this when I say that.
I’m sorry but I don’t follow. Care to elaborate? You’re saying philosophers have, on average, failed to develop plausible/practical moral epistemologies? Are you saying this somehow implies you can safely disregard their views on meta-ethics? I don’t see how: the more relevant question seems to be what our current best methodology for meta-ethics is and whether you or some demographic (e.g. philosophers) are comparatively better at applying it. Coming up with a plausible/practical moral epistemology is often treated as a goal of meta-ethics. Of course the criteria for success in that endeavor will depend on what you think the goals of philosophy or science are.
If I had to guess you’re in the vicinity of Blackburn (3.a). Can you confirm?
Can confirm. Although between Boyd and Blackburn, I’d point out that the question of realism falls by the wayside (they both seem to agree we’re modeling the world and then pointing at some pattern we’ve noticed in the world; whether you call that realism or not is murky), and the actionable points of disagreement are things like “how much should we be willing to let complicated intuitions be overruled by simple intuitions?”
And there is clearly disagreement between these views – we can’t hold them all at once without some kind of incoherence
If two people agree about how humans form concepts, and one says that certain abstract objects we’ve formed concepts for are “real,” and another says they’re “not real,” they aren’t necessarily disagreeing about anything substantive.
Sometimes people disagree about concept formation, or (gasp) don’t even give it any role in their story of morality. There’s plenty of room for incoherence there.
But along your Boyd-Blackburn axis, arguments about what to label “real” are more about where to put emphasis, and often smuggle in social/emotive arguments about how we should act or feel in certain situations.
(Re: The Tails Coming Apart As Metaphor For Life. I dunno, if most people, upon reflection, find that the extremes prescribed by all straightforward extrapolations of our moral intuitions look ugly, that sounds like convergence on… not following any extrapolation into the crazy scenarios and just avoiding putting yourself in the crazy scenarios. It might just be wrong for us to have such power over the world as to be directing us into any part of Extremistan. Maybe let’s just not go to Extremistan – let’s stay in Mediocristan (and rebrand it as Satisficistan). If at first something sounds exciting and way better than where you are now, but on reflection looks repugnant – worse than where you are now – then maybe don’t go there. If utilitarianism, Christianism etc yield crazy results in the limit, so much the worse for them. Repugnance keeps hitting your gaze upon tails that have come apart? Maybe that’s because what you care about are actually homeostatic property clusters: the good doesn’t “boil down” to one simple thing like happiness or a few commands written on a stone tablet. Maybe you care about a balance of things – about following all four Red, Yellow, Blue and Green lines (along with 100 other ones no doubt) – never one thing at the unacceptable expense of another. But this is a topic for another day and I’m only gesturing vaguely at a response.)
(Sorry for delay! Was on vacation. Also, got a little too into digging up my old meta-ethics readings. Can’t spend as much time on further responses...)
Although between Boyd and Blackburn, I’d point out that the question of realism falls by the wayside...
I mean fwiw, Boyd will say “goodness exists” while Blackburn is arguably committed to saying “goodness does not exist” since in his total theory of the world, nothing in the domain that his quantifiers range over corresponds to goodness – it’s never taken as a value of any of his variables. But I’m pretty sure Blackburn would take issue with this criterion for ontological commitment, and I suspect you’re not interested in that debate. I’ll just say that we’re doing something when we say e.g. “unicorns don’t exist” and some stories are better than others regarding what that something is (though of course it’s an open question as to which story is best).
they both seem to agree we’re modeling the world and then pointing at some pattern we’ve noticed in the world
I think the point of agreement you’re noticing here is their shared commitment to naturalism. Neither thinks that morality is somehow tied up with spooky acausal stuff. And yes, to talk very loosely, they are both pointing at patterns in the world and saying “that’s what’s key to understanding morality.” But contra:
If two people agree about how humans form concepts, and one says that certain abstract objects we’ve formed concepts for are “real,” and another says they’re “not real,” they aren’t necessarily disagreeing about anything substantive.
they are having a substantive disagreement, precisely over which patterns are key to understanding morality. They likely agree more or less on the general story of how human concepts form (as I understand you to mean “concept formation”), but they disagree about the characteristics of the concept [goodness] – its history, its function, how we learn more about its referent (if it has any) etc. Blackburn’s theory of [goodness] (a theory of meta-ethics) points only to feeling patterns in our heads/bodies (when talking “external” to the moral linguistic framework, i.e. in his meta-ethical moments; “internal” to that framework he points to all sorts of things. I think it’s an open question whether he can get away with this internal external dance,[1] but I’ll concede it for now). Boyd just straightforwardly points to all sorts of patterns, mostly in people’s collective and individual behavior, some in our heads, some in our physiology, some in our environment… And now the question is, who is correct? And how do we adjudicate?
Maybe I can sharpen their disagreement with a comparison. What function does “tiger” serve in our discourse? To borrow terms from Huw Price, is it an e-representation which serves to track or co-vary with a pattern (typically in the environment), or is it an i-representation which serves any number of other “in-game” functions (e.g. signaling a logico-inferential move in the language game, or maybe using/enforcing/renegotiating a semantic rule)? Relevant patterns to determine the answer to such questions: the behaviour of speakers. Also, we will need to get clear on our philosophy of language/linguistic theory: not everyone agrees with Price that this “new bifurcation” is all that important – people will try to subsume one type of role under another.[2] Anyway, suppose we now agree that “tiger” serves to refer, to track certain patterns in the environment. Now we can ask, how did “tiger” come to refer to tigers? Relevant patterns seem to include:
the evolution of a particular family of species – the transmission and gradual modification of common traits between generations of specimens
the evolution of the human sensory apparatus, which determines what sorts of bundles of patterns humans tend to track as unified wholes in their world models
the phonemes uttered by the first humans to encounter said species, and the cultural transmission/evolution of that guttural convention to other humans
...and probably much more I’m forgetting/glossing over/ignoring.
We can of course run the same questions for moral terms. And on nearly every point Blackburn and Boyd will disagree. None of these are spooky questions, but they seem relevant to helping us get clear on our collective project to study tigers – what it is and how to go about it. Of course zoologists don’t typically need to go to the same lengths ethicists do, but I think it’s fair to chalk that up to how controversial moral talk is. It’s important to note that neither Blackburn nor Boyd is in the business of revising the function/referents of moral talk: they don’t want to merely stipulate the function/referent of “rightness” but instead take the term as they hear it in the mouths of ordinary speakers and give an account of its associated rules of use, its function, the general shape of its referent (if it has one).
At this point you might object: what’s the point? How does this have any bearing on what I really care about, the first-order stuff – e.g. whether stealing is wrong or not? One appeal of meta-ethics, I think, is that it presents a range of non-moral questions that we can hopefully resolve in more straightforward ways (especially if we all agree on naturalism), and that these non-moral questions will allow us to resolve many first-order moral disputes. On the (uncontroversial? in any case, empirically verifiable) assumption that our moralizing (moral talk, reflection, judgment) serves some kind of function or is conducive to some type of outcome, then hopefully if we can get a better handle on what we’re doing when we moralize, maybe we can do it better by its own lights.[3]
Assuming of course one wants to moralize better – no one said ethics/meta-ethics would be of much interest to the amoralist. Here is indeed a meta-preference – the usual one appealed to in order to motivate the (meta-)ethicists’ enterprise. (Most people aren’t anti-moralists, who are only interested in meta-ethics insofar as it helps them do moralizing worse. And few are interested in making accurate predictions about homo sapiens’ moralizing for its own sake, without applying it to one’s own life.) But I don’t see this as threatening, or as differentiating it from other scientific endeavours. It’s not threatening (i.e. the bootstrapping works) because, as with any inquiry, we already begin with some grasp of our subject matter, the thing we’re interested in. We point and say “that’s what I want to investigate.” As we learn more about it, refining the definition of our subject matter, our interest shifts to track this refinement too (either in accordance with meta-preferences, or through shifts in our preferences in no way responsive to our initial set of preferences). This happens in any inquiry though. Suppose I care about solving a murder, but in the course of my investigation I discover no one killed the alleged victim – they died of unrelated causes. At that point, I may drop all interest upon realizing no murder occurred, or I might realize what I really wanted to solve was the death of this person.
Might we end up not caring about the results of meta-ethics? I find that highly unlikely, assuming we have the meta-preference of wanting to do this morality thing better, whatever it turns out to be. This meta-preference assumes as little as possible about its subject, in the same way that an interest in solving a death assumes less about its subject than an interest in solving a murder. Meta-ethicists are like physicists who are interested in understanding what causes the perturbations in Mercury’s orbit, whatever it turns out to be: they are not married to a specific planet-induced-perturbations hypothesis, dropping all interest once Vulcan was found missing.
Hopefully we agree on the first-order claim that one should want to do this morality thing better – whatever “doing morality better” turns out to be! In much the same way that an athlete will, upon noting that breathing is key to better athletic performance, want to “do breathing better” whatever breathing turns out to be. The only difference with the athlete is that I take “doing morality better” to be among my terminal goals, insofar as it’s virtuous to try and make oneself more virtuous. (It’s not my only terminal goal of course – something something shard theory/allegory of the chariot).
To make sure things are clear: naturalists all agree there is a process as neutral as any other scientific process for doing meta-ethics – for determining what it is homo sapiens are doing when they engage in moralizing. This is the methodological (and ultimately, metaphysical) point of agreement between e.g. Blackburn and Boyd. We need to e.g. study moral talk, observe whether radical disagreement is a thing, and other behaviour etc. (Also taken as constraints: leaving typical moral discourse/uncontroversial first-order claims intact.) Naturalist realists start to advance a meta-ethical theory when they claim that there is a process as neutral as any other scientific process for determining what is right and what is wrong. On naturalist realist accounts our first-order ethics is (more or less) in the same business as every other science: getting better at predictions in a particular domain (according to LW’s philosophy of science). To simplify massively: folk morality is the proto-theory for first-order ethics; moral talk is about resolving whose past predictions about rightness/wrongness were correct, and the making of new predictions. None of this is a given of course – I’m not sure naturalist realist meta-ethics is correct! But I don’t see why it’s obviously false.
This brings me back to my original point: it’s not obvious what homo sapiens are doing when they engage in moralizing! It seems to me we still have a lot to learn! It’s not at all obvious to me that our moral terms are not regulated by pretty stable patterns in our environment+behaviour and that together they don’t form an attractor.
If we have a crux, I suspect it’s in the above, but just in case I’ll note some other, more “in the weeds” disagreements between Blackburn and Boyd. (They are substantive, for the broad reasons given above, but you might not feel what’s at stake without having engaged in the surrounding theoretical debates.)
Blackburn won’t identify goodness with any of the patterns mentioned earlier – arguably he can’t strictly (i.e. external to the moral linguistic framework) agree we can determine the truth of any moral claims (where “truth” here comes with theoretical baggage). Ultimately, moral claims to him are just projections of our attitudes, not claims on the world, despite remaining “truth-apt.” (He would reject some of this characterization, because he wants to go deflationist about truth, but then his view threatens to collapse into realism – see Taylor paper below). Accordingly, and contra Yudkowsky, he does not take “goodness” to be a two-place predicate with its predication relativized to the eye of the beholder. (“Goodness” is best formalized as an operator, and not a predicate according to Blackburn.) This allows him to refute that what’s good depends on the eye of the beholder. You can go with subjectivists (moral statements are reports of attitudes, attitudes are what determine what is good/bad relative to the person with those attitudes), who point to basically the same patterns as Blackburn regarding “what is key to understanding morality,” and now you don’t have to do this internal external dance. But this comes with other implications: moral disagreement becomes very hard to account for (when I say “I like chocolate” and you say “I like vanilla” are we really disagreeing?), and one is committed to saying things like “what’s good depends on the eye of the beholder.”
I know it can sound like philosophers are trying to trap you/each other with word games and are actually just tripping on their own linguistic shoelaces. But I think it’s actually just really hard to say all the things I think you want to say without contradiction (or to be a person with all the policies you want to have): that’s part of what I’m trying to point out in the previous paragraph. In the same vein, perhaps the most interesting recent development in this space has been to investigate whether views like Blackburn’s don’t just collapse into “full-blown” realism like that of Boyd (along with all its implications for moral epistemology). This is the Taylor paper I sent you a few months ago (but see FN 2 below). Similarly, Egan 2007 points out how Blackburn’s quasi-realism could (alternatively) collapse into subjectivism.
the actionable points of disagreement are things like “how much should we be willing to let complicated intuitions be overruled by simple intuitions?”
I suspect their disagreement is deeper than you think, but I’m not sure what you mean by this: care to clarify?
[1] I use Carnap’s internal-external distinction but IIRC, Blackburn’s view isn’t exactly the same since Carnap’s internal-external distinction is meant to apply to all linguistic frameworks, where Blackburn seems to be trying to make a special carve out specifically for moral talk. But it’s been a while since I properly read through these papers. I’m pretty sure Blackburn draws on Carnap though.
[2] I mention Price’s theory, because his global expressivism might be the best chance anti-realists like Blackburn have for maintaining their distance from realism while retaining their right to ordinary moral talk. There is still much to investigate!
[3] “By its own lights” here is not spooky. We notice certain physical systems that have collections of mechanisms that each support one another in maintaining certain equilibria: each mechanism is said to have a certain function in this system. We can add to/modify mechanisms in the system in order to make it more or less resilient to shocks, more or less reliably reach and maintain those equilibria. We’re “helping” the system by its lights when we make it more resilient/robust/reliable; “hindering” it when we make it less resilient/robust/reliable.
To make sure things are clear: naturalists all agree there is a process as neutral as any other scientific process for doing meta-ethics – for determining what it is homo sapiens are doing when they engage in moralizing. This is the methodological (and ultimately, metaphysical) point of agreement between e.g. Blackburn and Boyd
How come they disagree on all those apparently non-spooky questions about relevant patterns in the world? I’m curious how you reconcile these.
In science the data is always open to some degree of interpretation, but a combination of the ability to repeat experiments independent of the experimenter and the precision with which predictions can be tested tends to gradually weed out different interpretations that actually bear on real-world choices.
If long-term disagreement is maintained, my usual diagnosis would be that the thing being disagreed about does not actually connect to observation in a way amenable to science. E.g. maybe even though it seems like “which patterns are important?” is a non-spooky question, actually it’s very theory-laden in a way that’s only tenuously connected to predictions about data (if at all), and so when comparing theories there isn’t any repeatable experiment you could just stack up until you have enough data to answer the question.
Alternately, maybe at least one of them is bad at science :P
It’s not at all obvious to me that our moral terms are not regulated by pretty stable patterns in our environment+behaviour and that together they don’t form an attractor.
In the strong sense that everyone’s use of “morality” converges to precisely the same referent under some distribution of “normal dynamics” like interacting with the world and doing self-reflection? That sort of miracle doesn’t occur for the same reason coffee and cream don’t spontaneously un-mix.
But that doesn’t happen even for “tiger” – it’s not necessary that everyone means precisely the same thing when they talk about tigers, as long as the amount of interpersonal noise doesn’t overwhelm the natural sparsity of the world that allows us to have single-word handles for general categories of things. You could still call this an attractor, it’s just not a pointlike attractor – there’s space for different people to use “tiger” in different ways that are stable under normal dynamics.
If that’s how it is for “morality” too (“if morality is as real as tigers” being a cheeky framing), then if we could somehow map where everyone is in concept space, I expect everyone can say “Look how close together everyone gets under normal dynamics, this can be framed as a morality attractor!” But it would be a mistake to then say “Therefore the most moral point is the center, we should all go there.”
the actionable points of disagreement are things like “how much should we be willing to let complicated intuitions be overruled by simple intuitions?”
I suspect their disagreement is deeper than you think, but I’m not sure what you mean by this: care to clarify?
I forget what I was thinking, sorry. Maybe the general gist was “if you strip away the supposedly-contingent disagreements like ‘is there a morality attractor,’ what are the remaining fundamental disagreements about how to do moral reasoning?”
How come they disagree on all those apparently non-spooky questions about relevant patterns in the world?
tl;dr: I take meta-ethics, like psychology and economics ~200 years ago, to be asking questions we don’t really have the tools or know-how to answer. And even if we did, there is just a lot of work to be done (e.g. solving meta-semantics, which no doubt involves solving language acquisition. Or e.g. doing some sort of evolutionary anthropology of moral language). And there are few to do the work, with little funding.
Long answer: I take one of philosophy’s key contributions to the (more empirical) sciences to be the highlighting of new or ignored questions, conceptual field clearing, the laying out of non-circular pathways in the theoretical landscape, the placing of landmarks at key choice points. But they are not typically the ones with the tools to answer those questions or make the appropriate theoretical choices informed by finer data. Basically, philosophy generates new fields and gets them to a pre-paradigmatic stage: witness e.g. Aristotle on physics, biology, economics etc.; J. S. Mill and Kant on psychology; Yudkowsky and Bostrom on AI safety; and so on. Give me enough time and I can trace just about every scientific field to its origins in what can only be described as philosophical texts. Once developed to that stage, putatively philosophical methods (conceptual analysis, reasoning by analogy, logical argument, postulation and theorizing, sporadic reference to what coarse data is available) won’t get things much further – progress slows to a crawl or authors might even start going in circles until the empirical tools, methods, interest and culture are available to take things further.
(That’s the simplified, 20-20 hindsight view with a mature philosophy and methodology of science in hand: for much of history, figuring out how to “take things further” was just as contested and confused as anything else, and was only furthered through what was ex ante just more philosophy. Newton was a rival of Descartes and Leibniz: his Principia was a work of philosophy in its time. Only later did we start calling it a work of physics, as pertaining to a field of its own. Likewise with Leibniz and Descartes’ contributions to physics.)
Re: meta-ethics, I don’t think it’s going in circles yet, but I do recognize that the rate at which it has produced new ideas (found genuinely new choice points) has slowed down. It’s still doing much work in collapsing false choice points though (and this seems healthy: it should over-generate and then cut down).
One thing it has completely failed to do is sell the project to the rest of the scientific community (hence why I write). But it’s also a tough sell. There are various sociological obstacles at work here:
20th century ethical disasters: I think after the atrocities committed in the name of science during the (especially early) 20th century, scientists rightly want nothing to do with anything that smells normative. In some sense, this is a philosophical success story: awareness of the naturalistic fallacy has increased substantially. The “origins and nature of morality” probably raises a lot of alarm bells for many scientists (though, yes, I’m aware there are evolutionary biologists who explore the topic. I want to see more of this). To be clear, the wariness is warranted: this subject is indeed a normative minefield. But that doesn’t mean it can’t be crossed and that answers can’t be found. (I actually think, in the specific case of meta-ethics, part of philosophy’s contribution is to clear or at least flag the normative mines – keep the first and second order claims as distinct as possible).
Specialization: As academia has specialized, there has been less cross-departmental pollination.
Philosophy as a dirty word: I think “hard scientists” have come to associate “philosophy” (and maybe especially “ethics”) with “subjective” or something, and therefore to be avoided. Like, for many it’s just negative association at this point, with little reason attached to it. (I blame Hegel – he’s the reason philosophy got such a bad rap starting in the early 20th century).
Funding: How many governments or private funding institutions in today’s post-modern world do you expect to prioritize “solving the origins and nature of morality” over other more immediately materially/economically useful or prestigious/constituent-pleasing research directions?
There are also methodological obstacles: the relevant data is just hard to collect; the number of confounding variables, myriad; the dimensionality of the systems involved, incredibly high! Compare, for example, with macroeconomics: natural experiments are extremely few and far between, and even then confounding variables abound; the timescales of the phenomena of interest (e.g. sustained recessions vs sustained growth periods) are very long, and as such we have very little data – there’ve only been a handful of such periods since record keeping began. We barely understand/can predict macro-econ any better than we did 100 years ago, and it’s not for a lack of brilliance, rigor or funding.
Alternately, maybe at least one of them is bad at science :P
In the sense that I take you to be using “science” (forming a narrow hypothesis, carefully collecting pertinent data, making pretty graphs with error bars), probably neither of them is doing it well.[1] But we shouldn’t really expect them to? Like, that’s not what the discipline is good for.
I’d bet they liberally employ the usual theoretical desiderata (explanatory power, ontological parsimony, theoretical conservatism) to argue for their view, but they probably only make cursory reference to empirical studies. And until they do refer to more empirical work, they won’t converge on an answer (or improve our predictions, if you prefer). But, again, I don’t expect them to, since I think most of the pertinent empirical work is yet to be done.
“if morality is as real as tigers” being a cheeky framing
I’m not surprised you find this cheeky, but just FYI I was dead serious: that’s pretty much literally what I and many think is possibly the case.
it’s not necessary that everyone means precisely the same thing when they talk about tigers, as long as the amount of interpersonal noise doesn’t overwhelm the natural sparsity of the world that allows us to have single-word handles for general categories of things. You could still call this an attractor, it’s just not a pointlike attractor – there’s space for different people to use “tiger” in different ways that are stable under normal dynamics. [...] But it would be a mistake to then say “Therefore the most moral point is the center, we should all go there.”
So this is very interesting to me, and I think I agree with you on some points here, but that you’re missing others. But first I need to understand what you mean by “natural sparsity” and what your (very very rough) story is of how our words get their referents. I take it you’re drawing on ML concepts and explanations, and it sounds like a story some philosophers tell, but I’m not familiar with the lingo and want to understand this better. Please tell me more. Related: would you say that we know more about water than our 1700s counterparts, or would you just say “water” today refers to something different than what it referred to in the 1700s? In which case, what is it we’ve gained relative to them? More accurate predictions regarding… what?
Maybe the general gist was “if you strip away the supposedly-contingent disagreements like ‘is there a morality attractor,’ what are the remaining fundamental disagreements about how to do moral reasoning?”
Thanks, yep, I’m not sure. Whether or not there is an attractor (and how that attraction is supposed to work) seems like the major crux – certainly in our case!
One thing I want to defend and clarify: someone the other day objected that philosophers are overly confident in their proposals, overly married to them. I think I would agree in some sense, since I think their work is often in doing pre-paradigmatic work: they often jump the gun and declare victory, take philosophizing to be enough to settle a matter. Accordingly, I need to correct the following:
Meta-ethicists are like physicists who are interested in understanding what causes the perturbations in Mercury’s orbit, whatever it turns out to be: they are not married to a specific planet-induced-perturbations hypothesis, dropping all interest once Vulcan was found missing.
I should have said the field as a whole is not married to any particular theory. But I’m not sure having individual researchers try so hard to develop and defend particular views is so perverse. Seems pretty normal that in trying to advance theory, individual theorists heavily favor one or another theory – the one they are curious about, want to develop, make robust and take to its limit. One shouldn’t necessarily look to one particular frontier physicist to form one’s best guess about their frontier – instead one should survey the various theories being advanced/developed in the area.
For posterity, we discussed in-person, and both (afaict) took the following to be clear predictive disagreements between the (paradigmatic) naturalist realists and anti-realists (condensed for brevity here, to the point of really being more of a mnemonic device):
Realists claim that:
1. (No Special Semantics): Our uses of “right” and “wrong” are picking up, respectively, on what would be appropriately called the rightness and wrongness features in the world.
2. (Non-subjectivism/non-relativism): These features are largely independent of any particular homo sapiens attitudes and very stable over time.
3. (Still Learning): We collectively haven’t fully learned these features yet – the sparsity of the world does support and can guide further refinement of our collective usage of moral terms should we collectively wish to generalize better at identifying the presence of said features. This is the claim that leads to claims of there being a “moral attractor.”
Anti-realists may or may not disagree with (1) depending on how they cash out their semantics, but they almost certainly disagree with something like (2) and (3) (at least in their meta-ethical moments).
But I’m not really accusing y’all of saying “try to produce a future that has no basis in human values.” I am accusing this post of saying “there’s some neutral procedure for figuring out human values, we should use that rather than a non-neutral procedure.”
My read was more “do the best we can to get through the acute risk period in a way that lets humanity have the time and power to do the best it can at defining/creating a future full of value.” And that’s in response and opposed to positions like “figure out / decide what is best for humanity (or a procedure that can generate the answer to that) and use that to shape the long term future.”
What the heck is this supposed to mean? Great according to the Inherent Essence Of Goodness that lives inside futures, rather than as part of human evaluations? Because I’ve got bad news for that plan.
Honestly, I’m disappointed by this post.
You say you’ve found yourself making this argument a lot recently. That’s fair. I think it’s totally reasonable that there are some situations where this argument could move people in the right direction—maybe the audience is considering defecting about aligning AI with humanity but would respond to orders from authority. Or maybe they’re outsiders who think you are going to defect, and you want to signal to them how you’re going to cooperate not just because it’s a good idea, but because it’s an important moral principle to you (as evolution intended).
But this is not an argument that you should just throw out scattershot. Because it’s literally false. There is no single attractor that all human values can be expected to fall into upon reflection. The primary advantage of AI alignment over typical philosophy is that when alignment researchers realize some part of what they were previously calling “alignment” is impossible, they can back up and change how they’re cashing out “alignment” so that it’s actually possible—philosophers have to keep caring about the impossible thing. This advantage goes away if we don’t use it.
Yes, plenty of people liked this post. But I’m holding you to a high standard. Somewhere people should be expected to not keep talking about the impossible thing. Somewhere, there is a version of this post that talks about or directly references:
Game-theoretic arguments for cooperation.
Why game-theoretic arguments are insufficient for egalitarianism (e.g. overly weighting the preferences of the powerful) but still mean that AI should be designed with more than just you in mind, even before accounting for a human preference for an egalitarian future.
Why egalitarianism is a beautiful moral principle that you endorse.
“Wait, wasn’t that this post?” you might say. Kind of! Making a plain ethical/aesthetic argument is like a magic trick where the magician tells you up front that it’s an illusion. This post is much the same magic trick, but the magician is telling you it’s real magic.
Realistic expectations for what egalitarianism can look like in the real world.
It cannot look like finding the one attractor that all human values converge to upon reflection because there is no one attractor that all human values converge to upon reflection.
Perhaps an analysis of how big the “fingerprints” of the creators of the AI are in such situations—e.g. by setting the meta-level standards for what counts as a “human value”.
There is a non-zero chance that the meta-preferences that end up in charge of the preferences that end up in charge of the galaxy will come from Mechanical Turkers.
The rest of the quote explains what this means:
The present is “good on its own terms”, rather than “good on Ancient Romans’ terms”, because the Ancient Romans weren’t able to lock in their values. If you think this makes sense (and is a good thing) in the absence of an Inherent Essence Of Goodness, then there’s no reason to posit an Inherent Essence Of Goodness when we switch from discussing “moral progress after Ancient Rome” to “moral progress after circa-2022 civilization”.
Could you be explicit about what argument you’re making here? Is it something like:
Even when two variables are strongly correlated, the most extreme value of one will rarely be the most extreme value of the other; therefore it’s <50% likely that different individuals’ CEVs will yield remotely similar results? (E.g., similar enough that one individual will consider the output of most other individuals’ CEVs morally acceptable?)
Or?:
The optimal world-state according to Catholicism is totally different from the optimal world-state according to hedonic utilitarianism; therefore it’s <50% likely that the CEV of a random Catholic will consider the output of a hedonic utilitarian’s CEV morally acceptable. (And vice versa.)
Regarding the second argument: I don’t think that Catholicism is stable under reflection (because it’s false, and a mind needs to avoid thinking various low-complexity true thoughts in order to continue believing Catholicism), so I don’t think the Catholic and hedonic utilitarian’s CEVs will end up disagreeing, even though the optimum for Catholicism and for hedonic utilitarianism disagree.
(I’d bet against hedonic utilitarianism being true as well, but this is obviously a much more open question. And fortunately, CEV-ish buck-passing processes make it less necessary for anyone to take risky bets like that; we can just investigate what’s true and base our decisions on what we learn.)
Catholicism is a relatively easy case, and I expect plenty of disagreement about exactly how much moral disagreement looks like the Catholicism/secularism debate. I expect a lot of convergence on questions like “involuntarily enslaving people: good or bad?”, on the whole, and less on questions like “which do you want more of: chocolate ice cream, or vanilla ice cream?”. But it’s the former questions that matter more for CEV; the latter sorts of questions are ones where we can just let individuals choose different lives for themselves.
“Correlations tend to break when you push things to extremes” is a factor that should increase our expectation of how many things people are likely to morally disagree about. Factors pushing in the other direction include ‘not all correlations work that way’ and evidence that human morality doesn’t work that way.
E.g., ‘human brains are very similar’, ‘empirically, people have converged a lot on morality even though we’ve been pushed toward extremes relative to our EEA’, ‘we can use negotiation and trade to build value systems that are good compromises between two conflicting value systems’, etc.
Also ‘the universe is big, and people’s “amoral” preferences tend to be about how their own life goes, not about the overall distribution of matter in the universe’; so values conflicts tend to be concentrated in cases where we can just let different present-day stakeholders live different sorts of lives, given the universe’s absurd abundance of resources.
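To make the “correlations tend to break at the extremes” factor concrete, here is a rough simulation. It is only a toy: the two scores are standard normal variables with a chosen correlation `rho`, and the population size, trial count, and function name are all made up for illustration. It says nothing about whether human values actually behave this way, which is exactly the open question above.

```python
import math
import random


def top_match_rate(rho, n_people=1000, n_trials=200, seed=0):
    """Fraction of trials in which the individual with the highest x
    is also the individual with the highest y, where x and y are
    standard normal scores with correlation rho (toy assumption)."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - rho * rho)
    matches = 0
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
        # y shares a fraction rho of x and gets independent noise for the rest.
        ys = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
        best_x = max(range(n_people), key=xs.__getitem__)
        best_y = max(range(n_people), key=ys.__getitem__)
        matches += best_x == best_y
    return matches / n_trials


if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99, 1.0):
        rate = top_match_rate(rho)
        print(f"rho={rho}: top-x is also top-y in {rate:.0%} of trials")
```

Only at `rho = 1.0` are the two maxima guaranteed to coincide; how fast agreement falls off below that is what the simulation lets you look at.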
Nate said “it shouldn’t matter who does the job, because there should be a path-independent attractor-well that isn’t about making one person dictator-for-life or tiling a particular flag across the universe forever”, and you said this is “literally false”. I don’t see what’s false about it, so if the above doesn’t clarify anything, maybe you can point to the parts of the Arbital article on CEV you disagree with (https://arbital.com/p/cev/)? E.g., I don’t see Nate or Eliezer claiming that people will agree about vanilla vs. chocolate.
Footnote 2 says that Nate isn’t “stupid enough to toss coordination out the window in the middle of a catastrophic emergency with human existence at stake”. If that isn’t an argument ‘cooperation is useful, therefore we should take others’ preferences into account’, then what sort of argument do you have in mind?
I don’t know what you mean by “egalitarianism”, or for that matter what you mean by “why”. Are you asking for an ode to egalitarianism? Or an argument for it, in terms of more basic values?
The present is certainly good on my terms (relative to ancient Rome). But the present itself doesn’t care. It’s not the type of thing that can care. So what are you trying to pack inside that phrase, “its own terms”?
If you mean it to sum up a meta-preference you hold about how moral evolution should proceed, then that’s fine. But is that really all? Or are you going to go reason as if there’s some objective essence of what the present’s “own terms” are—e.g. by trying to apply standards of epistemic uncertainty to the state of this essence?
I’ll start by quoting the part of Scott’s essay that I was particularly thinking of, to clarify:
Okay, so.
What’s the claim I’m projecting onto Nate, that I’m saying is false? It’s something like: “The goal should be to avoid locking in any particular morals. We can do this by passing control to some neutral procedure that allows values to evolve.”
And what I am saying is something like: There is no neutral procedure. There is no way to avoid privileging some morals. This is not a big problem, it’s just how it is, and we can be okay with it.
Related and repetitive statements:
When extrapolating the shared train line past West Oakland, there are multiple ways to continue, but none of them are “the neutral way to do the extrapolation.”
The self-reflection function has many attractors for almost all humans, groups, societies, and AGI architectures. Different starting points might land us in different attractors, and there is no unique “neutral starting point.”
There are many procedures for allowing values to evolve, most of them suck, and picking a good one is an action that will bear the fingerprints of our own values. And that’s fine!
Human meta-preferences, the standards by which we judge what preference extrapolation schemes are good, are preferences. We do not have any mysterious non-preference standards for doing value aggregation and extrapolation.
There is not just one CEV that is the neutral way to do preference aggregation and extrapolation. There are lots of choices that we have to / get to make.
So as you can see, I wasn’t really thinking about differences between “the CEV” of different people; my focus was more on differences between ways of implementing CEV of the same people. A lot of these ways are going to be more or less as good as each other, like comparing your favorite beef stew vs. a 30-course modernist meal. But not all possible implementations of CEV are good. For example, you could screw up by modeling people as exposed to extreme or highly-optimized stimuli when extrapolating them, leading to the AI causing large changes in the human condition that we wouldn’t presently endorse.
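The “many attractors” claim above can be pictured with a toy dynamical system. This is a sketch under heavy assumptions: “values” are squashed to a single number and “reflection” is a made-up update rule with two stable fixed points. Nothing here models real preference extrapolation; it only shows what it means for the endpoint to depend on the starting point.

```python
def reflect(start, steps=200, rate=0.1):
    """Iterate a hypothetical 'reflection' update x <- x + rate * (x - x**3).

    The update has stable fixed points at -1 and +1 and an unstable
    one at 0, so where you end up depends on where you begin.
    """
    x = start
    for _ in range(steps):
        x = x + rate * (x - x ** 3)
    return x


if __name__ == "__main__":
    for start in (-0.2, 0.0, 0.2):
        print(f"start={start:+.1f} -> after reflection: {reflect(start):+.3f}")
```

Starting slightly right of zero settles near +1 and slightly left settles near -1. Nothing inside the update rule says which of the three starting points is the “neutral” one.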
By egalitarianism I mean building an AI that tries to help all people, and be responsive to the perspectives of all people, not just a select few. And yes, definitely an ode :D
I would say that there’s a logical object that a large chunk of human moral discourse is trying to point at: something like “the rules of the logical game Morality”, analogous to “the rules of the logical game Chess”. Two people can both be discussing the same logical object “the rules of Chess”, but have different beliefs about what that logical object’s properties are. And just as someone can be mistaken or uncertain about the rules of chess, or about their interaction in a specific case, someone can be uncertain about morality.
Do you disagree with any of that?
In the CEV Arbital page, Eliezer says:
“Even the terms in CEV, like ‘know more’ or ‘extrapolate a human’, seem complicated and value-laden.”
If the thing you’re saying is that CEV is itself a complicated idea, and it seems hard for humanity to implement such a thing without already having a pretty deep understanding of human values, then I agree. This seems like an important practical challenge for pulling off CEV: you need to somehow start the bootstrapping process, even though our current understanding of human values is insufficient for formally specifying the best way to do CEV.
If instead you just mean to say “there’s no reason to favor human values over termite values unless you already care about humans”, then yeah, that seems even more obvious to me. If you think Nate is trying to argue for human morality from a humanity-indifferent, View-From-Nowhere perspective, then you’re definitely misunderstanding Nate’s perspective.
If “neutral” here means “non-value-laden”, then sure. If “neutral” here means “non-arbitrary, from a human POV”, then it seems like an open empirical question how many arbitrary decisions like this are required in order to do CEV.
I’d guess that there are few or no arbitrary decisions involved in using CEV to answer high-stakes moral questions.
This makes me think that you misunderstood Nate’s essay entirely. The idea of “don’t leave your fingerprints on the future” isn’t “try to produce a future that has no basis in human values”. The idea is “try to produce a future that doesn’t privilege the AGI operator’s current values at the expense of other humans’ values, the values humans would develop in the future if their moral understanding improved, etc.”.
If you deploy AGI and execute a pivotal act, don’t leave your personal fingerprints all over the long-term future of humanity, in a way that distinguishes you from other humans.
When I think about the rules of chess, I basically treat them as having some external essence that I have epistemic uncertainty about. What this means mechanistically is:
When I’m unsure about the rules of chess, this raises the value of certain information-gathering actions, like checking the FIDE website, asking a friend, reading a book.
If I knew the outcomes of all those actions, that would resolve my uncertainty.
I have probabilities associated with my uncertainty, and updates to those probabilities based on evidence should follow Bayesian logic.
Decision-making under uncertainty should linearly aggregate the different possibilities that I’m uncertain over, weighted by their probability.
So the rules of chess are basically just a pattern out in the world that I can go look at. When I say I’m uncertain about the rules of chess, this is epistemic uncertainty that I manage the same as if I’m uncertain about anything else out there in the world.
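A minimal sketch of that chess-rules picture. The question (whether some castling move is legal), the friend who is right 80% of the time, and the payoffs are all invented for illustration. The code only shows the two mechanical steps listed above: a Bayesian update on evidence, then a probability-weighted (linear) aggregation over the hypotheses.

```python
def bayes_update(prior, likelihood):
    """Posterior P(h | evidence) from prior P(h) and likelihood P(evidence | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def expected_utility(belief, utility, action):
    """Linear aggregation: probability-weighted average over hypotheses."""
    return sum(belief[h] * utility[(action, h)] for h in belief)


if __name__ == "__main__":
    # Hypothetical numbers: I start out 50/50 on whether the move is legal.
    prior = {"legal": 0.5, "illegal": 0.5}
    # My friend says "illegal"; I assume they are right 80% of the time.
    friend_says_illegal = {"legal": 0.2, "illegal": 0.8}
    belief = bayes_update(prior, friend_says_illegal)

    # Made-up payoffs for trying the move versus playing something else.
    utility = {
        ("castle", "legal"): 1.0,
        ("castle", "illegal"): -2.0,
        ("other move", "legal"): 0.0,
        ("other move", "illegal"): 0.0,
    }
    print("belief:", belief)
    for action in ("castle", "other move"):
        print(action, "->", expected_utility(belief, utility, action))
```

More evidence (the FIDE website, a book) would just mean more calls to `bayes_update`; in the chess case, enough of them resolves the uncertainty.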
The “rules of Morality” are not like this.
When I’m unsure about whether I care about fish suffering, this does raise the value of certain information-gathering actions like learning more about fish.
But if I knew the outcomes of all those actions, this wouldn’t resolve all my uncertainty.
I can put probabilities to various possibilities, and can update them on evidence using Bayesian logic—that part still works.
Decision-making under the remaining-after-evidence part of the uncertainty doesn’t have to look like linear aggregation. In fact it shouldn’t—I have meta-preferences like “conservatism,” which says that I should trust models differently depending on whether they seem to be inside their domain of validity or not.
So there’s a lot of my uncertainty about morality that doesn’t stem from being unaware about facts. Where does it come from? One source is self-modeling uncertainty—how do I take the empirical facts about me and the world, and use that to construct a model of myself in which I have preferences, so that I can reflect on my own preferences? There are multiple ways to do this.
So if, and I’m really not sure, but if you were thinking of everything as like uncertainty about the rules of chess, then I would expect two main mistakes: expecting there to be some procedure that takes in evidence and spits out the one right answer, and expecting aggregating over models for decision-making to look like linear aggregation.
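To show what “not linear aggregation” could look like, here is one hypothetical way to cash out the conservatism meta-preference: a model’s verdict is capped when the action falls outside the domain where that model seems valid. The `ModelVerdict` type, the cap rule, and all the numbers are assumptions made up for this sketch, not a claim about the right aggregation rule.

```python
from dataclasses import dataclass


@dataclass
class ModelVerdict:
    weight: float     # credence in this model of my preferences
    score: float      # how good the model says the action is
    in_domain: bool   # does the action look like a case the model was built for?


def linear_score(verdicts):
    """Plain probability-weighted average of the models' scores."""
    return sum(v.weight * v.score for v in verdicts)


def conservative_score(verdicts, out_of_domain_cap=0.0):
    """Same average, except a model outside its domain of validity
    cannot contribute more than `out_of_domain_cap` (a made-up rule)."""
    total = 0.0
    for v in verdicts:
        score = v.score if v.in_domain else min(v.score, out_of_domain_cap)
        total += v.weight * score
    return total


if __name__ == "__main__":
    familiar = [ModelVerdict(0.5, 1.0, True), ModelVerdict(0.5, 0.8, True)]
    # One model is wildly enthusiastic, but about a case far outside its domain.
    extreme = [ModelVerdict(0.5, 100.0, False), ModelVerdict(0.5, -1.0, True)]
    for name, verdicts in (("familiar", familiar), ("extreme", extreme)):
        print(name, linear_score(verdicts), conservative_score(verdicts))
```

With these invented numbers, the linear rule prefers the extreme action and the conservative rule prefers the familiar one. Choosing the cap is itself a preference-laden decision.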
Well, maybe I misunderstood. But I’m not really accusing y’all of saying “try to produce a future that has no basis in human values.” I am accusing this post of saying “there’s some neutral procedure for figuring out human values, we should use that rather than a non-neutral procedure.”
This and earlier comments are bald rejections of moral realism (including, maybe especially, naturalist realism). Can I get some evidence for this confident rejection?
I’m not sure what linking Yudkowsky’s (sketch of a) semantics for moral terms is meant to tell us. Case in point, Cornell Realists adopt a similar relativism in their semantics (“good” like “healthy” can only be judged relative to the type of creature you are), but (some of them anyway) will still argue that we can simply discover what is good through a more or less standard scientific process. In other words, they do believe there is a basin of attraction for human values and there is a neutral process for finding it. (It’s only a basin of attraction of course insofar as this process will find it and should we trust that process, we will gravitate toward that basin). To be clear, few if any claim there is one exact lowest point in this basin – there will be many constellations of goods in a life that are equally worthwhile in some sense (first gloss: in the sense that we would be indifferent to a choice between those lives, from behind a veil of ignorance that only assumes we are homo sapiens).
In any case, every major view in meta-ethics has a developed semantics for moral terms: you’ll have to say more about why e.g. your favored semantics is a point in favor of your meta-ethical view. You don’t need to start from scratch of course: philosophers have been working on this for decades (and continue to). Ayer’s classic emotivism (a.k.a. expressivism) ran into the problem of embedding (if moral statement P doesn’t have a truth value, how am I supposed to evaluate statements with P embedded in them, like “P → Q”? Our nice truth tables get thrown out the window...). In response several anti-realists have made proposals, e.g. Blackburn’s quasi-realism. More recently, those responses have come under fire for struggling to hold onto their distinct semantics (or metaphysics or epistemology) while also holding onto their having a distinct view from realism. There is always Error Theory of course but then you’re committed to saying things like “It is false that arbitrary torture is bad. It is also false that arbitrary torture is good.”
If none of this discussion on meta-ethics is your thing, that’s fine, but then you might want to dampen your certainty? Consider deferring to those who have thought longer on this – update on the distribution of philosophers’ views on meta-ethics, modulo whatever selection effects you think are biasing that distribution in a particular direction?
Sure. Here are some bullet points of evidence:
To all appearances, we’re an evolved species on an otherwise fairly unremarkable planet in a universe that doesn’t have any special rules for us.
The causal history of us talking about morality as a species runs through evolution and culture.
We learn to build models of the world, and can use language to communicate about parts of these models. Sometimes it is relevant that the map is not the territory, and the elements of our discourse are things on maps.
In terms of semantics of moral language, I think the people who have to argue about whether they’re realists or anti-realists are doing a fine job. Having fancy semantics that differentiate you from everyone else was a mistake. Good models of moral language should be able to reproduce the semantics that normal people use every day.
E.g. “It’s true that in baseball, you’re out after three strikes.” is not a sentence that needs deep revision after considering that baseball is an invented, contingent game.
In terms of epistemology of morality, the average philosopher has completely dropped the ball. But since, on average, they think that as well, surely I’m only deferring to those who have thought longer on this when I say that.
Agreed. So much the worse for classic emotivism and error theory.
But semantics seems secondary to you (along with many meta-ethicists frankly – semantic ascent is often just used as a technique for avoiding talking past one another, allowing e.g. anti-realist views to be voiced without begging the question. I think many are happy to grab whatever machinery from symbolic logic they need to make the semantics fit the metaphysical/epistemological views they hold more dearly.) I’d like to get clear just what it is you have strong/weak credence in. How would you distribute your credences over the following (very non-exhaustive and simplified) list?
Classic Cultural Relativism: moral rules/rightness are to be identified with cultural codes (and for simplicity, say that goodness is derivative). Implication for moral epistemology: like other invented social games, to determine what is morally right (according to the morality game) we just need to probe the rulemakers/keepers (perhaps society at large or a specific moral authority).
Boyd’s view (example of naturalist realism): moral goodness is to be identified with the homeostatic clusters of natural (read regular, empirically observable) properties that govern the (moral) use of the term “good” in basically the same way that tigerness is to be identified with homeostatic clusters of natural properties that govern the (zoological) use of the term “tiger.” To score highly on tigerness is to score highly on various traits e.g. having orange fur with black stripes, being quadrupedal, being a carnivore, having retractable claws… We’ve learned more about tigers (tigerness) as we encountered more examples (and counterexamples) of them and refined our observation methods/tools; the same goes (will continue to go) for goodness and good people. Implication for moral epistemology: “goodness” has a certain causal profile – investigate what regulates that causal profile, the same way we investigate anything else in science. No doubt mind-dependent things like your own preferences or cultural codes will figure among the things that regulate the term “good” but these will rarely have the final say in determining what is good or not. Cultural codes and preferences will likely just figure as one homeostatic mechanism among many.
Blackburn’s Projectivism or Gibbard’s Norm-Expressivism (sophisticated versions of expressivism, examples of naturalist anti-realism): morality is reduced to attitudes/preferences/plans.
According to Blackburn we talk as if moral properties are out there to be investigated the way Boyd suggests we can, but strictly speaking this is false: his view is a form of moral fictionalism. He believes there is no general causal profile to moral terms: nothing besides our preferences/attitudes regulates our usage of these terms. The only thing to “discover” is what our deepest preferences/attitudes are (and if we don’t care about having coherent preferences/attitudes, we can also note our incoherencies). Implication for moral epistemology: learn about the world while also looking deep inside yourself to see how you are moved by that new knowledge (or something to this effect).
According to Gibbard normative statements are expressions of plans – “what to do.” The logical structure of these expressions helps us express, probe and revise our plans for their consistency within a system of plans, but ultimately, no one/nothing outside of yourself can tell you what system of plans to adopt. Implication for moral epistemology: determine what your ultimate plans are and do moral reasoning with others to work out any inconsistencies in your system of plans.
If I had to guess you’re in the vicinity of Blackburn (3.a). Can you confirm? But now, how does your preferred view fit your three bullet points of data better than the others? Your 4th data point, matching normal moral discourse (more like a dataset), is another story. E.g. I think (1) pretty clearly scores worse on this one compared to the others. But the others are debatable, which is part of my point – it’s not obvious which theory to prefer. And there is clearly disagreement between these views – we can’t hold them all at once without some kind of incoherence: there is a choice to be made. How are you making that choice?
As for this:
I’m sorry but I don’t follow. Care to elaborate? You’re saying philosophers have, on average, failed to develop plausible/practical moral epistemologies? Are you saying this somehow implies you can safely disregard their views on meta-ethics? I don’t see how: the more relevant question seems to be what our current best methodology for meta-ethics is and whether you or some demographic (e.g. philosophers) are comparatively better at applying it. Coming up with a plausible/practical moral epistemology is often treated as a goal of meta-ethics. Of course the criteria for success in that endeavor will depend on what you think the goals of philosophy or science are.
Can confirm. Although between Boyd and Blackburn, I’d point out that the question of realism falls by the wayside (they both seem to agree we’re modeling the world and then pointing at some pattern we’ve noticed in the world, whether you call that realism or not is murky), and the actionable points of disagreement are things like “how much should we be willing to let complicated intuitions be overruled by simple intuitions?”
If two people agree about how humans form concepts, and one says that certain abstract objects we’ve formed concepts for are “real,” and another says they’re “not real,” they aren’t necessarily disagreeing about anything substantive.
Sometimes people disagree about concept formation, or (gasp) don’t even give it any role in their story of morality. There’s plenty of room for incoherence there.
But along your Boyd-Blackburn axis, arguments about what to label “real” are more about where to put emphasis, and often smuggle in social/emotive arguments about how we should act or feel in certain situations.
(Re: The Tails Coming Apart As Metaphor For Life. I dunno, if most people, upon reflection, find that the extremes prescribed by all straightforward extrapolations of our moral intuitions look ugly, that sounds like convergence on… not following any extrapolation into the crazy scenarios and just avoiding putting yourself in the crazy scenarios. It might just be wrong for us to have such power over the world as to be directing us into any part of Extremistan. Maybe let’s just not go to Extremistan – let’s stay in Mediocristan (and rebrand it as Satisficistan). If at first something sounds exciting and way better than where you are now, but on reflection looks repugnant – worse than where you are now – then maybe don’t go there. If utilitarianism, Christianism etc yield crazy results in the limit, so much the worse for them. Repugnance keeps hitting your gaze upon tails that have come apart? Maybe that’s because what you care about are actually homeostatic property clusters: the good doesn’t “boil down” to one simple thing like happiness or a few commands written on a stone tablet. Maybe you care about a balance of things – about following all four Red, Yellow, Blue and Green lines (along with 100 other ones no doubt) – never one thing at the unacceptable expense of another. But this is a topic for another day and I’m only gesturing vaguely at a response.)
(Sorry for delay! Was on vacation. Also, got a little too into digging up my old meta-ethics readings. Can’t spend as much time on further responses...)
I mean fwiw, Boyd will say “goodness exists” while Blackburn is arguably committed to saying “goodness does not exist” since in his total theory of the world, nothing in the domain that his quantifiers range over corresponds to goodness – it’s never taken as a value of any of his variables. But I’m pretty sure Blackburn would take issue with this criterion for ontological commitment, and I suspect you’re not interested in that debate. I’ll just say that we’re doing something when we say e.g. “unicorns don’t exist” and some stories are better than others regarding what that something is (though of course it’s an open question as to which story is best).
I think the point of agreement you’re noticing here is their shared commitment to naturalism. Neither thinks that morality is somehow tied up with spooky acausal stuff. And yes, to talk very loosely, they are both pointing at patterns in the world and saying “that’s what’s key to understanding morality.” But contra:
they are having a substantive disagreement, precisely over which patterns are key to understanding morality. They likely agree more or less on the general story of how human concepts form (as I understand you to mean “concept formation”), but they disagree about the characteristics of the concept [goodness] – its history, its function, how we learn more about its referent (if it has any) etc. Blackburn’s theory of [goodness] (a theory of meta-ethics) points only to feeling patterns in our heads/bodies (when talking “external” to the moral linguistic framework, i.e. in his meta-ethical moments; “internal” to that framework he points to all sorts of things. I think it’s an open question whether he can get away with this internal external dance,[1] but I’ll concede it for now). Boyd just straightforwardly points to all sorts of patterns, mostly in people’s collective and individual behavior, some in our heads, some in our physiology, some in our environment… And now the question is, who is correct? And how do we adjudicate?
Maybe I can sharpen their disagreement with a comparison. What function does “tiger” serve in our discourse? To borrow terms from Huw Price, is it an e-representation which serves to track or co-vary with a pattern (typically in the environment), or is it an i-representation which serves any number of other “in-game” functions (e.g. signaling a logico-inferential move in the language game, or maybe using/enforcing/renegotiating a semantic rule)? Relevant patterns to determine the answer to such questions: the behaviour of speakers. Also, we will need to get clear on our philosophy of language/linguistic theory: not everyone agrees with Price that this “new bifurcation” is all that important – people will try to subsume one type of role under another.[2] Anyway, suppose we now agree that “tiger” serves to refer, to track certain patterns in the environment. Now we can ask, how did “tiger” come to refer to tigers? Relevant patterns seem to include:
the evolution of a particular family of species – the transmission and gradual modification of common traits between generations of specimens
the evolution of the human sensory apparatus, which determines what sorts of bundles of patterns humans tend to track as unified wholes in their world models
the phonemes uttered by the first humans to encounter said species, and the cultural transmission/evolution of that guttural convention to other humans
...and probably much more I’m forgetting/glossing over/ignoring.
We can of course run the same questions for moral terms. And on nearly every point Blackburn and Boyd will disagree. None of these are spooky questions, but they seem relevant to helping us get clear on our collective project to study tigers – what it is and how to go about it. Of course zoologists don’t typically need to go to the same lengths ethicists do, but I think it’s fair to chalk that up to how controversial moral talk is. It’s important to note that neither Blackburn nor Boyd are in the business of revising the function/referents of moral talk: they don’t want to merely stipulate the function/referent of “rightness” but instead, take the term as they hear it in the mouths of ordinary speakers and give an account of its associated rules of use, its function, the general shape of its referent (if it has one).
At this point you might object: what’s the point? How does this have any bearing on what I really care about, the first-order stuff – e.g. whether stealing is wrong or not? One appeal of meta-ethics, I think, is that it presents a range of non-moral questions that we can hopefully resolve in more straightforward ways (especially if we all agree on naturalism), and that these non-moral questions will allow us to resolve many first-order moral disputes. On the (uncontroversial? in any case, empirically verifiable) assumption that our moralizing (moral talk, reflection, judgment) serves some kind of function or is conducive to some type of outcome, then hopefully, if we can get a better handle on what we’re doing when we moralize, we can do it better by its own lights.[3]
Assuming of course one wants to moralize better – no one said ethics/meta-ethics would be of much interest to the amoralist. Here is indeed a meta-preference – the usual one appealed to in order to motivate the (meta-)ethicists’ enterprise. (Most people aren’t anti-moralists, who are only interested in meta-ethics insofar as it helps them do moralizing worse. And few are interested in making accurate predictions about homo sapiens’ moralizing for its own sake, without applying it to one’s own life). But I don’t see this as threatening or differentiating from other scientific endeavours. It’s not threatening (i.e. the bootstrapping works) because, as with any inquiry, we begin already with some grasp of our subject matter, the thing we’re interested in. We point and say “that’s what I want to investigate.” As we learn more about it, refining the definition of our subject matter, our interest shifts to track this refinement too (either in accordance with meta-preferences, or through shifts in our preferences in no way responsive to our initial set of preferences). This happens in any inquiry though. Suppose I care about solving a murder, but in the course of my investigation I discover no one killed the alleged victim – they died of unrelated causes. At that point, I may drop all interest upon realizing no murder occurred, or I might realize what I really wanted to solve was the death of this person.
Might we end up not caring about the results of meta-ethics? I find that highly unlikely, assuming we have the meta-preference of wanting to do this morality thing better, whatever it turns out to be. This meta-preference assumes as little as possible about its subject, in the same way that an interest in solving a death assumes less about its subject than an interest in solving a murder. Meta-ethicists are like physicists who are interested in understanding what causes the perturbations in Mercury’s orbit, whatever it turns out to be: they are not married to a specific planet-induced-perturbations hypothesis, dropping all interest once Vulcan was found missing.
Hopefully we agree on the first-order claim that one should want to do this morality thing better – whatever “doing morality better” turns out to be! In much the same way that an athlete will, upon noting that breathing is key to better athletic performance, want to “do breathing better” whatever breathing turns out to be. The only difference with the athlete is that I take “doing morality better” to be among my terminal goals, insofar as it’s virtuous to try and make oneself more virtuous. (It’s not my only terminal goal of course – something something shard theory/allegory of the chariot).
To make sure things are clear: naturalists all agree there is a process as neutral as any other scientific process for doing meta-ethics – for determining what it is homo sapiens are doing when they engage in moralizing. This is the methodological (and ultimately, metaphysical) point of agreement between e.g. Blackburn and Boyd. We need to e.g. study moral talk, observe whether radical disagreement is a thing, and other behaviour etc. (Also taken as constraints: leaving typical moral discourse/uncontroversial first-order claims intact.) Naturalist realists start to advance a meta-ethical theory when they claim that there is a process as neutral as any other scientific process for determining what is right and what is wrong. On naturalist realist accounts our first-order ethics is (more or less) in the same business as every other science: getting better at predictions in a particular domain (according to LW’s philosophy of science). To simplify massively: folk morality is the proto-theory for first-order ethics; moral talk is about resolving whose past predictions about rightness/wrongness were correct, and the making of new predictions. None of this is a given of course – I’m not sure naturalist realist meta-ethics is correct! But I don’t see why it’s obviously false.
This brings me back to my original point: it’s not obvious what homo sapiens are doing when they engage in moralizing! It seems to me we still have a lot to learn! It’s not at all obvious to me that our moral terms are not regulated by pretty stable patterns in our environment+behaviour and that together they don’t form an attractor.
If we have a crux, I suspect it’s in the above, but just in case I’ll note some other, more “in the weeds” disagreements between Blackburn and Boyd. (They are substantive, for the broad reasons given above, but you might not feel what’s at stake without having engaged in the surrounding theoretical debates.)
Blackburn won’t identify goodness with any of the patterns mentioned earlier – arguably he can’t strictly (i.e. external to the moral linguistic framework) agree we can determine the truth of any moral claims (where “truth” here comes with theoretical baggage). Ultimately, moral claims to him are just projections of our attitudes, not claims on the world, despite remaining “truth-apt.” (He would reject some of this characterization, because he wants to go deflationist about truth, but then his view threatens to collapse into realism – see Taylor paper below). Accordingly, and contra Yudkowsky, he does not take “goodness” to be a two-place predicate with its predication relativized to the eye of the beholder. (“Goodness” is best formalized as an operator, and not a predicate according to Blackburn.) This allows him to deny that what’s good depends on the eye of the beholder. You can go with subjectivists (moral statements are reports of attitudes, attitudes are what determine what is good/bad relative to the person with those attitudes), who point to basically the same patterns as Blackburn regarding “what is key to understanding morality,” and now you don’t have to do this internal-external dance. But this comes with other implications: moral disagreement becomes very hard to account for (when I say “I like chocolate” and you say “I like vanilla” are we really disagreeing?), and one is committed to saying things like “what’s good depends on the eye of the beholder.”
I know it can sound like philosophers are trying to trap you/each other with word games and are actually just tripping on their own linguistic shoelaces. But I think it’s actually just really hard to say all the things I think you want to say without contradiction (or to be a person with all the policies you want to have): that’s part of what I’m trying to point out in the previous paragraph. In the same vein, perhaps the most interesting recent development in this space has been to investigate whether views like Blackburn’s don’t just collapse into “full-blown” realism like that of Boyd (along with all its implications for moral epistemology). This is the Taylor paper I sent you a few months ago (but see FN 2 below). Similarly, Egan 2007 points out how Blackburn’s quasi-realism could (alternatively) collapse into subjectivism.
I suspect their disagreement is deeper than you think, but I’m not sure what you mean by this: care to clarify?
I use Carnap’s internal-external distinction but IIRC, Blackburn’s view isn’t exactly the same, since Carnap’s internal-external distinction is meant to apply to all linguistic frameworks, whereas Blackburn seems to be trying to make a special carve-out specifically for moral talk. But it’s been a while since I properly read through these papers. I’m pretty sure Blackburn draws on Carnap though.
I mention Price’s theory, because his global expressivism might be the best chance anti-realists like Blackburn have for maintaining their distance from realism while retaining their right to ordinary moral talk. There is still much to investigate!
“by its own lights” here is not spooky. We notice certain physical systems that have collections of mechanisms that each support one another in maintaining certain equilibria: each mechanism is said to have a certain function in this system. We can add to/modify mechanisms in the system in order to make it more or less resilient to shocks, more or less reliably reach and maintain those equilibria. We’re “helping” the system by its lights when we make it more resilient/robust/reliable; “hindering” it when we make it less resilient/robust/reliable.
How come they disagree on all those apparently non-spooky questions about relevant patterns in the world? I’m curious how you reconcile these.
In science the data is always open to some degree of interpretation, but a combination of the ability to repeat experiments independent of the experimenter and the precision with which predictions can be tested tends to gradually weed out different interpretations that actually bear on real-world choices.
If long-term disagreement is maintained, my usual diagnosis would be that the thing being disagreed about does not actually connect to observation in a way amenable to science. E.g. maybe even though it seems like “which patterns are important?” is a non-spooky question, actually it’s very theory-laden in a way that’s only tenuously connected to predictions about data (if at all), and so when comparing theories there isn’t any repeatable experiment you could just stack up until you have enough data to answer the question.
Alternately, maybe at least one of them is bad at science :P
In the strong sense that everyone’s use of “morality” converges to precisely the same referent under some distribution of “normal dynamics” like interacting with the world and doing self-reflection? That sort of miracle doesn’t occur for the same reason coffee and cream don’t spontaneously un-mix.
But that doesn’t happen even for “tiger”: it’s not necessary that everyone means precisely the same thing when they talk about tigers, as long as the amount of interpersonal noise doesn’t overwhelm the natural sparsity of the world that allows us to have single-word handles for general categories of things. You could still call this an attractor, it’s just not a pointlike attractor; there’s space for different people to use “tiger” in different ways that are stable under normal dynamics.
If that’s how it is for “morality” too (“if morality is as real as tigers” being a cheeky framing), then if we could somehow map where everyone is in concept space, I expect everyone can say “Look how close together everyone gets under normal dynamics, this can be framed as a morality attractor!” But it would be a mistake to then say “Therefore the most moral point is the center, we should all go there.”
I forget what I was thinking, sorry. Maybe the general gist was “if you strip away the supposedly-contingent disagreements like ‘is there a morality attractor,’ what are the remaining fundamental disagreements about how to do moral reasoning?”
tl;dr: I take meta-ethics, like psychology and economics ~200 years ago, to be asking questions we don’t really have the tools or know-how to answer. And even if we did, there is just a lot of work to be done (e.g. solving meta-semantics, which no doubt involves solving language acquisition. Or e.g. doing some sort of evolutionary anthropology of moral language). And there are few to do the work, with little funding.
Long answer: I take one of philosophy’s key contributions to the (more empirical) sciences to be the highlighting of new or ignored questions, conceptual field clearing, the laying out of non-circular pathways in the theoretical landscape, the placing of landmarks at key choice points. But philosophers are not typically the ones with the tools to answer those questions or make the appropriate theoretical choices informed by finer data. Basically, philosophy generates new fields and gets them to a pre-paradigmatic stage: witness e.g. Aristotle on physics, biology, economics etc.; J. S. Mill and Kant on psychology; Yudkowsky and Bostrom on AI safety; and so on. Give me enough time and I can trace just about every scientific field to its origins in what can only be described as philosophical texts. Once developed to that stage, putatively philosophical methods (conceptual analysis, reasoning by analogy, logical argument, postulation and theorizing, sporadic reference to what coarse data is available) won’t get things much further – progress slows to a crawl or authors might even start going in circles until the empirical tools, methods, interest and culture are available to take things further.
(That’s the simplified, 20-20 hindsight view with a mature philosophy and methodology of science in hand: for much of history, figuring out how to “take things further” was just as contested and confused as anything else, and was only furthered through what was ex ante just more philosophy. Newton was a rival of Descartes and Leibniz: his Principia was a work of philosophy in its time. Only later did we start calling it a work of physics, as pertaining to a field of its own. Likewise with Leibniz and Descartes’ contributions to physics.)
Re: meta-ethics, I don’t think it’s going in circles yet, but do recognize the rate at which it has produced new ideas (found genuinely new choice points) has slowed down. It’s still doing much work in collapsing false choice points though (and this seems healthy: it should over-generate and then cut down).
One thing it has completely failed to do is sell the project to the rest of the scientific community (hence why I write). But it’s also a tough sell. There are various sociological obstacles at work here:
20th century ethical disasters: I think after the atrocities committed in the name of science during the (especially early) 20th century, scientists rightly want nothing to do with anything that smells normative. In some sense, this is a philosophical success story: awareness of the naturalistic fallacy has increased substantially. The “origins and nature of morality” probably raises a lot of alarm bells for many scientists (though, yes, I’m aware there are evolutionary biologists who explore the topic. I want to see more of this). To be clear, the wariness is warranted: this subject is indeed a normative minefield. But that doesn’t mean it can’t be crossed and that answers can’t be found. (I actually think, in the specific case of meta-ethics, part of philosophy’s contribution is to clear or at least flag the normative mines – keep the first- and second-order claims as distinct as possible).
Specialization: As academia has specialized, there has been less cross-departmental pollination.
Philosophy as a dirty word: I think “hard scientists” have come to associate “philosophy” (and maybe especially “ethics”) with “subjective” or something, and therefore to be avoided. Like, for many it’s just negative association at this point, with little reason attached to it. (I blame Hegel – he’s the reason philosophy got such a bad rap starting in the early 20th century).
Funding: How many governments or private funding institutions in today’s post-modern world do you expect to prioritize “solving the origins and nature of morality” over other more immediately materially/economically useful or prestigious/constituent-pleasing research directions?
There are also methodological obstacles: the relevant data is just hard to collect; the number of confounding variables, myriad; the dimensionality of the systems involved, incredibly high! Compare, for example, with macroeconomics: natural experiments are extremely few and far between, and even then confounding variables abound; the timescales of the phenomena of interest (e.g. sustained recessions vs sustained growth periods) are very long, and as such we have very little data – there’ve only been a handful of such periods since record keeping began. We barely understand/can predict macro-econ any better than we did 100 years ago, and it’s not for a lack of brilliance, rigor or funding.
In the sense that I take you to be using “science” (forming a narrow hypothesis, carefully collecting pertinent data, making pretty graphs with error bars) probably neither of them is doing it well.[1] But we shouldn’t really expect them to? Like, that’s not what the discipline is good for.
I’d bet they liberally employ the usual theoretical desiderata (explanatory power, ontological parsimony, theoretical conservatism) to argue for their view, but they probably only make cursory reference to empirical studies. And until they do refer to more empirical work, they won’t converge on an answer (or improve our predictions, if you prefer). But, again, I don’t expect them to, since I think most of the pertinent empirical work is yet to be done.
I’m not surprised you find this cheeky, but just FYI I was dead serious: that’s pretty much literally what I and many think is possibly the case.
So this is very interesting to me, and I think I agree with you on some points here, but that you’re missing others. But first I need to understand what you mean by “natural sparsity” and what your (very very rough) story is of how our words get their referents. I take it you’re drawing on ML concepts and explanations, and it sounds like a story some philosophers tell, but I’m not familiar with the lingo and want to understand this better. Please tell me more. Related: would you say that we know more about water than our 1700s counterparts, or would you just say “water” today refers to something different than what it referred to in the 1700s? In which case, what is it we’ve gained relative to them? More accurate predictions regarding… what?
Thanks, yep, I’m not sure. Whether or not there is an attractor (and how that attraction is supposed to work) seems like the major crux – certainly in our case!
One thing I want to defend and clarify: someone the other day objected that philosophers are overly confident in their proposals, overly married to them. I think I would agree in some sense, since I think their work is often in doing pre-paradigmatic work: they often jump the gun and declare victory, take philosophizing to be enough to settle a matter. Accordingly, I need to correct the following:
I should have said the field as a whole is not married to any particular theory. But I’m not sure having individual researchers try so hard to develop and defend particular views is so perverse. Seems pretty normal that in trying to advance theory, individual theorists heavily favor one or another theory – the one they are curious about, want to develop, make robust and take to its limit. One shouldn’t necessarily look to one particular frontier physicist to form one’s best guess about their frontier – instead one should survey the various theories being advanced/developed in the area.
For posterity, we discussed in-person, and both (afaict) took the following to be clear predictive disagreements between the (paradigmatic) naturalist realists and anti-realists (condensed for brevity here, to the point of really being more of a mnemonic device):
Realists claim that:
1. (No Special Semantics): Our uses of “right” and “wrong” are picking up, respectively, on what would be appropriately called the rightness and wrongness features in the world.
2. (Non-subjectivism/non-relativism): These features are largely independent of any particular homo sapiens attitudes and very stable over time.
3. (Still Learning): We collectively haven’t fully learned these features yet – the sparsity of the world does support and can guide further refinement of our collective usage of moral terms should we collectively wish to generalize better at identifying the presence of said features. This is the claim that leads to claims of there being a “moral attractor.”
Anti-realists may or may not disagree with (1) depending on how they cash out their semantics, but they almost certainly disagree with something like (2) and (3) (at least in their meta-ethical moments).
My read was more “do the best we can to get through the acute risk period in a way that lets humanity have the time and power to do the best it can at defining/creating a future full of value.” And that’s in response to, and opposed to, positions like “figure out / decide what is best for humanity (or a procedure that can generate the answer to that) and use that to shape the long term future.”