I think there’s some confusion going on with “consequentialism” here, and that’s at least part of what’s at play in “why isn’t everyone seeing the consequentialism all the time”.
One question I asked myself while reading this was “does the author distinguish ‘consequentialism’ from ‘thinking and predicting’ in this piece?”, and my answer is uncertain but leaning towards ‘no’.
So, how do other people use ‘consequentialism’?
It’s sometimes put forward as a moral tradition/ethical theory, an alternative to both deontology and virtue ethics. I forget which philosopher decided this was the trifecta, but the three are often compared and contrasted with each other. In particular, that version doesn’t seem to fit well with how this article uses the term.
Another framing is that consequentialism is an ethical theory that requires prediction (whereas others do not). I think this is an important feature of consequentialism, but ‘the set of all ethical theories that have prediction as a first-class component’ seems bigger than just consequentialism. I do think ethical theories that require prediction as a first-class component are important for AI alignment, specifically intent alignment (it’s less clear whether they’re useful for non-intent-alignment alignment research).
A different angle on this is “do common criticisms of consequentialism apply to the concept being used here?” Consequentialism has seen a ton of philosophical debate over the last century (probably longer), and in my view a bunch of the criticisms are valid.[1]
Finally, I feel like this is missing a huge step in the recent history of ethical theories: the introduction of Moral Uncertainty. I think Moral Uncertainty is a huge step, but the miss (in this article) is a ‘near miss’: a similar argument could have been made, using the framing of Moral Uncertainty, that AI researchers / alignment researchers should be updating on net in the direction of consequentialism being useful/relevant for modeling systems (and possibly useful for designing alignment tech).
I’m not certain that the criticisms will hold, but I think proponents of consequentialism have insufficiently engaged with them; my current take is uncertain but leaning in the consequentialists’ favor. (See also: Moral Uncertainty.)
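For readers who haven’t run into it: the usual formal move in the Moral Uncertainty literature is something like maximizing expected choiceworthiness, i.e. holding credences over several ethical theories and weighting each theory’s verdict by your credence in it. A toy sketch with made-up numbers, purely to illustrate the bookkeeping (none of this is from the article):

```python
# Toy illustration of "maximize expected choiceworthiness" under moral
# uncertainty. Credences and scores are invented for illustration only.
credences = {"consequentialism": 0.4, "deontology": 0.35, "virtue_ethics": 0.25}

# Each theory rates each option on its own scale (a real treatment has to
# worry about whether these scales are even intertheoretically comparable).
choiceworthiness = {
    "consequentialism": {"option_a": 0.9, "option_b": 0.2},
    "deontology":       {"option_a": 0.3, "option_b": 0.8},
    "virtue_ethics":    {"option_a": 0.6, "option_b": 0.5},
}

def expected_choiceworthiness(option: str) -> float:
    return sum(credences[t] * choiceworthiness[t][option] for t in credences)

best = max(["option_a", "option_b"], key=expected_choiceworthiness)
print(best, round(expected_choiceworthiness(best), 3))  # option_a 0.615
```

The point of the framing is just that you can remain uncertain between theories and still have a direction to update in, which is the sense of “updating on net” above.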
I’m pretty sure “consequentialism” here wasn’t meant to mean anything to do with ethics (which I acknowledge is confusing).
I think consequentialism-as-ethics means “the right/moral thing to do is to choose actions that have good consequences.”
I think consequentialism as Eliezer/John meant here is more like “the thing to do is choose actions that have the consequences you want.”
A consequentialist is something that thinks, predicts, and plans (and, if possible, acts) in such a way as to bring about particular consequences.
(I think it’s plausible that we want different words for these things, but this use of the word “consequentialism” seems fairly natural to me, and it makes sense to see “moral consequentialism” as a subset of consequentialism.)
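To make the non-ethics sense concrete, here’s a toy sketch (my own illustration, not anything from the post; all the names in it are made up): an agent in this sense just scores candidate actions by what a predictive model says they lead to and by how much it wants that outcome, with no notion of moral goodness anywhere.

```python
# Toy sketch of "consequentialist" in the thinks/predicts/plans sense.
# Everything here (predict, desirability, choose_action) is illustrative,
# not from the post or from any real library.
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")
Outcome = TypeVar("Outcome")

def choose_action(
    actions: Iterable[Action],
    predict: Callable[[Action], Outcome],      # world model: action -> predicted outcome
    desirability: Callable[[Outcome], float],  # how much the agent wants that outcome
) -> Action:
    """Pick the action whose *predicted consequence* the agent most wants.

    Note that `desirability` encodes what the agent wants, not what is
    morally good; that is the gap between consequentialism-as-planning and
    consequentialism-as-ethics discussed above.
    """
    return max(actions, key=lambda a: desirability(predict(a)))

# Example: a thermostat-ish agent that wants the room at 20 degrees.
actions = ["heat", "cool", "idle"]
predicted_temp = {"heat": 22.0, "cool": 17.0, "idle": 19.5}
print(choose_action(actions, predicted_temp.get, lambda t: -abs(t - 20.0)))  # -> idle
```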
Saying this again separately, if you taboo ‘consequentialism’ and take these as the definitions for a concept:
I think this is what “the majority of alignment researchers who probably are less on-the-ball” are in fact thinking about quite often.
We just don’t call it ‘consequentialism’.
Does it have a name, or is it just a vaguely amorphous concept blob?
Goal-directed?
I like this one. I think it does a lot to capture both the concept and the problem.
The concept is that we expect AI systems to be convergently goal-directed.
The problem is that people in AI research are often uncertain about goal-directedness and its emergence in advanced AI systems. (That’s my attempt to paraphrase the problem of the post in terms of goal-directedness, at least.)
Nothing comes to mind as a single term, in particular because I usually think of ‘thinking’, ‘predicting’, and ‘planning’ separately.
If you’re okay with multiple terms, ‘thinking, predicting, and planning’.
Aside: now’s a great time to potentially rewrite the LW tag header on consequentialism to match this meaning/framing. (Would probably help with aligning people on this site, at least). https://www.lesswrong.com/tag/consequentialism
Yeah, this seems like one way the differences in the arguments could be resolved.
My guess (though I don’t know for certain) is that more AI alignment researchers would agree that “the thing to do is choose actions that have the consequences you want” is an important part of AI research than would agree that “the right/moral thing to do is to choose actions that have good consequences” is.
I’m curious how much confusion you think is left after taboo-ing the term and communicating the clarification?
I personally didn’t feel confused, so I think I mostly turn that question around to you? (i.e. it seemed natural to me to use “consequentialist” in this way, and insofar as any confusion came up, specifying ‘oh, no, I didn’t mean it as an ethical theory’ seems like it should address it. But you might disagree.)
I think my personal take is basically “yeah it seems like almost everything routes through a near-consequentialist theory” and “calling this theory ‘consequentialism’ seems fair to me”.
I spend a lot of time with people working on AI / AI alignment who aren’t in the rationality community, and I don’t think this is the take for all of them. In particular, I’d expect a lot of disagreement about the term ‘consequentialism’ from the “words have meaning, dammit” camp, but if you tabooed it, there would be a lot of broad agreement here.
In particular, I think this belief is super common and super strong in researchers focused on aligning AGI, or otherwise focused on long-term alignment.
I do think there’s a lot of disagreement in the more near-term alignment research field.
This is why this article felt weird to me: it’s not clear that a widespread mistake is being made, and to the extent Raemon/John think there is one, there are also a lot of people who are uncertain (again, cf. Moral Uncertainty) even while updating in the ‘thinking/predicting’ direction.
E.g. for this bit:
I… guess what I think Eliezer thinks is that Thoughtful Researcher isn’t respecting inner optimizers enough.
My take is that the median Thoughtful Researcher is more uncertain about inner optimizers, rather than certain that EY is wrong here.
And pointing at another bit:
Consequentialism is a (relatively) simple, effective process for accomplishing goals, so things that efficiently optimize for goals tend to approximate it.
I think people would disagree that this is consequentialism.
It’s maybe worth pointing at another term that’s charged with a nontraditional meaning in this community: rationality.
We mean something closer to skeptical empiricism than what the term usually means, but if you taboo it, I think you end up with a lot more agreement about what we’re talking about.