I’m pretty sure “consequentialism” here wasn’t meant to have anything to do with ethics (which I acknowledge is confusing).
I think consequentialism-as-ethics means “the right/moral thing to do is to choose actions that have good consequences.”
I think consequentialism as Eliezer/John meant here is more like “the thing to do is choose actions that have the consequences you want.”
A consequentialist is something that thinks, predicts, and plans (and, if possible, acts) in such a way as to bring about particular consequences.
(I think it’s plausible that we want different words for these things, but I think this use of the word consequentialism is fairly natural, and it makes sense to see “moral consequentialism” as a subset of consequentialism.)
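As a purely illustrative sketch (mine, not from the post, with hypothetical names like `predict` and `desirability`), the non-ethical sense of a “consequentialist” is roughly something that runs a loop of “predict the consequences of each available action, then pick the action whose predicted consequences it wants most”:

```python
# Toy sketch of "consequentialism" as a decision procedure, not an ethical theory:
# predict the consequence of each candidate action, then pick the action whose
# predicted consequence the agent wants most. All names here are hypothetical.

def consequentialist_choice(candidate_actions, predict, desirability):
    """Return the action whose predicted consequence scores highest under 'desirability'."""
    return max(candidate_actions, key=lambda action: desirability(predict(action)))


if __name__ == "__main__":
    # Tiny worked example: an agent that wants the room to end up near 20 degrees.
    current_temp = 25.0
    actions = ["heat", "cool", "do_nothing"]

    def predict(action):
        # Crude world model (the 'thinking/predicting' part).
        return current_temp + {"heat": 2.0, "cool": -4.0, "do_nothing": 0.0}[action]

    def desirability(temp):
        # The agent's goal: end up close to 20 degrees. No claim about morality.
        return -abs(temp - 20.0)

    print(consequentialist_choice(actions, predict, desirability))  # -> "cool"
```

(The point of the sketch is just that “wants” is whatever the desirability function encodes; moral consequentialism would be the special case where that function tracks moral goodness.)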
Saying this again separately, if you taboo ‘consequentialism’ and take these as the definitions for a concept:
I think this is what “the majority of alignment researchers who probably are less on-the-ball” are in fact thinking about quite often.
We just don’t call it ‘consequentialism’.
Does it have a name, or is it just a vaguely amorphous concept blob?
Goal-directed?
I like this one. I think it does a lot to capture both the concept and the problem.
The concept is that we expect AI systems to be convergently goal-directed.
The problem is that people in AI research are often uncertain about goal-directedness and its emergence in advanced AI systems. (My attempt to paraphrase the problem of the post in terms of goal-directedness, at least.)
Nothing comes to mind as a single term, in particular because I usually think of ‘thinking’, ‘predicting’, and ‘planning’ separately.
If you’re okay with multiple terms, ‘thinking, predicting, and planning’.
Aside: now’s a great time to potentially rewrite the LW tag header on consequentialism to match this meaning/framing. (Would probably help with aligning people on this site, at least). https://www.lesswrong.com/tag/consequentialism
Yeah, this seems like one way it could resolve the differences in the arguments.
My guess (though I don’t know for certain) is that more AI alignment researchers would agree that “the thing to do is choose actions that have the consequences you want” is an important part of AI research than would agree that “the right/moral thing to do is to choose actions that have good consequences” is.
I’m curious how much confusion you think is left after taboo-ing the term and communicating the clarification?
I personally didn’t feel confused, so I think I mostly turn that question around to you? (I.e. it seemed natural to me to use “consequentialist” in this way, and insofar as any confusion came up, specifying ‘oh, no, I didn’t mean it as an ethical theory’ seems like it should address it. But you might disagree.)
I think my personal take is basically “yeah it seems like almost everything routes through a near-consequentialist theory” and “calling this theory ‘consequentialism’ seems fair to me”.
I spend a lot of time with people working on AI / AI Alignment who aren’t in the rationality community, and I don’t think this is the take for all of them. In particular, I imagine a lot of disagreement about the term ‘consequentialism’ from the “words have meaning, dammit” camp, but if you taboo’d it, there’s a lot of broad agreement here.
In particular, I think this belief is super common and super strong in researchers focused on aligning AGI, or otherwise focused on long-term alignment.
I do think there’s a lot of disagreement in the more near-term alignment research field.
This is why this article felt weird to me: it’s not clear that a very widespread mistake is being made, and to the extent Raemon/John think there is, there are also a lot of people who are uncertain (again, cf. moral uncertainty) even if they’re updating in the ‘thinking/predicting’ direction.
E.g. for this bit:

“I… guess what I think Eliezer thinks is that Thoughtful Researcher isn’t respecting inner optimizers enough.”

My take is that the median Thoughtful Researcher is more uncertain about inner optimizers, rather than certain that EY is wrong here.
And pointing at another bit:

“Consequentialism is a (relatively) simple, effective process for accomplishing goals, so things that efficiently optimize for goals tend to approximate it.”

I think people would disagree with this as a characterization of consequentialism.
It’s maybe worth pointing at another term that’s charged with a nontraditional meaning in this community: rationality.
We mean something closer to skeptical empiricism than what the term usually means, but if you taboo it I think you end up with a lot more agreement about what we’re talking about.