Coherent Extrapolated Volition

Tag. Last edit: Feb 13, 2025, 9:02 PM by plex

Coherent Extrapolated Volition (CEV) is a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It is meant as an argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI. Instead, we should find a way to program it so that it acts in our best interests: doing what we want it to do, not what we tell it to do.

In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.

CEV is often used more generally to refer to what an idealized version of a person would want, separate from the context of building aligned AIs.

What is volition?

As an example of the classical concept of volition, Yudkowsky develops a simple thought experiment: imagine you’re facing two boxes, A and B. Exactly one of them, box B, has a diamond in it, though you do not know which. You are asked to guess which box to choose, and you choose to open box A. It was your decision to take box A, but your volition was to choose box B, since what you wanted in the first place was the diamond.

Now imagine someone else, Fred, is faced with the same task, and you want to help him by giving him the box he chose, box A. Since you know where the diamond is, simply handing him box A isn’t helping. Instead, you mentally extrapolate a volition for Fred, based on a version of him that knows where the diamond is, and conclude that he actually wants box B.
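The difference between Fred's decision and his extrapolated volition can be shown with a toy sketch. It is only an illustration of the two-box example above, not anything from Yudkowsky's essay: the `Agent` class and the `decision` and `extrapolated_volition` functions are hypothetical names, and "knowing more" is reduced to having access to the true contents of the boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    goal: str          # what the agent actually wants (assumed known here)
    believed_box: str  # the box the agent currently thinks holds the goal


def decision(agent: Agent) -> str:
    """The box the agent picks, given its current (possibly wrong) belief."""
    return agent.believed_box


def extrapolated_volition(agent: Agent, contents: dict[str, str]) -> str:
    """The box the agent would pick if it knew what each box contains."""
    for box, item in contents.items():
        if item == agent.goal:
            return box
    raise ValueError(f"no box contains {agent.goal!r}")


# The setup from the thought experiment: the diamond is in box B.
contents = {"A": "nothing", "B": "diamond"}
fred = Agent(name="Fred", goal="diamond", believed_box="A")

print(decision(fred))                         # A
print(extrapolated_volition(fred, contents))  # B
```

Helping Fred means acting on the second result rather than the first. The sketch assumes Fred's goal is already known, which is exactly the part that is hard in practice.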

Coherent Extrapolated Volition

“The “Coherent” in “Coherent Extrapolated Volition” does not indicate the idea that an extrapolated volition is necessarily coherent. The “Coherent” part indicates the idea that if you build an FAI and run it on an extrapolated human, the FAI should only act on the coherent parts. Where there are multiple attractors, the FAI should hold satisficing avenues open, not try to decide itself.”—Eliezer Yudkowsky

In developing a Friendly AI, one that acts in our best interests, we would have to ensure that it implements a coherent extrapolated volition of humankind from the beginning. In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.
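The idea of acting only where extrapolated desires converge can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the extrapolation step is skipped entirely (each person's already-extrapolated preferences are given as a plain dictionary), there is no recursive iteration, and "convergence" is reduced to an agreement threshold. The function name `coherent_parts` and the `threshold` parameter are hypothetical and do not come from the CEV essay.

```python
from collections import Counter


def coherent_parts(extrapolated, threshold=0.9):
    """Return {issue: option} for issues where at least `threshold`
    of all people share the same extrapolated preference.

    `extrapolated` is a list of dicts, one per person, mapping an issue
    to that person's (assumed, already extrapolated) preferred option.
    Issues without enough agreement are left out, i.e. left open.
    """
    issues = set().union(*(person.keys() for person in extrapolated))
    result = {}
    for issue in sorted(issues):
        votes = Counter(p[issue] for p in extrapolated if issue in p)
        option, count = votes.most_common(1)[0]
        if count / len(extrapolated) >= threshold:
            result[issue] = option
    return result


# Toy data: everyone converges on one issue, but not on the other.
people = [
    {"cure_disease": "yes", "favorite_art": "music"},
    {"cure_disease": "yes", "favorite_art": "painting"},
    {"cure_disease": "yes", "favorite_art": "music"},
]

print(coherent_parts(people))  # {'cure_disease': 'yes'}
```

Here the system would act on `cure_disease` and leave `favorite_art` undecided, which loosely mirrors the point in the quote above that only the coherent parts should be acted on. The real proposal is far harder than a vote count, since it depends on the extrapolation itself.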

CEV has two main problems. The first is the great difficulty of implementing such a program: “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” The second is the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there’s a “principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live” and that his essay essentially conflated the two.

Mirrors and Paintings
Eliezer Yudkowsky · Aug 23, 2008, 12:29 AM · 29 points · 42 comments · 8 min read

Requirements for a Basin of Attraction to Alignment
RogerDearnaley · Feb 14, 2024, 7:10 AM · 41 points · 12 comments · 31 min read

The self-unalignment problem
Jan_Kulveit and rosehadshar · Apr 14, 2023, 12:10 PM · 155 points · 24 comments · 10 min read

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis
RogerDearnaley · Feb 1, 2024, 9:15 PM · 16 points · 15 comments · 13 min read

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?
Jonas Hallgren · Feb 25, 2021, 10:06 PM · 5 points · 2 comments · 1 min read

Is it time to start thinking about what AI Friendliness means?
Victor Novikov · Apr 11, 2022, 9:32 AM · 18 points · 6 comments · 3 min read

CEV: a utilitarian critique
Pablo · Jan 26, 2013, 4:12 PM · 34 points · 89 comments · 5 min read

Solving For Meta-Ethics By Inducing From The Self
VisionaryHera · Jan 20, 2023, 7:21 AM · 4 points · 1 comment · 9 min read

CEV-tropes
snarles · Sep 22, 2014, 6:21 PM · 12 points · 15 comments · 1 min read

CEV: coherence versus extrapolation
Stuart_Armstrong · Sep 22, 2014, 11:24 AM · 21 points · 17 comments · 2 min read

Stanovich on CEV
lukeprog · Apr 29, 2012, 9:37 AM · 19 points · 6 comments · 3 min read

Hacking the CEV for Fun and Profit
Wei Dai · Jun 3, 2010, 8:30 PM · 80 points · 207 comments · 1 min read

Concept extrapolation: key posts
Stuart_Armstrong · Apr 19, 2022, 10:01 AM · 13 points · 2 comments · 1 min read

[NSFW Review] Interspecies Reviewers
lsusr · Apr 1, 2022, 11:09 AM · 51 points · 8 comments · 2 min read

CEV-inspired models
Stuart_Armstrong · Dec 7, 2011, 6:35 PM · 10 points · 43 comments · 1 min read

A problem with the most recently published version of CEV
ThomasCederborg · Aug 23, 2023, 6:05 PM · 10 points · 8 comments · 8 min read · 1 review

[Question] What would the creation of aligned AGI look like for us?
Perhaps · Apr 8, 2022, 6:05 PM · 3 points · 4 comments · 1 min read

How Would an Utopia-Maximizer Look Like?
Thane Ruthenis · Dec 20, 2023, 8:01 PM · 32 points · 23 comments · 10 min read

Preference Aggregation as Bayesian Inference
beren · Jul 27, 2023, 5:59 PM · 14 points · 1 comment · 1 min read

Coherent extrapolated dreaming
Alex Flint · Dec 26, 2022, 5:29 PM · 38 points · 10 comments · 17 min read

Turning Some Inconsistent Preferences into Consistent Ones
niplav · Jul 18, 2022, 6:40 PM · 23 points · 5 comments · 12 min read

Contrary to List of Lethality’s point 22, alignment’s door number 2
False Name · Dec 14, 2022, 10:01 PM · −2 points · 5 comments · 22 min read

Alignment: “Do what I would have wanted you to do”
Oleg Trott · Jul 12, 2024, 4:47 PM · 11 points · 48 comments · 1 min read

Humanity as an entity: An alternative to Coherent Extrapolated Volition
Victor Novikov · Apr 22, 2022, 12:48 PM · 3 points · 2 comments · 4 min read

Optionality approach to ethics
Ryo · Nov 13, 2023, 3:23 PM · 7 points · 2 comments · 3 min read

After Alignment — Dialogue between RogerDearnaley and Seth Herd
RogerDearnaley and Seth Herd · Dec 2, 2023, 6:03 AM · 15 points · 2 comments · 25 min read

The case against “The case against AI alignment”
KvmanThinking · Mar 19, 2025, 10:40 PM · 2 points · 0 comments · 1 min read

Superintelligence 23: Coherent extrapolated volition
KatjaGrace · Feb 17, 2015, 2:00 AM · 15 points · 98 comments · 7 min read

Open-ended ethics of phenomena (a desiderata with universal morality)
Ryo · Nov 8, 2023, 8:10 PM · 1 point · 0 comments · 8 min read

Beginning resources for CEV research
lukeprog · May 7, 2011, 5:28 AM · 21 points · 32 comments · 2 min read

The ‘anti woke’ are positioned to win but can they capitalize?
Hzn · Jan 21, 2025, 9:52 AM · −8 points · 0 comments · 2 min read

Is Optimal Reflection Competitive with Extinction Risk Reduction? - Requesting Reviewers
Jordan Arel · Jun 29, 2025, 6:42 PM · 7 points · 0 comments · 11 min read

Harsanyi’s Social Aggregation Theorem and what it means for CEV
AlexMennen · Jan 5, 2013, 9:38 PM · 37 points · 90 comments · 4 min read

Morphological intelligence, superhuman empathy, and ethical arbitration
Roman Leventov · Feb 13, 2023, 10:25 AM · 1 point · 0 comments · 2 min read

Towards an Ethics Calculator for Use by an AGI
sweenesm · Dec 12, 2023, 6:37 PM · 3 points · 2 comments · 11 min read

A problem shared by many different alignment targets
ThomasCederborg · Jan 15, 2025, 2:22 PM · 12 points · 18 comments · 36 min read

Open-ended/Phenomenal Ethics (TLDR)
Ryo · Nov 9, 2023, 4:58 PM · 3 points · 0 comments · 1 min read

Philosophical Cyborg (Part 1)
ukc10014, Roman Leventov and NicholasKees · Jun 14, 2023, 4:20 PM · 31 points · 4 comments · 13 min read

What If Alignment Wasn’t About Obedience?
fdescamps49935@gmail.com · Jun 25, 2025, 8:04 PM · 1 point · 0 comments · 2 min read

AI-2027 Response: Inter-AI Tensions, Value Distillation, US Multipolarity, & More
Gatlen Culp · Jun 10, 2025, 6:17 PM · 3 points · 0 comments · 8 min read · (gatlen.blog)

Topics to discuss CEV
diegocaleiro · Jul 6, 2011, 2:19 PM · 8 points · 13 comments · 2 min read

Constitutions for ASI?
ukc10014 · Jan 28, 2025, 4:32 PM · 3 points · 0 comments · 1 min read · (forum.effectivealtruism.org)

Why the beliefs/values dichotomy?
Wei Dai · Oct 20, 2009, 4:35 PM · 29 points · 156 comments · 2 min read

In favour of a selective CEV initial dynamic
[deleted] · Oct 21, 2011, 5:33 PM · 16 points · 114 comments · 11 min read

Insufficient Values
Jozdien, Jacob Abraham and Abraham Francis · Jun 16, 2021, 2:33 PM · 31 points · 16 comments · 6 min read

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program
Christopher King · Jun 2, 2023, 9:54 PM · 7 points · 4 comments · 16 min read

[Question] To what extent is AI safety work trying to get AI to reliably and safely do what the user asks vs. do what is best in some ultimate sense?
Jordan Arel · May 23, 2025, 9:05 PM · 14 points · 3 comments · 1 min read

Update on Developing an Ethics Calculator to Align an AGI to
sweenesm · Mar 12, 2024, 12:33 PM · 4 points · 2 comments · 8 min read

Cognitive Neuroscience, Arrow’s Impossibility Theorem, and Coherent Extrapolated Volition
lukeprog · Sep 25, 2011, 11:15 AM · 26 points · 18 comments · 1 min read

[Link] FreakoStats and CEV
Filipe · Jun 6, 2012, 3:21 PM · 4 points · 40 comments · 2 min read

Why small phenomenons are relevant to morality
Ryo · Nov 13, 2023, 3:25 PM · 1 point · 0 comments · 3 min read

Ideal Advisor Theories and Personal CEV
lukeprog · Dec 25, 2012, 1:04 PM · 35 points · 35 comments · 10 min read

Recursion in AI is scary. But let’s talk solutions.
Oleg Trott · Jul 16, 2024, 8:34 PM · 3 points · 10 comments · 2 min read

Scientism vs. people
Roman Leventov · Apr 18, 2023, 5:28 PM · 4 points · 4 comments · 11 min read

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?
Jade Bishop · Apr 15, 2019, 3:23 AM · 12 points · 5 comments · 3 min read

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret · Dec 2, 2023, 2:07 PM · 26 points · 31 comments · 42 min read

The formal goal is a pointer
Morphism · May 1, 2024, 12:27 AM · 20 points · 10 comments · 1 min read

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)
Kaj_Sotala · Oct 3, 2017, 5:39 PM · 3 points · 0 comments · 1 min read · (papers.ssrn.com)
