
Coherent Extrapolated Volition


Coherent Extrapolated Volition (CEV) is a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It is meant as an argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI; instead, we should find a way to program it so that it acts in our best interests – doing what we want it to do rather than what we literally tell it to do.

Related: Friendly AI, Metaethics Sequence, Complexity of Value

In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.
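The description above has the shape of an algorithm: extrapolate each person’s volition, iterate, and keep only what converges. Purely as an illustration of that shape (and not of any actual proposal), here is a minimal sketch in Python; the representation of preferences, the placeholder `extrapolate` step, and the agreement threshold are all assumptions made up for the example, since CEV leaves these steps unspecified.

```python
# Toy sketch of the extrapolate-and-converge shape of CEV. Everything here is
# an illustrative assumption: preferences as {option: weight} dicts, a no-op
# placeholder for the extrapolation step, and a simple agreement threshold.

def extrapolate(preferences, step):
    """Stand-in for 'if we knew more, thought faster, were more the people we
    wished we were...'. A real version would transform the preferences; this
    placeholder just returns them unchanged."""
    return dict(preferences)

def coherent_extrapolated_volition(population, iterations=10, agreement=0.9):
    """Iterate the extrapolation, then keep only the desires that converge
    across (nearly) everyone; divergent desires are simply left out."""
    extrapolated = list(population)
    for step in range(iterations):
        extrapolated = [extrapolate(p, step) for p in extrapolated]

    options = {opt for prefs in extrapolated for opt in prefs}
    coherent = {}
    for option in options:
        support = sum(1 for prefs in extrapolated if prefs.get(option, 0) > 0)
        if support / len(extrapolated) >= agreement:
            coherent[option] = support / len(extrapolated)
    return coherent  # would seed the AI's utility function in the analogy

if __name__ == "__main__":
    people = [{"peace": 1.0, "pie": 0.2}, {"peace": 0.8}, {"peace": 0.5, "war": 0.1}]
    print(coherent_extrapolated_volition(people))  # {'peace': 1.0} on these toy inputs
```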

CEV is also often used more generally to refer to what an idealized version of a person would want, separate from the context of building aligned AIs.

What is volition?

As an example of the classical concept of volition, Yudkowsky develops a simple thought experiment: imagine you are facing two boxes, A and B. Exactly one of these boxes contains a diamond – box B – though you do not know this. Asked to guess which box to open, you choose box A. Your decision was to take box A, but your volition was to take box B, since what you actually wanted was the diamond.

Now imagine someone else – Fred – faces the same task, and you want to help him by handing him the box he chose, box A. Since you know where the diamond is, simply handing him his chosen box would not really help. So you mentally extrapolate Fred’s volition – what a version of him that knew where the diamond was would choose – and conclude that he actually wants box B.
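Under the simplifying assumption that a person’s volition is just what their existing preference would pick given the true state of the world, the diamond example can be written down in a few lines; the function and variable names below are hypothetical, introduced only for illustration.

```python
# Minimal sketch of the decision-vs-volition distinction from the diamond example.
# The same preference ("get the diamond") yields different choices depending on
# whether it is combined with Fred's actual beliefs or with the true state.

def best_box(diamond_probabilities):
    """Pick the box most likely to contain the diamond, given some beliefs."""
    return max(diamond_probabilities, key=diamond_probabilities.get)

fred_beliefs = {"A": 0.6, "B": 0.4}   # Fred guesses the diamond is in box A
true_state = {"A": 0.0, "B": 1.0}     # in fact it is in box B

decision = best_box(fred_beliefs)     # 'A' -- what Fred actually chooses
volition = best_box(true_state)       # 'B' -- what the informed version of Fred would choose

print(decision, volition)
```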

Coherent Extrapolated Volition

“The ‘Coherent’ in ‘Coherent Extrapolated Volition’ does not indicate the idea that an extrapolated volition is necessarily coherent. The ‘Coherent’ part indicates the idea that if you build an FAI and run it on an extrapolated human, the FAI should only act on the coherent parts. Where there are multiple attractors, the FAI should hold satisficing avenues open, not try to decide itself.” – Eliezer Yudkowsky

In developing Friendly AI – one acting in our best interests – we would have to take care that it implements, from the beginning, a coherent extrapolated volition of humankind, computed by the extrapolation-and-convergence process described above and used to generate the AI’s utility function.

The main problems with CEV include, first, the great difficulty of implementing such a program – “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” – and, second, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there is a “principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live”, and that his essay essentially conflated the two definitions.

Further Reading & References

See also

Mirrors and Paintings
Eliezer Yudkowsky, Aug 23, 2008, 12:29 AM · 29 points · 42 comments · 8 min read

Requirements for a Basin of Attraction to Alignment
RogerDearnaley, Feb 14, 2024, 7:10 AM · 40 points · 12 comments · 31 min read

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis
RogerDearnaley, Feb 1, 2024, 9:15 PM · 15 points · 15 comments · 13 min read

The self-unalignment problem
Apr 14, 2023, 12:10 PM · 154 points · 24 comments · 10 min read

Is it time to start thinking about what AI Friendliness means?
Victor Novikov, Apr 11, 2022, 9:32 AM · 18 points · 6 comments · 3 min read

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?
Jonas Hallgren, Feb 25, 2021, 10:06 PM · 5 points · 2 comments · 1 min read

Solving For Meta-Ethics By Inducing From The Self
VisionaryHera, Jan 20, 2023, 7:21 AM · 4 points · 1 comment · 9 min read

A problem with the most recently published version of CEV
ThomasCederborg, Aug 23, 2023, 6:05 PM · 10 points · 8 comments · 8 min read · 1 review

[NSFW Review] Interspecies Reviewers
lsusr, Apr 1, 2022, 11:09 AM · 52 points · 8 comments · 2 min read

CEV: coherence versus extrapolation
Stuart_Armstrong, Sep 22, 2014, 11:24 AM · 21 points · 17 comments · 2 min read

Stanovich on CEV
lukeprog, Apr 29, 2012, 9:37 AM · 19 points · 6 comments · 3 min read

Concept extrapolation: key posts
Stuart_Armstrong, Apr 19, 2022, 10:01 AM · 13 points · 2 comments · 1 min read

CEV-inspired models
Stuart_Armstrong, Dec 7, 2011, 6:35 PM · 10 points · 43 comments · 1 min read

CEV: a utilitarian critique
Pablo, Jan 26, 2013, 4:12 PM · 32 points · 87 comments · 5 min read

Hacking the CEV for Fun and Profit
Wei Dai, Jun 3, 2010, 8:30 PM · 78 points · 207 comments · 1 min read

CEV-tropes
snarles, Sep 22, 2014, 6:21 PM · 12 points · 15 comments · 1 min read

How Would an Utopia-Maximizer Look Like?
Thane Ruthenis, Dec 20, 2023, 8:01 PM · 31 points · 23 comments · 10 min read

[Question] What would the creation of aligned AGI look like for us?
Perhaps, Apr 8, 2022, 6:05 PM · 3 points · 4 comments · 1 min read

Humanity as an entity: An alternative to Coherent Extrapolated Volition
Victor Novikov, Apr 22, 2022, 12:48 PM · 3 points · 2 comments · 4 min read

Contrary to List of Lethality’s point 22, alignment’s door number 2
False Name, Dec 14, 2022, 10:01 PM · −2 points · 5 comments · 22 min read

Coherent extrapolated dreaming
Alex Flint, Dec 26, 2022, 5:29 PM · 38 points · 10 comments · 17 min read

Turning Some Inconsistent Preferences into Consistent Ones
niplav, Jul 18, 2022, 6:40 PM · 23 points · 5 comments · 12 min read

Preference Aggregation as Bayesian Inference
beren, Jul 27, 2023, 5:59 PM · 14 points · 1 comment · 1 min read

Alignment: “Do what I would have wanted you to do”
Oleg Trott, Jul 12, 2024, 4:47 PM · 11 points · 48 comments · 1 min read

[Link] FreakoStats and CEV
Filipe, Jun 6, 2012, 3:21 PM · 4 points · 40 comments · 2 min read

In favour of a selective CEV initial dynamic
[deleted], Oct 21, 2011, 5:33 PM · 16 points · 114 comments · 11 min read

Towards an Ethics Calculator for Use by an AGI
sweenesm, Dec 12, 2023, 6:37 PM · 3 points · 2 comments · 11 min read

Why the beliefs/values dichotomy?
Wei Dai, Oct 20, 2009, 4:35 PM · 29 points · 156 comments · 2 min read

Insufficient Values
Jun 16, 2021, 2:33 PM · 31 points · 16 comments · 6 min read

Morphological intelligence, superhuman empathy, and ethical arbitration
Roman Leventov, Feb 13, 2023, 10:25 AM · 1 point · 0 comments · 2 min read

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)
Kaj_Sotala, Oct 3, 2017, 5:39 PM · 3 points · 0 comments · 1 min read (papers.ssrn.com)

Cognitive Neuroscience, Arrow’s Impossibility Theorem, and Coherent Extrapolated Volition
lukeprog, Sep 25, 2011, 11:15 AM · 26 points · 18 comments · 1 min read

Update on Developing an Ethics Calculator to Align an AGI to
sweenesm, Mar 12, 2024, 12:33 PM · 4 points · 2 comments · 8 min read

After Alignment — Dialogue between RogerDearnaley and Seth Herd
Dec 2, 2023, 6:03 AM · 15 points · 2 comments · 25 min read

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?
Jade Bishop, Apr 15, 2019, 3:23 AM · 12 points · 5 comments · 3 min read

Open-ended ethics of phenomena (a desiderata with universal morality)
Ryo, Nov 8, 2023, 8:10 PM · 1 point · 0 comments · 8 min read

The formal goal is a pointer
Morphism, May 1, 2024, 12:27 AM · 20 points · 10 comments · 1 min read

Recursion in AI is scary. But let’s talk solutions.
Oleg Trott, Jul 16, 2024, 8:34 PM · 3 points · 10 comments · 2 min read

A problem shared by many different alignment targets
ThomasCederborg, Jan 15, 2025, 2:22 PM · 12 points · 15 comments · 36 min read

Constitutions for ASI?
ukc10014, Jan 28, 2025, 4:32 PM · 3 points · 0 comments · 1 min read (forum.effectivealtruism.org)

The ‘anti woke’ are positioned to win but can they capitalize?
Hzn, Jan 21, 2025, 9:52 AM · −8 points · 0 comments · 2 min read

Scientism vs. people
Roman Leventov, Apr 18, 2023, 5:28 PM · 4 points · 4 comments · 11 min read

Open-ended/Phenomenal Ethics (TLDR)
Ryo, Nov 9, 2023, 4:58 PM · 3 points · 0 comments · 1 min read

Optionality approach to ethics
Ryo, Nov 13, 2023, 3:23 PM · 7 points · 2 comments · 3 min read

Why small phenomenons are relevant to morality
Ryo, Nov 13, 2023, 3:25 PM · 1 point · 0 comments · 3 min read

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret, Dec 2, 2023, 2:07 PM · 26 points · 31 comments · 42 min read

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program
Christopher King, Jun 2, 2023, 9:54 PM · 7 points · 4 comments · 16 min read

Philosophical Cyborg (Part 1)
Jun 14, 2023, 4:20 PM · 31 points · 4 comments · 13 min read

Superintelligence 23: Coherent extrapolated volition
KatjaGrace, Feb 17, 2015, 2:00 AM · 15 points · 98 comments · 7 min read

Harsanyi’s Social Aggregation Theorem and what it means for CEV
AlexMennen, Jan 5, 2013, 9:38 PM · 37 points · 90 comments · 4 min read

Ideal Advisor Theories and Personal CEV
lukeprog, Dec 25, 2012, 1:04 PM · 35 points · 35 comments · 10 min read

Beginning resources for CEV research
lukeprog, May 7, 2011, 5:28 AM · 21 points · 32 comments · 2 min read

Topics to discuss CEV
diegocaleiro, Jul 6, 2011, 2:19 PM · 8 points · 13 comments · 2 min read