
Complexity of Value


Complexity of Value is the thesis that human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed up in a few simple rules, or compressed. Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would consider unacceptable (just as dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend). For example, keeping all of our values except novelty might yield a future full of individuals replaying a single optimal experience through all eternity.

Related: Ethics & Metaethics, Fun Theory, Preference, Wireheading
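
As a toy illustration of the fragility thesis above (a sketch of my own; the futures, value dimensions, and scores below are invented for the example), dropping a single term from a many-term value function can flip the optimum to a future most of us would reject:

```python
# Toy illustration only: futures, value dimensions, and scores are invented.
# The point: drop one term from a many-term value function and the optimum
# can flip to a future most people would reject.

FUTURES = {
    "varied flourishing":              {"consciousness": 9, "pleasure": 7,  "friendship": 8, "novelty": 9},
    "one experience replayed forever": {"consciousness": 9, "pleasure": 10, "friendship": 8, "novelty": 0},
    "quiet emptiness":                 {"consciousness": 0, "pleasure": 0,  "friendship": 0, "novelty": 0},
}

def utility(scores, dims):
    """Sum scores over only the value dimensions we remembered to encode."""
    return sum(scores[d] for d in dims)

def best_future(dims):
    return max(FUTURES, key=lambda f: utility(FUTURES[f], dims))

ALL_DIMS = ["consciousness", "pleasure", "friendship", "novelty"]
print(best_future(ALL_DIMS))                                  # varied flourishing
print(best_future([d for d in ALL_DIMS if d != "novelty"]))   # the replayed experience now wins
```

In this toy the truncated optimizer is not malicious; there is simply nothing left in its criterion that wants variety.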

Many human choices can be compressed by representing them with simple rules—the desire to survive produces innumerable actions and subgoals as we fulfill that desire. But people don’t just want to survive—although you can compress many human activities into that desire, you cannot compress all of human existence into it. The human equivalents of a utility function, our terminal values, contain many different elements that are not strictly reducible to one another. William Frankena offered this list of things which many cultures and people seem to value (for their own sake rather than strictly for their external consequences):

Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one’s own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.

The “etc.” at the end is the tricky part, because there may be a great many values not included on this list.

One hypothesis is that natural selection reifies selection pressures as psychological drives, which then continue to execute independently of any consequentialist reasoning in the organism, and often without the organism explicitly representing, let alone caring about, the original evolutionary context. Under this view, we have no reason to expect these terminal values to be reducible to any one thing, or to each other.

Taken in conjunction with another claim common on LessWrong, that all values are morally relevant, this would suggest that philosophers who seek cognitively tractable, overarching principles of ethics are mistaken. However, it is coherent to suppose that not all values are morally relevant, and that the morally relevant ones form a tractable subset.

Complexity of value also tends to be underappreciated in the presence of bad metaethics. The locally favored flavor of metaethics could be characterized as cognitivist without implying “thick” notions of instrumental rationality; in other words, moral discourse can be about a coherent subject matter without all possible minds and agents necessarily finding truths about that subject matter psychologically compelling. An expected paperclip maximizer doesn’t disagree with you about morality any more than you disagree with it about “which action leads to the greatest number of expected paperclips”; it is simply constructed to find the latter subject matter psychologically compelling and not the former. The protest “But it’s just paperclips! What a dumb goal! No sufficiently intelligent agent would pick such a dumb goal!” is a judgment carried out by a local human brain that evaluates paperclips as inherently low in its preference ordering. Failing to notice this leads one to expect that all moral judgments will be automatically reproduced in any sufficiently intelligent agent, since, after all, such an agent would not lack the intelligence to see that paperclips are obviously inherently low in the preference ordering. This is a particularly subtle species of anthropomorphism and mind projection fallacy.
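
A minimal sketch of this point (my own toy example, with invented actions and numbers): two agents can share the same world model and the same decision procedure and still choose differently, because only the function being maximized differs; there is no factual disagreement anywhere.

```python
# Toy sketch: two agents with the same beliefs and the same decision procedure,
# differing only in the quantity they are built to maximize. All numbers invented.

OUTCOMES = {  # shared, agreed-upon world model: action -> predicted outcome
    "convert matter into paperclips":   {"paperclips": 10**6, "human_flourishing": 0},
    "build a flourishing civilization": {"paperclips": 10,    "human_flourishing": 10**3},
}

def choose(utility):
    """One decision procedure, parameterized by a utility function."""
    return max(OUTCOMES, key=lambda action: utility(OUTCOMES[action]))

paperclipper = choose(lambda o: o["paperclips"])          # picks the paperclip action
human_valuer = choose(lambda o: o["human_flourishing"])   # picks the flourishing action

# Both agents compute the same answer to every factual question, including
# "which action yields more paperclips"; they differ only in which quantity
# they are built to find compelling.
print(paperclipper, "|", human_valuer)
```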

Because the human brain very often fails to grasp all these difficulties involving our values, we tend to think building an awesome future is much less problematic than it really is. Fragility of value is relevant for building Friendly AI, because an AGI which does not respect human values is likely to create a world that we would consider devoid of value—not necessarily full of explicit attempts to be evil, but perhaps just a dull, boring loss.

Because values are orthogonal to intelligence, they can vary freely no matter how intelligent and efficient an AGI is [1]. Since human (or humane) values have high Kolmogorov complexity, a randomly specified AGI is highly unlikely to maximize them. The fragility of value thesis implies that a poorly constructed AGI might, for example, turn us into blobs of perpetual orgasm. Because of this, the complexity and fragility of value are a major theme of Eliezer Yudkowsky’s writings.
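
A back-of-the-envelope way to state the argument (a rough sketch of my own, not a formula from this article): if a goal specification is drawn from a simplicity-weighted (universal) prior, the probability mass on any particular value system V with Kolmogorov complexity K(V) bits falls off roughly as

$$\Pr[\text{goal} = V] \approx 2^{-K(V)},$$

so a value system that takes even a few thousand bits to pin down is, for practical purposes, never hit by chance.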

Wrongly designing the future because we wrongly encoded human values is a serious and difficult-to-assess type of existential risk. “Touch too hard in the wrong dimension, and the physical representation of those values will shatter—and not come back, for there will be nothing left to want to bring it back. And the referent of those values—a worthwhile universe—would no longer have any physical reason to come into being. Let go of the steering wheel, and the Future crashes.” [2]

Complexity of Value and AI

Complexity of value poses a problem for AI alignment. If you can’t easily compress what humans want into a simple function that can be fed into a computer, it isn’t easy to make a powerful AI that does things humans want and doesn’t do things humans don’t want. Value Learning attempts to address this problem.
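
A minimal value-learning sketch (my own toy; the option names, features, and grid search are invented, and real proposals are far more involved): rather than hand-coding a value function, infer weights over value dimensions from observed human choices, here assuming noisily rational (Boltzmann) choice behaviour:

```python
# Minimal value-learning sketch: infer how much a human weights each value
# dimension from their observed choices, assuming noisily rational (Boltzmann)
# choice. Option names, features, and the grid search are all invented here.

import itertools
import math

OPTIONS = {                      # option -> (security score, novelty score)
    "safe but dull":      (8.0, 1.0),
    "risky but exciting": (2.0, 9.0),
}
OBSERVED = ["risky but exciting", "risky but exciting", "safe but dull"]

def log_likelihood(weights):
    """Log-probability of the observed choices under softmax-rational choice."""
    utils = {name: sum(w * f for w, f in zip(weights, feats))
             for name, feats in OPTIONS.items()}
    log_z = math.log(sum(math.exp(u) for u in utils.values()))
    return sum(utils[choice] - log_z for choice in OBSERVED)

# A crude grid search over candidate weight vectors stands in for real inference.
grid = [(a / 10, b / 10) for a, b in itertools.product(range(11), repeat=2)]
best = max(grid, key=log_likelihood)
print("inferred weights (security, novelty):", best)
```

Even this toy inherits the complexity-of-value problem: any value dimension left out of the feature set can never be recovered from the observed choices.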

Major posts

Other posts

See also

The Hidden Complexity of Wishes (Eliezer Yudkowsky, 24 Nov 2007; 176 points, 196 comments, 8 min read)
Shard Theory: An Overview (David Udell, 11 Aug 2022; 166 points, 34 comments, 10 min read)
Review of ‘But exactly how complex and fragile?’ (TurnTrout, 6 Jan 2021; 57 points, 0 comments, 8 min read)
You don’t know how bad most things are nor precisely how they’re bad. (Solenoid_Entity, 4 Aug 2024; 317 points, 48 comments, 5 min read)
An even deeper atheism (Joe Carlsmith, 11 Jan 2024; 125 points, 47 comments, 15 min read)
But exactly how complex and fragile? (KatjaGrace, 3 Nov 2019; 87 points, 32 comments, 3 min read, 1 review) (meteuphoric.com)
Value is Fragile (Eliezer Yudkowsky, 29 Jan 2009; 170 points, 108 comments, 6 min read)
Learning societal values from law as part of an AGI alignment strategy (John Nay, 21 Oct 2022; 5 points, 18 comments, 54 min read)
The Pointer Resolution Problem (Jozdien, 16 Feb 2024; 41 points, 20 comments, 3 min read)
Aggregative Principles of Social Justice (Cleo Nardo, 5 Jun 2024; 29 points, 10 comments, 37 min read)
Terminal Values and Instrumental Values (Eliezer Yudkowsky, 15 Nov 2007; 115 points, 45 comments, 10 min read)
Conflating value alignment and intent alignment is causing confusion (Seth Herd, 5 Sep 2024; 48 points, 18 comments, 5 min read)
Intent alignment as a stepping-stone to value alignment (Seth Herd, 5 Nov 2024; 35 points, 4 comments, 3 min read)
Why Do We Engage in Moral Simplification? (Wei Dai, 14 Feb 2011; 34 points, 36 comments, 2 min read)
Complexity of Value ≠ Complexity of Outcome (Wei Dai, 30 Jan 2010; 65 points, 223 comments, 3 min read)
Alan Carter on the Complexity of Value (Ghatanathoah, 10 May 2012; 47 points, 41 comments, 7 min read)
Disentangling arguments for the importance of AI safety (Richard_Ngo, 21 Jan 2019; 133 points, 23 comments, 8 min read)
Have you felt exiert yet? (Stuart_Armstrong, 5 Jan 2018; 28 points, 7 comments, 1 min read)
31 Laws of Fun (Eliezer Yudkowsky, 26 Jan 2009; 100 points, 36 comments, 8 min read)
The two-layer model of human values, and problems with synthesizing preferences (Kaj_Sotala, 24 Jan 2020; 70 points, 16 comments, 9 min read)
The Gift We Give To Tomorrow (Eliezer Yudkowsky, 17 Jul 2008; 150 points, 100 comments, 8 min read)
High Challenge (Eliezer Yudkowsky, 19 Dec 2008; 72 points, 75 comments, 4 min read)
Our values are underdefined, changeable, and manipulable (Stuart_Armstrong, 2 Nov 2017; 26 points, 6 comments, 3 min read)
Reversible changes: consider a bucket of water (Stuart_Armstrong, 26 Aug 2019; 25 points, 18 comments, 2 min read)
Would I think for ten thousand years? (Stuart_Armstrong, 11 Feb 2019; 25 points, 13 comments, 1 min read)
Beyond algorithmic equivalence: self-modelling (Stuart_Armstrong, 28 Feb 2018; 10 points, 3 comments, 1 min read)
Bias in rationality is much worse than noise (Stuart_Armstrong, 31 Oct 2017; 11 points, 0 comments, 2 min read)
2012 Robin Hanson comment on “Intelligence Explosion: Evidence and Import” (Rob Bensinger, 2 Apr 2021; 28 points, 4 comments, 3 min read)
Notes on Moderation, Balance, & Harmony (David Gross, 25 Dec 2020; 9 points, 1 comment, 7 min read)
General alignment properties (TurnTrout, 8 Aug 2022; 50 points, 2 comments, 1 min read)
Alignment allows “nonrobust” decision-influences and doesn’t require robust grading (TurnTrout, 29 Nov 2022; 62 points, 41 comments, 15 min read)
Notes on Caution (David Gross, 1 Dec 2022; 14 points, 0 comments, 19 min read)
Content generation. Where do we draw the line? (Q Home, 9 Aug 2022; 6 points, 7 comments, 2 min read)
Siren worlds and the perils of over-optimised search (Stuart_Armstrong, 7 Apr 2014; 83 points, 418 comments, 7 min read)
Boredom vs. Scope Insensitivity (Wei Dai, 24 Sep 2009; 56 points, 40 comments, 3 min read)
Fake Utility Functions (Eliezer Yudkowsky, 6 Dec 2007; 69 points, 63 comments, 4 min read)
Broad Picture of Human Values (Thane Ruthenis, 20 Aug 2022; 42 points, 6 comments, 10 min read)
What AI Safety Researchers Have Written About the Nature of Human Values (avturchin, 16 Jan 2019; 52 points, 3 comments, 15 min read)
A (paraconsistent) logic to deal with inconsistent preferences (B Jacobs, 14 Jul 2024; 6 points, 2 comments, 4 min read) (bobjacobs.substack.com)
Leaky Generalizations (Eliezer Yudkowsky, 22 Nov 2007; 59 points, 31 comments, 3 min read)
Sympathetic Minds (Eliezer Yudkowsky, 19 Jan 2009; 69 points, 27 comments, 5 min read)
In Praise of Boredom (Eliezer Yudkowsky, 18 Jan 2009; 42 points, 104 comments, 6 min read)
Values Weren’t Complex, Once. (Davidmanheim, 25 Nov 2018; 36 points, 13 comments, 2 min read)
[Question] “Fragility of Value” vs. LLMs (Not Relevant, 13 Apr 2022; 34 points, 33 comments, 1 min read)
Post Your Utility Function (taw, 4 Jun 2009; 39 points, 280 comments, 1 min read)
Value Formation: An Overarching Model (Thane Ruthenis, 15 Nov 2022; 34 points, 20 comments, 34 min read)
Can’t Unbirth a Child (Eliezer Yudkowsky, 28 Dec 2008; 54 points, 96 comments, 3 min read)
ISO: Name of Problem (johnswentworth, 24 Jul 2018; 28 points, 15 comments, 1 min read)
Open-ended ethics of phenomena (a desiderata with universal morality) (Ryo, 8 Nov 2023; 1 point, 0 comments, 8 min read)
[Question] [DISC] Are Values Robust? (DragonGod, 21 Dec 2022; 12 points, 9 comments, 2 min read)
Is checking that a state of the world is not dystopian easier than constructing a non-dystopian state? (No77e, 27 Dec 2022; 5 points, 3 comments, 1 min read)
Superintelligence 20: The value-loading problem (KatjaGrace, 27 Jan 2015; 8 points, 21 comments, 6 min read)
Anthropomorphic Optimism (Eliezer Yudkowsky, 4 Aug 2008; 81 points, 60 comments, 5 min read)
Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics (VojtaKovarik, 21 Jul 2023; 12 points, 18 comments, 3 min read)
The cone of freedom (or, freedom might only be instrumentally valuable) (dkl9, 24 Jul 2023; −10 points, 6 comments, 2 min read) (dkl9.net)
Fading Novelty (lifelonglearner, 25 Jul 2018; 26 points, 2 comments, 6 min read)
[Question] (Thought experiment) If you had to choose, which would you prefer? (kuira, 17 Aug 2023; 9 points, 2 comments, 1 min read)
The Hidden Complexity of Wishes—The Animation (Writer, 27 Sep 2023; 33 points, 0 comments, 1 min read) (youtu.be)
Evaluating the historical value misspecification argument (Matthew Barnett, 5 Oct 2023; 177 points, 153 comments, 7 min read, 2 reviews)
Just How Hard a Problem is Alignment? (Roger Dearnaley, 25 Feb 2023; 1 point, 1 comment, 21 min read)
The Metaethics and Normative Ethics of AGI Value Alignment: Many Questions, Some Implications (Eleos Arete Citrini, 16 Sep 2021; 6 points, 0 comments, 8 min read)
Beyond the human training distribution: would the AI CEO create almost-illegal teddies? (Stuart_Armstrong, 18 Oct 2021; 36 points, 2 comments, 3 min read)
Don’t want Goodhart? — Specify the variables more (YanLyutnev, 21 Nov 2024; 3 points, 2 comments, 5 min read)
Can there be an indescribable hellworld? (Stuart_Armstrong, 29 Jan 2019; 39 points, 19 comments, 2 min read)
What’s wrong with simplicity of value? (Wei Dai, 27 Jul 2011; 29 points, 40 comments, 1 min read)
Don’t want Goodhart? — Specify the damn variables (Yan Lyutnev, 21 Nov 2024; −3 points, 2 comments, 5 min read)
Safe AGI Complexity: Guessing a Higher-Order Algebraic Number (Sven Nilsen, 10 Apr 2023; −2 points, 0 comments, 2 min read)
For alignment, we should simultaneously use multiple theories of cognition and value (Roman Leventov, 24 Apr 2023; 23 points, 5 comments, 5 min read)
[Question] Your Preferences (PeterL, 5 Jan 2022; 1 point, 4 comments, 1 min read)
Where Utopias Go Wrong, or: The Four Little Planets (ExCeph, 27 May 2022; 15 points, 0 comments, 11 min read) (ginnungagapfoundation.wordpress.com)
Value Pluralism and AI (Göran Crafte, 19 Mar 2023; 8 points, 4 comments, 2 min read)
Sequence overview: Welfare and moral weights (MichaelStJules, 15 Aug 2024; 7 points, 0 comments, 1 min read)
An attempt to understand the Complexity of Values (Dalton Mabery, 5 Aug 2022; 3 points, 0 comments, 5 min read)
Two Neglected Problems in Human-AI Safety (Wei Dai, 16 Dec 2018; 102 points, 25 comments, 2 min read)
The E-Coli Test for AI Alignment (johnswentworth, 16 Dec 2018; 70 points, 24 comments, 1 min read)
Babies and Bunnies: A Caution About Evo-Psych (Alicorn, 22 Feb 2010; 81 points, 843 comments, 2 min read)
Utilitarianism and the replaceability of desires and attachments (MichaelStJules, 27 Jul 2024; 5 points, 2 comments, 1 min read)
Three AI Safety Related Ideas (Wei Dai, 13 Dec 2018; 69 points, 38 comments, 2 min read)
Why we need a *theory* of human values (Stuart_Armstrong, 5 Dec 2018; 66 points, 15 comments, 4 min read)
The genie knows, but doesn’t care (Rob Bensinger, 6 Sep 2013; 119 points, 495 comments, 8 min read)
Hacking the CEV for Fun and Profit (Wei Dai, 3 Jun 2010; 78 points, 207 comments, 1 min read)