Utility Functions

A Utility Function is a function that assigns numerical values ("utilities") to outcomes, in such a way that outcomes with higher utilities are always preferred to outcomes with lower utilities. The absence of exploitable inconsistencies in the resulting preference ordering is part of the definition and is what separates utility from mere reward.
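
As a minimal sketch of the definition above (the outcomes, probabilities, and utility values are made up purely for illustration), a utility function is just a map from outcomes to real numbers, and preference between gambles reduces to comparing expected utilities:

```python
# Illustrative sketch only: hypothetical outcomes and utility values.
utility = {"apple": 1.0, "orange": 2.0, "nothing": 0.0}

def expected_utility(lottery):
    """lottery: dict mapping outcomes to probabilities (summing to 1)."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

safe_bet  = {"apple": 1.0}                    # an apple for certain
risky_bet = {"orange": 0.6, "nothing": 0.4}   # 60% orange, 40% nothing

# Preference is fully determined by the comparison: 1.2 > 1.0, so the
# risky bet is preferred, with no further context-dependence.
preferred = max([safe_bet, risky_bet], key=expected_utility)
print(preferred)  # {'orange': 0.6, 'nothing': 0.4}
```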

See also: Complexity of Value, Decision Theory, Game Theory, Orthogonality Thesis, Utilitarianism, Preference, Utility, VNM Theorem

Utility Functions do not work very well in practice for individual humans. Human drives are not coherent, and there is no reason to think they would sum to anything with a utility function's level of reliability (Thou Art Godshatter); even people with a strong interest in the concept have trouble working out what their own utility function actually is (Post Your Utility Function). Furthermore, humans appear to calculate reward and loss separately, and adding one to the other does not predict their behavior accurately, so human reward is not human utility. This makes humans highly exploitable; indeed, not being exploitable is a minimum requirement for having a coherent utility function.
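
To make "exploitable" concrete, here is a toy money-pump sketch (the items, preference cycle, and fee are all hypothetical): an agent whose pairwise preferences form a cycle will pay a small fee for each "upgrade" and end up holding what it started with, strictly poorer.

```python
# Hypothetical cyclic preferences: A over B, B over C, C over A.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (preferred, over)

def will_trade(offered, held):
    return (offered, held) in prefers

holding, money = "C", 0.0
for offered in ["B", "A", "C", "B", "A", "C"]:   # the pump simply cycles its offers
    if will_trade(offered, holding):
        holding, money = offered, money - 0.01   # agent pays 1 cent per "upgrade"

print(holding, round(money, 2))  # 'C' -0.06: back where it started, six cents down
```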

pjeby posits that humans' difficulty in understanding their own utility functions is the root of akrasia.

However, utility functions can be a useful model for dealing with humans in groups, e.g. in economics.

The VNM Theorem tag is likely a strict subtag of the Utility Functions tag: the VNM theorem establishes when preferences can be represented by a utility function, so posts about the theorem are also about utility functions, while a post discussing utility functions may or may not discuss the VNM theorem or its axioms.
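
For reference, the result this relationship turns on can be stated compactly (the standard von Neumann-Morgenstern formulation):

```latex
% Standard VNM representation theorem: if a preference relation over lotteries
% is complete, transitive, continuous, and satisfies independence, then there
% exists a utility function u, unique up to positive affine transformation, with
\[
  L \succsim M \iff \mathbb{E}_{L}[u(X)] \ge \mathbb{E}_{M}[u(X)].
\]
```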

Because utility functions arise from coherence (VNM) arguments, they can still be useful for understanding intelligent systems even when the system does not explicitly store a utility function anywhere: as a system's exploitable inconsistencies are driven down, its behavior should converge toward utility-function-like guarantees.

Co­her­ent de­ci­sions im­ply con­sis­tent utilities

Eliezer YudkowskyMay 12, 2019, 9:33 PM
149 points
82 comments26 min readLW link3 reviews

An Ortho­dox Case Against Utility Functions

abramdemskiApr 7, 2020, 7:18 PM
154 points
66 comments8 min readLW link2 reviews

Co­her­ence ar­gu­ments do not en­tail goal-di­rected behavior

Rohin ShahDec 3, 2018, 3:26 AM
134 points
69 comments7 min readLW link3 reviews

Ap­prox­i­mately Bayesian Rea­son­ing: Knigh­tian Uncer­tainty, Good­hart, and the Look-Else­where Effect

RogerDearnaleyJan 26, 2024, 3:58 AM
16 points
2 comments11 min readLW link

Bayesian Utility: Rep­re­sent­ing Prefer­ence by Prob­a­bil­ity Measures

Vladimir_NesovJul 27, 2009, 2:28 PM
50 points
37 comments2 min readLW link

Utility ≠ Reward

Vlad MikulikSep 5, 2019, 5:28 PM
131 points
24 comments1 min readLW link2 reviews

How eas­ily can we sep­a­rate a friendly AI in de­sign space from one which would bring about a hy­per­ex­is­ten­tial catas­tro­phe?

AnirandisSep 10, 2020, 12:40 AM
20 points
19 comments2 min readLW link

Why Not Subagents?

Jun 22, 2023, 10:16 PM
130 points
52 comments14 min readLW link1 review

Time and Effort Discounting

Scott AlexanderJul 7, 2011, 11:48 PM
66 points
32 comments4 min readLW link

The Hu­man’s Hid­den Utility Func­tion (Maybe)

lukeprogJan 23, 2012, 7:39 PM
68 points
91 comments3 min readLW link

Satis­ficers want to be­come maximisers

Stuart_ArmstrongOct 21, 2011, 4:27 PM
38 points
70 comments1 min readLW link

Value/​Utility: A History

LorecNov 19, 2024, 11:01 PM
9 points
0 comments10 min readLW link

Ve­gans need to eat just enough Meat—em­per­i­cally eval­u­ate the min­i­mum am­mount of meat that max­i­mizes utility

Johannes C. MayerDec 22, 2024, 10:08 PM
55 points
35 comments3 min readLW link

Ap­ply­ing util­ity func­tions to hu­mans con­sid­ered harmful

Kaj_SotalaFeb 3, 2010, 7:22 PM
36 points
116 comments5 min readLW link

Choos­ing the Zero Point

orthonormalApr 6, 2020, 11:44 PM
175 points
24 comments3 min readLW link2 reviews

We Are Less Wrong than E. T. Jaynes on Loss Func­tions in Hu­man Society

Zack_M_DavisJun 5, 2023, 5:34 AM
46 points
14 comments2 min readLW link

Distinc­tions when Dis­cussing Utility Functions

ozziegooenMar 9, 2024, 8:14 PM
24 points
7 comments1 min readLW link

Co­her­ence ar­gu­ments im­ply a force for goal-di­rected behavior

KatjaGraceMar 26, 2021, 4:10 PM
91 points
25 comments11 min readLW link1 review
(aiimpacts.org)

Re­search Agenda v0.9: Syn­the­sis­ing a hu­man’s prefer­ences into a util­ity function

Stuart_ArmstrongJun 17, 2019, 5:46 PM
70 points
26 comments33 min readLW link

I’m no longer sure that I buy dutch book ar­gu­ments and this makes me skep­ti­cal of the “util­ity func­tion” abstraction

Eli TyreJun 22, 2021, 3:53 AM
42 points
29 comments4 min readLW link

Against util­ity functions

Qiaochu_YuanJun 19, 2014, 5:56 AM
67 points
87 comments1 min readLW link

Us­ing ex­pected util­ity for Good(hart)

Stuart_ArmstrongAug 27, 2018, 3:32 AM
42 points
5 comments4 min readLW link

[Question] How do bounded util­ity func­tions work if you are un­cer­tain how close to the bound your util­ity is?

GhatanathoahOct 6, 2021, 9:31 PM
13 points
26 comments2 min readLW link

Ngo and Yud­kowsky on AI ca­pa­bil­ity gains

Nov 18, 2021, 10:19 PM
130 points
61 comments39 min readLW link1 review

Con­se­quen­tial­ism & corrigibility

Steven ByrnesDec 14, 2021, 1:23 PM
70 points
29 comments7 min readLW link

money ≠ value

stoneflyApr 30, 2023, 5:47 PM
2 points
3 comments3 min readLW link

Re­solv­ing von Neu­mann-Mor­gen­stern In­con­sis­tent Preferences

niplavOct 22, 2024, 11:45 AM
38 points
5 comments58 min readLW link

In­fer­ring util­ity func­tions from lo­cally non-tran­si­tive preferences

JanFeb 10, 2022, 10:33 AM
32 points
15 comments8 min readLW link
(universalprior.substack.com)

Up­dat­ing Utility Functions

May 9, 2022, 9:44 AM
41 points
6 comments8 min readLW link

Differ­en­tial Op­ti­miza­tion Reframes and Gen­er­al­izes Utility-Maximization

J BostockDec 27, 2023, 1:54 AM
30 points
2 comments3 min readLW link

Think­ing about Broad Classes of Utility-like Functions

J BostockJun 7, 2022, 2:05 PM
7 points
0 comments4 min readLW link

LeCun says mak­ing a util­ity func­tion is intractable

IknownothingJun 28, 2023, 6:02 PM
2 points
3 comments1 min readLW link

De­scrip­tive vs. speci­fi­able values

TsviBTMar 26, 2023, 9:10 AM
17 points
2 comments2 min readLW link

Orthog­o­nal­ity is expensive

berenApr 3, 2023, 10:20 AM
43 points
9 comments3 min readLW link

Shard The­ory: An Overview

David UdellAug 11, 2022, 5:44 AM
166 points
34 comments10 min readLW link

The Allais Paradox

Eliezer YudkowskyJan 19, 2008, 3:05 AM
65 points
145 comments3 min readLW link

Is “VNM-agent” one of sev­eral op­tions, for what minds can grow up into?

AnnaSalamonDec 30, 2024, 6:36 AM
89 points
55 comments2 min readLW link

The VNM in­de­pen­dence ax­iom ig­nores the value of information

kilobugMar 2, 2013, 2:36 PM
15 points
48 comments1 min readLW link

The Fun­da­men­tal The­o­rem of As­set Pric­ing: Miss­ing Link of the Dutch Book Arguments

johnswentworthJun 1, 2019, 8:34 PM
42 points
5 comments3 min readLW link

Stable Poin­t­ers to Value III: Re­cur­sive Quantilization

abramdemskiJul 21, 2018, 8:06 AM
20 points
4 comments4 min readLW link

[Question] Why The Fo­cus on Ex­pected Utility Max­imisers?

DragonGodDec 27, 2022, 3:49 PM
118 points
84 comments3 min readLW link

Com­par­ing Utilities

abramdemskiSep 14, 2020, 8:56 PM
72 points
31 comments17 min readLW link

Pin­point­ing Utility

[deleted]Feb 1, 2013, 3:58 AM
94 points
156 comments13 min readLW link

If you don’t know the name of the game, just tell me what I mean to you

Stuart_ArmstrongOct 26, 2010, 1:43 PM
16 points
26 comments5 min readLW link

Com­pu­ta­tional effi­ciency rea­sons not to model VNM-ra­tio­nal prefer­ence re­la­tions with util­ity functions

AlexMennenJul 25, 2018, 2:11 AM
16 points
5 comments3 min readLW link

An At­tempt at Prefer­ence Uncer­tainty Us­ing VNM

[deleted]Jul 16, 2013, 5:20 AM
15 points
33 comments6 min readLW link

Why Subagents?

johnswentworthAug 1, 2019, 10:17 PM
175 points
48 comments7 min readLW link1 review

[Question] Why doesn’t the pres­ence of log-loss for prob­a­bil­is­tic mod­els (e.g. se­quence pre­dic­tion) im­ply that any util­ity func­tion ca­pa­ble of pro­duc­ing a “fairly ca­pa­ble” agent will have at least some non-neg­ligible frac­tion of over­lap with hu­man val­ues?

Thoth HermesMay 16, 2023, 6:02 PM
2 points
0 comments1 min readLW link

In­terthe­o­retic util­ity comparison

Stuart_ArmstrongJul 3, 2018, 1:44 PM
23 points
11 comments6 min readLW link

Deon­tol­ogy for Consequentialists

AlicornJan 30, 2010, 5:58 PM
61 points
255 comments6 min readLW link

Game The­ory with­out Argmax [Part 1]

Cleo NardoNov 11, 2023, 3:59 PM
70 points
18 comments19 min readLW link

The Iso­la­tion As­sump­tion of Ex­pected Utility Maximization

Pedro OliboniAug 6, 2020, 4:05 AM
7 points
1 comment5 min readLW link

Game The­ory with­out Argmax [Part 2]

Cleo NardoNov 11, 2023, 4:02 PM
31 points
14 comments13 min readLW link

When do util­ity func­tions con­strain?

HoagyAug 23, 2019, 5:19 PM
30 points
8 comments7 min readLW link

[link] Choose your (prefer­ence) util­i­tar­i­anism care­fully – part 1

Kaj_SotalaJun 25, 2015, 12:06 PM
21 points
6 comments2 min readLW link

Per­son-mo­ment af­fect­ing views

KatjaGraceMar 7, 2018, 2:30 AM
17 points
8 comments5 min readLW link
(meteuphoric.wordpress.com)

To cap­ture anti-death in­tu­itions, in­clude mem­ory in utilitarianism

Kaj_SotalaJan 15, 2014, 6:27 AM
12 points
34 comments3 min readLW link

Valence Need Not Be Bounded; Utility Need Not Synthesize

LorecNov 20, 2024, 1:37 AM
8 points
0 comments6 min readLW link

Zut Allais!

Eliezer YudkowskyJan 20, 2008, 3:18 AM
59 points
51 comments6 min readLW link

Against Dis­count Rates

Eliezer YudkowskyJan 21, 2008, 10:00 AM
38 points
81 comments2 min readLW link

Why you must max­i­mize ex­pected utility

BenyaDec 13, 2012, 1:11 AM
50 points
76 comments21 min readLW link

Only hu­mans can have hu­man values

PhilGoetzApr 26, 2010, 6:57 PM
48 points
161 comments17 min readLW link

Post Your Utility Function

tawJun 4, 2009, 5:05 AM
39 points
280 comments1 min readLW link

What re­sources have in­creas­ing marginal util­ity?

Qiaochu_YuanJun 14, 2014, 3:43 AM
59 points
63 comments1 min readLW link

Na­ture < Nur­ture for AIs

scottviteriJun 4, 2023, 8:38 PM
14 points
22 comments7 min readLW link

Etho­dy­nam­ics of Omelas

dr_sJun 10, 2023, 4:24 PM
83 points
18 comments9 min readLW link1 review

The Do­main of Your Utility Function

Peter_de_BlancJun 23, 2009, 4:58 AM
42 points
99 comments2 min readLW link

Sim­plified prefer­ences needed; sim­plified prefer­ences sufficient

Stuart_ArmstrongMar 5, 2019, 7:39 PM
33 points
6 comments3 min readLW link

Utility ver­sus Re­ward func­tion: par­tial equivalence

Stuart_ArmstrongApr 13, 2018, 2:58 PM
18 points
5 comments5 min readLW link

The Prefer­ence Utili­tar­ian’s Time In­con­sis­tency Problem

Wei DaiJan 15, 2010, 12:26 AM
35 points
107 comments1 min readLW link

Com­plex Be­hav­ior from Sim­ple (Sub)Agents

moridinamaelMay 10, 2019, 9:44 PM
113 points
14 comments9 min readLW link1 review

“Solv­ing” self­ish­ness for UDT

Stuart_ArmstrongOct 27, 2014, 5:51 PM
39 points
52 comments8 min readLW link

Why Univer­sal Com­pa­ra­bil­ity of Utility?

AKMay 13, 2018, 12:10 AM
8 points
16 comments1 min readLW link

What we talk about when we talk about max­imis­ing utility

Richard_NgoFeb 24, 2018, 10:33 PM
14 points
18 comments4 min readLW link

VNM ex­pected util­ity the­ory: uses, abuses, and interpretation

AcademianApr 17, 2010, 8:23 PM
36 points
51 comments10 min readLW link

Risk aver­sion vs. con­cave util­ity function

dvasyaJan 31, 2012, 6:25 AM
3 points
35 comments3 min readLW link

Is the En­dow­ment Effect Due to In­com­pa­ra­bil­ity?

Kevin DorstJul 10, 2023, 4:26 PM
21 points
10 comments7 min readLW link
(kevindorst.substack.com)

Univer­sal agents and util­ity functions

AnjaNov 14, 2012, 4:05 AM
43 points
38 comments6 min readLW link

Harsanyi’s So­cial Ag­gre­ga­tion The­o­rem and what it means for CEV

AlexMennenJan 5, 2013, 9:38 PM
37 points
90 comments4 min readLW link

Align­ment, con­flict, powerseeking

Oliver SourbutNov 22, 2023, 9:47 AM
6 points
1 comment1 min readLW link

Are pre-speci­fied util­ity func­tions about the real world pos­si­ble in prin­ci­ple?

mloganJul 11, 2018, 6:46 PM
24 points
7 comments4 min readLW link

ACI#4: Seed AI is the new Per­pet­ual Mo­tion Machine

Akira PyinyaJul 8, 2023, 1:17 AM
−1 points
0 comments6 min readLW link

Sublimity vs. Youtube

AlicornMar 18, 2011, 5:33 AM
33 points
63 comments1 min readLW link

Op­ti­mi­sa­tion Mea­sures: Desider­ata, Im­pos­si­bil­ity, Proposals

Aug 7, 2023, 3:52 PM
36 points
9 comments1 min readLW link

Why the be­liefs/​val­ues di­chotomy?

Wei DaiOct 20, 2009, 4:35 PM
29 points
156 comments2 min readLW link

Knigh­tian Uncer­tainty and Am­bi­guity Aver­sion: Motivation

So8resJul 21, 2014, 8:32 PM
48 points
44 comments13 min readLW link

Con­cep­tual prob­lems with util­ity functions

DacynJul 11, 2018, 1:29 AM
22 points
12 comments2 min readLW link

Allais Malaise

Eliezer YudkowskyJan 21, 2008, 12:40 AM
41 points
38 comments2 min readLW link

I’m con­fused. Could some­one help?

CronoDASMar 23, 2009, 5:26 AM
1 point
12 comments1 min readLW link

On dol­lars, util­ity, and crack cocaine

PhilGoetzApr 4, 2009, 12:00 AM
16 points
100 comments2 min readLW link

Real-world ex­am­ples of money-pump­ing?

sixes_and_sevensApr 25, 2013, 1:49 PM
28 points
97 comments1 min readLW link

How Not to be Stupid: Adorable Maybes

Psy-KoshApr 29, 2009, 7:15 PM
1 point
55 comments3 min readLW link

Ex­pected util­ity with­out the in­de­pen­dence axiom

Stuart_ArmstrongOct 28, 2009, 2:40 PM
20 points
68 comments4 min readLW link

Allais Hack—Trans­form Your De­ci­sions!

MBlumeMay 3, 2009, 10:37 PM
22 points
19 comments2 min readLW link

How Not to be Stupid: Brew­ing a Nice Cup of Utilitea

Psy-KoshMay 9, 2009, 8:14 AM
2 points
17 comments6 min readLW link

Want­ing to Want

AlicornMay 16, 2009, 3:08 AM
30 points
199 comments2 min readLW link

Ar­gu­ments for util­i­tar­i­anism are im­pos­si­bil­ity ar­gu­ments un­der un­bounded prospects

MichaelStJulesOct 7, 2023, 9:08 PM
7 points
7 comments21 min readLW link

Ex­pected fu­til­ity for humans

RokoJun 9, 2009, 12:04 PM
14 points
53 comments3 min readLW link

If it looks like util­ity max­i­mizer and quacks like util­ity max­i­mizer...

tawJun 11, 2009, 6:34 PM
20 points
24 comments2 min readLW link

Utility Max­i­miza­tion = De­scrip­tion Length Minimization

johnswentworthFeb 18, 2021, 6:04 PM
214 points
44 comments5 min readLW link

A fun­gi­bil­ity theorem

NisanJan 12, 2013, 9:27 AM
35 points
66 comments6 min readLW link

Chas­ing Infinities

Michael BatemanAug 16, 2021, 1:19 AM
2 points
1 comment9 min readLW link

The Dou­bling Box

MestroyerAug 6, 2012, 5:50 AM
22 points
84 comments3 min readLW link

Hous­ing Mar­kets, Satis­ficers, and One-Track Goodhart

J BostockDec 16, 2021, 9:38 PM
2 points
2 comments2 min readLW link

[Question] Your Preferences

PeterLJan 5, 2022, 6:49 PM
1 point
4 comments1 min readLW link

Im­pos­si­bil­ity re­sults for un­bounded utilities

paulfchristianoFeb 2, 2022, 3:52 AM
167 points
109 comments8 min readLW link1 review

The Unified The­ory of Nor­ma­tive Ethics

Thane RuthenisJun 17, 2022, 7:55 PM
8 points
0 comments6 min readLW link

The “Mea­sur­ing Stick of Utility” Problem

johnswentworthMay 25, 2022, 4:17 PM
74 points
25 comments3 min readLW link

Adap­ta­tion Ex­ecu­tors and the Telos Margin

PlinthistJun 20, 2022, 1:06 PM
2 points
8 comments5 min readLW link

Re­in­force­ment Learner Wireheading

Nate ShowellJul 8, 2022, 5:32 AM
8 points
2 comments3 min readLW link

Utility func­tions and prob­a­bil­ities are entangled

Thomas KwaJul 26, 2022, 5:36 AM
15 points
5 comments1 min readLW link

A gen­tle primer on car­ing, in­clud­ing in strange senses, with applications

KaarelAug 30, 2022, 8:05 AM
10 points
4 comments18 min readLW link

Bridg­ing Ex­pected Utility Max­i­miza­tion and Optimization

Daniel HerrmannAug 5, 2022, 8:18 AM
25 points
5 comments14 min readLW link

An Un­ex­pected GPT-3 De­ci­sion in a Sim­ple Gam­ble

casualphysicsenjoyerSep 25, 2022, 4:46 PM
8 points
4 comments1 min readLW link

Will Values and Com­pe­ti­tion De­cou­ple?

intersticeSep 28, 2022, 4:27 PM
15 points
11 comments17 min readLW link

Why Bet Kelly?

Joe ZimmermanNov 29, 2022, 6:47 PM
16 points
4 comments4 min readLW link

Take 7: You should talk about “the hu­man’s util­ity func­tion” less.

Charlie SteinerDec 8, 2022, 8:14 AM
50 points
22 comments2 min readLW link

Thatcher’s Axiom

Edward P. KöningsJan 24, 2023, 10:35 PM
10 points
22 comments4 min readLW link

The Lin­guis­tic Blind Spot of Value-Aligned Agency, Nat­u­ral and Ar­tifi­cial

Roman LeventovFeb 14, 2023, 6:57 AM
6 points
0 comments2 min readLW link
(arxiv.org)

[Question] Math­e­mat­i­cal mod­els of Ethics

VictorsMar 8, 2023, 5:40 PM
4 points
2 comments1 min readLW link

AI Align­ment 2018-19 Review

Rohin ShahJan 28, 2020, 2:19 AM
126 points
6 comments35 min readLW link

Ver­ify­ing vNM-ra­tio­nal­ity re­quires an ontology

jeyoorMar 13, 2019, 12:03 AM
25 points
5 comments1 min readLW link

[Question] Why does ex­pected util­ity mat­ter?

Marco DiscendentiDec 25, 2023, 2:47 PM
18 points
21 comments4 min readLW link

Utility is relative

CrimsonChinJan 8, 2024, 2:31 AM
2 points
4 comments2 min readLW link

A Ped­a­gog­i­cal Guide to Corrigibility

A.H.Jan 17, 2024, 11:45 AM
6 points
3 comments16 min readLW link

In­creas­ingly vague in­ter­per­sonal welfare comparisons

MichaelStJulesFeb 1, 2024, 6:45 AM
5 points
0 comments1 min readLW link

Types of sub­jec­tive welfare

MichaelStJulesFeb 2, 2024, 9:56 AM
10 points
3 comments1 min readLW link

At­las: Stress-Test­ing ASI Value Learn­ing Through Grand Strat­egy Scenarios

NeilFoxFeb 17, 2025, 11:55 PM
1 point
0 comments2 min readLW link

Solu­tion to the two en­velopes prob­lem for moral weights

MichaelStJulesFeb 19, 2024, 12:15 AM
9 points
1 comment1 min readLW link

In­di­vi­d­ual Utilities Shift Con­tin­u­ously as Geo­met­ric Weights Shift

StrivingForLegibilityAug 7, 2024, 1:41 AM
2 points
0 comments17 min readLW link

Gra­di­ent As­cen­ders Reach the Harsanyi Hyperplane

StrivingForLegibilityAug 7, 2024, 1:40 AM
4 points
0 comments6 min readLW link

Deriv­ing the Geo­met­ric Utili­tar­ian Weights

StrivingForLegibilityAug 7, 2024, 1:39 AM
2 points
0 comments11 min readLW link

Prov­ing the Geo­met­ric Utili­tar­ian Theorem

StrivingForLegibilityAug 7, 2024, 1:39 AM
25 points
0 comments8 min readLW link

Geo­met­ric Utili­tar­i­anism (And Why It Mat­ters)

StrivingForLegibilityMay 12, 2024, 3:41 AM
34 points
2 comments11 min readLW link

The Geo­met­ric Im­por­tance of Side Payments

StrivingForLegibilityAug 7, 2024, 1:38 AM
8 points
4 comments3 min readLW link

Gra­da­tions of moral weight

MichaelStJulesFeb 29, 2024, 11:08 PM
1 point
0 comments1 min readLW link

The Im­pos­si­bil­ity of a Ra­tional In­tel­li­gence Optimizer

Nicolas VillarrealJun 6, 2024, 4:14 PM
−9 points
5 comments14 min readLW link

[Aspira­tion-based de­signs] A. Da­m­ages from mis­al­igned op­ti­miza­tion – two more models

Jul 15, 2024, 2:08 PM
6 points
0 comments9 min readLW link

[Question] Toward a Math­e­mat­i­cal Defi­ni­tion of Ra­tion­al­ity in Multi-Agent Systems

nekofuguFeb 23, 2025, 5:29 PM
1 point
0 comments1 min readLW link

Bet­ter differ­ence-mak­ing views

MichaelStJulesDec 21, 2024, 6:27 PM
7 points
0 comments1 min readLW link

Se­quence overview: Welfare and moral weights

MichaelStJulesAug 15, 2024, 4:22 AM
7 points
0 comments1 min readLW link

[Question] Do­ing Noth­ing Utility Function

k64Sep 26, 2024, 10:05 PM
9 points
9 comments1 min readLW link

Galatea and the windup toy

Nicolas VillarrealOct 26, 2024, 2:52 PM
−3 points
0 comments13 min readLW link
(nicolasdvillarreal.substack.com)

Ex­pected Utility, Geo­met­ric Utility, and Other Equiv­a­lent Representations

StrivingForLegibilityNov 20, 2024, 11:28 PM
10 points
0 comments11 min readLW link

Build­ing AI safety bench­mark en­vi­ron­ments on themes of uni­ver­sal hu­man values

Roland PihlakasJan 3, 2025, 4:24 AM
18 points
3 comments8 min readLW link
(docs.google.com)

Hu­mans are util­ity monsters

PhilGoetzAug 16, 2013, 9:05 PM
123 points
216 comments2 min readLW link

Why mod­el­ling multi-ob­jec­tive home­osta­sis is es­sen­tial for AI al­ign­ment (and how it helps with AI safety as well)

Roland PihlakasJan 12, 2025, 3:37 AM
46 points
7 comments10 min readLW link

Notable run­away-op­ti­miser-like LLM failure modes on Biolog­i­cally and Eco­nom­i­cally al­igned AI safety bench­marks for LLMs with sim­plified ob­ser­va­tion format

Mar 16, 2025, 11:23 PM
36 points
6 comments7 min readLW link

Utility Eng­ineer­ing: An­a­lyz­ing and Con­trol­ling Emer­gent Value Sys­tems in AIs

Matrice JacobineFeb 12, 2025, 9:15 AM
51 points
49 comments1 min readLW link
(www.emergent-values.ai)

Free­dom Is All We Need

Leo GlisicApr 27, 2023, 12:09 AM
−1 points
8 comments10 min readLW link

(A Failed Ap­proach) From Prece­dent to Utility Function

Akira PyinyaApr 29, 2023, 9:55 PM
0 points
2 comments4 min readLW link

Agents which are EU-max­i­miz­ing as a group are not EU-max­i­miz­ing individually

MlxaDec 4, 2023, 6:49 PM
3 points
2 comments2 min readLW link

[Question] “Do Noth­ing” util­ity func­tion, 3½ years later?

niplavJul 20, 2020, 11:09 AM
5 points
3 comments1 min readLW link

De­grees of Freedom

sarahconstantinApr 2, 2019, 9:10 PM
103 points
31 comments11 min readLW link
(srconstantin.wordpress.com)

Against the Lin­ear Utility Hy­poth­e­sis and the Lev­er­age Penalty

AlexMennenDec 14, 2017, 6:38 PM
41 points
47 comments11 min readLW link

Ter­mi­nal Values and In­stru­men­tal Values

Eliezer YudkowskyNov 15, 2007, 7:56 AM
116 points
46 comments10 min readLW link

Three ways that “Suffi­ciently op­ti­mized agents ap­pear co­her­ent” can be false

Wei DaiMar 5, 2019, 9:52 PM
65 points
3 comments3 min readLW link

The ge­nie knows, but doesn’t care

Rob BensingerSep 6, 2013, 6:42 AM
121 points
495 comments8 min readLW link

Buri­dan’s ass in co­or­di­na­tion games

jessicataJul 16, 2018, 2:51 AM
52 points
26 comments10 min readLW link

Pas­cal’s Mug­ging: Tiny Prob­a­bil­ities of Vast Utilities

Eliezer YudkowskyOct 19, 2007, 11:37 PM
112 points
354 comments4 min readLW link

Pas­cal’s Mug­gle: In­finites­i­mal Pri­ors and Strong Evidence

Eliezer YudkowskyMay 8, 2013, 12:43 AM
73 points
402 comments26 min readLW link

We Don’t Have a Utility Function

[deleted]Apr 2, 2013, 3:49 AM
73 points
118 comments4 min readLW link

Prob­a­bil­ity is Real, and Value is Complex

abramdemskiJul 20, 2018, 5:24 AM
80 points
21 comments6 min readLW link

The Lifes­pan Dilemma

Eliezer YudkowskySep 10, 2009, 6:45 PM
61 points
220 comments7 min readLW link

When to use quantilization

RyanCareyFeb 5, 2019, 5:17 PM
65 points
5 comments4 min readLW link

Big Ad­vance in In­finite Ethics

bwestNov 28, 2017, 3:10 PM
32 points
13 comments5 min readLW link

Fake Utility Functions

Eliezer YudkowskyDec 6, 2007, 4:55 PM
71 points
63 comments4 min readLW link

More on the Lin­ear Utility Hy­poth­e­sis and the Lev­er­age Prior

AlexMennenFeb 26, 2018, 11:53 PM
16 points
4 comments9 min readLW link

Ex­pected util­ity, un­los­ing agents, and Pas­cal’s mugging

Stuart_ArmstrongJul 28, 2014, 6:05 PM
32 points
54 comments5 min readLW link

ACI #3: The Ori­gin of Goals and Utility

Akira PyinyaMay 17, 2023, 8:47 PM
1 point
0 comments6 min readLW link

Is risk aver­sion re­ally ir­ra­tional ?

kilobugJan 31, 2012, 8:34 PM
54 points
65 comments9 min readLW link

Co­her­ent be­havi­our in the real world is an in­co­her­ent concept

Richard_NgoFeb 11, 2019, 5:00 PM
51 points
17 comments9 min readLW link

A sum­mary of Sav­age’s foun­da­tions for prob­a­bil­ity and util­ity.

SniffnoyMay 22, 2011, 7:56 PM
84 points
92 comments13 min readLW link

Log­a­r­ithms and To­tal Utilitarianism

Pablo VillalobosAug 9, 2018, 8:49 AM
37 points
31 comments4 min readLW link

Ten­den­cies in re­flec­tive equilibrium

Scott AlexanderJul 20, 2011, 10:38 AM
51 points
71 comments4 min readLW link

Un­der­ap­pre­ci­ated points about util­ity func­tions (of both sorts)

SniffnoyJan 4, 2020, 7:27 AM
47 points
61 comments15 min readLW link