
Goodhart’s Law

Last edit: Mar 19, 2023, 9:29 PM by Diabloto96

Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, it ceases to be a good proxy. One form of Goodharting is illustrated by the Soviet-era story of a shoe factory graded on the number of shoes it produced (ordinarily a good proxy for productivity) – the factory soon began churning out huge numbers of tiny shoes. Useless, but the numbers looked good.

Goodhart’s Law is of particular relevance to AI alignment. Suppose you have something which is generally a good proxy for “the stuff that humans care about.” It would be dangerous to have a powerful AI optimize for that proxy: in accordance with Goodhart’s Law, under strong optimization pressure the proxy will break down.
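The regressional form of this breakdown can be shown in a toy simulation (an illustrative sketch, not drawn from any of the posts below): score many candidates by a noisy proxy of their true value, select the candidate with the highest proxy score, and the winner's true value predictably falls short of what the proxy claimed.

```python
import random

random.seed(0)

# True values of 10,000 candidate plans, plus a noisy proxy score for each.
true_values = [random.gauss(0, 1) for _ in range(10_000)]
proxy_scores = [v + random.gauss(0, 1) for v in true_values]

# Optimize the proxy: select the candidate with the highest proxy score.
winner = max(range(len(proxy_scores)), key=lambda i: proxy_scores[i])

gap = proxy_scores[winner] - true_values[winner]
print(f"winner's proxy score: {proxy_scores[winner]:.2f}")
print(f"winner's true value:  {true_values[winner]:.2f}")
print(f"Goodhart gap:         {gap:.2f}")
# Selecting hard on the proxy also selects for the noise, so the measured
# score systematically overstates the true value of whatever wins.
```

Because the maximum proxy score is reached partly through luck (noise), the harder the selection, the larger the expected gap between the proxy and the goal.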

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

Regressional Goodhart – when selecting for a proxy, you select not only for the true goal but also for the difference between the proxy and the goal.
Causal Goodhart – when there is a non-causal correlation between the proxy and the goal, intervening on the proxy fails to intervene on the goal.
Extremal Goodhart – worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between proxy and goal was observed.
Adversarial Goodhart – when you optimize for a proxy, you provide an incentive for adversaries to correlate their own goals with your proxy.

See Also

Goodhart Taxonomy

Scott Garrabrant · Dec 30, 2017, 4:38 PM
214 points
34 comments · 10 min read

Classifying specification problems as variants of Goodhart’s Law

Vika · Aug 19, 2019, 8:40 PM
72 points
5 comments · 5 min read · 1 review

Specification gaming examples in AI

Vika · Apr 3, 2018, 12:30 PM
47 points
9 comments · 1 min read · 2 reviews

Everything I ever needed to know, I learned from World of Warcraft: Goodhart’s law

Said Achmiz · May 3, 2018, 4:33 PM
37 points
21 comments · 6 min read · 1 review
(blog.obormot.net)

Replacing Karma with Good Heart Tokens (Worth $1!)

Apr 1, 2022, 9:31 AM
225 points
173 comments · 4 min read

Signaling isn’t about signaling, it’s about Goodhart

Valentine · Jan 6, 2022, 6:49 PM
59 points
31 comments · 9 min read

Goodhart’s Law Causal Diagrams

Apr 11, 2022, 1:52 PM
34 points
6 comments · 6 min read

The Natural State is Goodhart

devansh · Mar 20, 2023, 12:00 AM
59 points
4 comments · 2 min read

How much do you believe your results?

Eric Neyman · May 6, 2023, 8:31 PM
505 points
18 comments · 15 min read · 4 reviews
(ericneyman.wordpress.com)

When is Goodhart catastrophic?

May 9, 2023, 3:59 AM
180 points
29 comments · 8 min read · 1 review

Goodhart’s Curse and Limitations on AI Alignment

Gordon Seidoh Worley · Aug 19, 2019, 7:57 AM
25 points
18 comments · 10 min read

The Importance of Goodhart’s Law

blogospheroid · Mar 13, 2010, 8:19 AM
117 points
123 comments · 3 min read

Introduction to Reducing Goodhart

Charlie Steiner · Aug 26, 2021, 6:38 PM
48 points
10 comments · 4 min read

[Question] How does Gradient Descent Interact with Goodhart?

Scott Garrabrant · Feb 2, 2019, 12:14 AM
68 points
19 comments · 4 min read

Goodhart Taxonomy: Agreement

Ben Pace · Jul 1, 2018, 3:50 AM
44 points
4 comments · 7 min read

Some implications of radical empathy

MichaelStJules · Jan 7, 2025, 4:10 PM
3 points
0 comments · 1 min read

Really radical empathy

MichaelStJules · Jan 6, 2025, 5:46 PM
19 points
0 comments · 1 min read

Utilitarianism and the replaceability of desires and attachments

MichaelStJules · Jul 27, 2024, 1:57 AM
5 points
2 comments · 1 min read

Actualism, asymmetry and extinction

MichaelStJules · Jan 7, 2025, 4:02 PM
1 point
4 comments · 1 min read

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley · Jan 26, 2024, 3:58 AM
16 points
2 comments · 11 min read

Does Bayes Beat Goodhart?

abramdemski · Jun 3, 2019, 2:31 AM
48 points
26 comments · 7 min read

New Paper Expanding on the Goodhart Taxonomy

Scott Garrabrant · Mar 14, 2018, 9:01 AM
17 points
4 comments · 1 min read
(arxiv.org)

Defeating Goodhart and the “closest unblocked strategy” problem

Stuart_Armstrong · Apr 3, 2019, 2:46 PM
45 points
15 comments · 6 min read

Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems

Alexandros · May 10, 2010, 1:25 PM
56 points
105 comments · 10 min read

Catastrophic Regressional Goodhart: Appendix

May 15, 2023, 12:10 AM
25 points
1 comment · 9 min read

Using expected utility for Good(hart)

Stuart_Armstrong · Aug 27, 2018, 3:32 AM
42 points
5 comments · 4 min read

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)

RogerDearnaley · May 25, 2023, 9:26 AM
33 points
3 comments · 15 min read

Is Clickbait Destroying Our General Intelligence?

Eliezer Yudkowsky · Nov 16, 2018, 11:06 PM
191 points
65 comments · 5 min read · 2 reviews

Don’t design agents which exploit adversarial inputs

Nov 18, 2022, 1:48 AM
72 points
64 comments · 12 min read

Embedded Agency (full-text version)

Nov 15, 2018, 7:49 PM
201 points
17 comments · 54 min read

Principled Satisficing To Avoid Goodhart

JenniferRM · Aug 16, 2024, 7:05 PM
45 points
2 comments · 8 min read

Robust Delegation

Nov 4, 2018, 4:38 PM
116 points
10 comments · 1 min read

Optimization Amplifies

Scott Garrabrant · Jun 27, 2018, 1:51 AM
114 points
12 comments · 4 min read

Specification gaming: the flip side of AI ingenuity

May 6, 2020, 11:51 PM
66 points
9 comments · 6 min read

Goodhart’s Law Example: Training Verifiers to Solve Math Word Problems

Chris_Leong · Nov 25, 2023, 12:53 AM
27 points
2 comments · 1 min read
(arxiv.org)

Noticing the Taste of Lotus

Valentine · Apr 27, 2018, 8:05 PM
218 points
81 comments · 3 min read · 3 reviews

Guarding Slack vs Substance

Raemon · Dec 13, 2017, 8:58 PM
42 points
6 comments · 6 min read

Humans are not automatically strategic

AnnaSalamon · Sep 8, 2010, 7:02 AM
581 points
278 comments · 4 min read

(Some?) Possible Multi-Agent Goodhart Interactions

Davidmanheim · Sep 22, 2018, 5:48 PM
20 points
2 comments · 5 min read

Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 1)

Davidmanheim · Jul 2, 2019, 3:36 PM
31 points
5 comments · 4 min read

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?

Gordon Seidoh Worley · Apr 26, 2024, 6:10 PM
11 points
2 comments · 32 min read

Catastrophic Goodhart in RL with KL penalty

May 15, 2024, 12:58 AM
62 points
10 comments · 7 min read

The Goodhart Game

John_Maxwell · Nov 18, 2019, 11:22 PM
13 points
5 comments · 5 min read

Honest science is spirituality

pchvykov · Jul 1, 2024, 8:33 PM
−1 points
10 comments · 4 min read

If I were a well-intentioned AI… III: Extremal Goodhart

Stuart_Armstrong · Feb 28, 2020, 11:24 AM
22 points
0 comments · 5 min read

The Dumbification of our smart screens

Itay Dreyfus · Jul 4, 2024, 6:32 AM
18 points
0 comments · 5 min read
(productidentity.co)

All I know is Goodhart

Stuart_Armstrong · Oct 21, 2019, 12:12 PM
28 points
23 comments · 3 min read

What does Optimization Mean, Again? (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 2)

Davidmanheim · Jul 28, 2019, 9:30 AM
26 points
7 comments · 4 min read

Optimized for Something other than Winning or: How Cricket Resists Moloch and Goodhart’s Law

A.H. · Jul 5, 2023, 12:33 PM
53 points
26 comments · 4 min read

nostalgebraist: Recursive Goodhart’s Law

Kaj_Sotala · Aug 26, 2020, 11:07 AM
53 points
27 comments · 1 min read
(nostalgebraist.tumblr.com)

Markets are Anti-Inductive

Eliezer Yudkowsky · Feb 26, 2009, 12:55 AM
90 points
62 comments · 4 min read

Constructing Goodhart

johnswentworth · Feb 3, 2019, 9:59 PM
29 points
10 comments · 3 min read

Bounding Goodhart’s Law

eric_langlois · Jul 11, 2018, 12:46 AM
43 points
2 comments · 5 min read

Satisficers want to become maximisers

Stuart_Armstrong · Oct 21, 2011, 4:27 PM
38 points
70 comments · 1 min read

The Three Levels of Goodhart’s Curse

Scott Garrabrant · Dec 30, 2017, 4:41 PM
7 points
2 comments · 3 min read

How my school gamed the stats

Srdjan Miletic · Feb 20, 2021, 7:23 PM
83 points
26 comments · 4 min read

Bootstrapped Alignment

Gordon Seidoh Worley · Feb 27, 2021, 3:46 PM
20 points
12 comments · 2 min read

Competent Preferences

Charlie Steiner · Sep 2, 2021, 2:26 PM
30 points
2 comments · 6 min read

Goodhart Ethology

Charlie Steiner · Sep 17, 2021, 5:31 PM
20 points
4 comments · 14 min read

Models Modeling Models

Charlie Steiner · Nov 2, 2021, 7:08 AM
23 points
5 comments · 10 min read

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda · Dec 25, 2021, 11:07 PM
53 points
3 comments · 28 min read

[Intro to brain-like-AGI safety] 10. The alignment problem

Steven Byrnes · Mar 30, 2022, 1:24 PM
48 points
7 comments · 21 min read

Proxy misspecification and the capabilities vs. value learning race

Sam Marks · May 16, 2022, 6:58 PM
23 points
3 comments · 4 min read

Detect Goodhart and shut down

Jeremy Gillen · Jan 22, 2025, 6:45 PM
68 points
21 comments · 7 min read

Reducing Goodhart: Announcement, Executive Summary

Charlie Steiner · Aug 20, 2022, 9:49 AM
16 points
0 comments · 1 min read

Goodhart Typology via Structure, Function, and Randomness Distributions

Mar 25, 2025, 4:01 PM
32 points
0 comments · 15 min read

Reward hacking and Goodhart’s law by evolutionary algorithms

Jan_Kulveit · Mar 30, 2018, 7:57 AM
18 points
5 comments · 1 min read
(arxiv.org)

Non-Adversarial Goodhart and AI Risks

Davidmanheim · Mar 27, 2018, 1:39 AM
22 points
11 comments · 6 min read

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout · Nov 29, 2022, 6:23 AM
62 points
41 comments · 15 min read

Don’t align agents to evaluations of plans

TurnTrout · Nov 26, 2022, 9:16 PM
48 points
49 comments · 18 min read

Soft optimization makes the value target bigger

Jeremy Gillen · Jan 2, 2023, 4:06 PM
119 points
20 comments · 12 min read

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod · Feb 9, 2023, 1:36 PM
22 points
42 comments · 2 min read

Validator models: A simple approach to detecting goodharting

beren · Feb 20, 2023, 9:32 PM
14 points
1 comment · 4 min read

Weak vs Quantitative Extinction-level Goodhart’s Law

Feb 21, 2024, 5:38 PM
27 points
1 comment · 2 min read

When Can Optimization Be Done Safely?

StrivingForLegibility · Dec 30, 2023, 1:24 AM
12 points
0 comments · 3 min read

Aldix and the Book of Life

ville · Jan 1, 2024, 5:23 PM
1 point
0 comments · 4 min read
(medium.com)

Atlas: Stress-Testing ASI Value Learning Through Grand Strategy Scenarios

NeilFox · Feb 17, 2025, 11:55 PM
1 point
0 comments · 2 min read

Extinction Risks from AI: Invisible to Science?

Feb 21, 2024, 6:07 PM
24 points
7 comments · 1 min read
(arxiv.org)

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

Feb 21, 2024, 5:54 PM
19 points
0 comments · 9 min read

Extinction-level Goodhart’s Law as a Property of the Environment

Feb 21, 2024, 5:56 PM
23 points
0 comments · 10 min read

Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions

James Stephen Brown · Sep 11, 2024, 9:53 AM
5 points
0 comments · 8 min read
(nonzerosum.games)

Goodhart’s Law and Emotions

Zero Contradictions · Jul 7, 2024, 8:32 AM
1 point
5 comments · 1 min read
(expandingrationality.substack.com)

[Aspiration-based designs] A. Damages from misaligned optimization – two more models

Jul 15, 2024, 2:08 PM
6 points
0 comments · 9 min read

Don’t want Goodhart? — Specify the damn variables

Yan Lyutnev · Nov 21, 2024, 10:45 PM
−3 points
2 comments · 5 min read

Don’t want Goodhart? — Specify the variables more

YanLyutnev · Nov 21, 2024, 10:43 PM
3 points
2 comments · 5 min read

Other Papers About the Theory of Reward Learning

Joar Skalse · Feb 28, 2025, 7:26 PM
16 points
0 comments · 5 min read

Defining and Characterising Reward Hacking

Joar Skalse · Feb 28, 2025, 7:25 PM
15 points
0 comments · 4 min read

Visual demonstration of Optimizer’s curse

Roman Malov · Nov 30, 2024, 7:34 PM
25 points
3 comments · 7 min read

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM
46 points
7 comments · 10 min read

Thinking about maximization and corrigibility

James Payor · Apr 21, 2023, 9:22 PM
63 points
4 comments · 5 min read

Moral Mazes and Short Termism

Zvi · Jun 2, 2019, 11:30 AM
74 points
21 comments · 4 min read
(thezvi.wordpress.com)

The new dot com bubble is here: it’s called online advertising

Gordon Seidoh Worley · Nov 18, 2019, 10:05 PM
50 points
17 comments · 2 min read
(thecorrespondent.com)

How Doomed are Large Organizations?

Zvi · Jan 21, 2020, 12:20 PM
81 points
42 comments · 9 min read
(thezvi.wordpress.com)

When to use quantilization

RyanCarey · Feb 5, 2019, 5:17 PM
65 points
5 comments · 4 min read

Leto among the Machines

Virgil Kurkjian · Sep 30, 2018, 9:17 PM
57 points
20 comments · 13 min read

The Lesson To Unlearn

Ben Pace · Dec 8, 2019, 12:50 AM
38 points
11 comments · 1 min read
(paulgraham.com)

“Designing agent incentives to avoid reward tampering”, DeepMind

gwern · Aug 14, 2019, 4:57 PM
28 points
15 comments · 1 min read
(medium.com)

Lotuses and Loot Boxes

Davidmanheim · May 17, 2018, 12:21 AM
14 points
2 comments · 4 min read

AISC team report: Soft-optimization, Bayes and Goodhart

Jun 27, 2023, 6:05 AM
38 points
2 comments · 15 min read

Specification gaming examples in AI

Samuel Rødal · Nov 10, 2018, 12:00 PM
24 points
6 comments · 1 min read
(docs.google.com)

Superintelligence 12: Malignant failure modes

KatjaGrace · Dec 2, 2014, 2:02 AM
15 points
51 comments · 5 min read

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors

Stuart_Armstrong · Dec 19, 2019, 1:55 PM
24 points
18 comments · 7 min read

The Ancient God Who Rules High School

lifelonglearner · Apr 5, 2017, 6:55 PM
12 points
113 comments · 1 min read
(medium.com)

Religion as Goodhart

Shmi · Jul 8, 2019, 12:38 AM
21 points
6 comments · 2 min read

Goodhart’s Law in Reinforcement Learning

Oct 16, 2023, 12:54 AM
126 points
22 comments · 7 min read

The reverse Goodhart problem

Stuart_Armstrong · Jun 8, 2021, 3:48 PM
20 points
22 comments · 1 min read

The Paradox of Expert Opinion

Emrik · Sep 26, 2021, 9:39 PM
12 points
9 comments · 2 min read

Why Agent Foundations? An Overly Abstract Explanation

johnswentworth · Mar 25, 2022, 11:17 PM
302 points
58 comments · 8 min read · 1 review

Practical everyday human strategizing

akaTrickster · Mar 27, 2022, 2:20 PM
6 points
0 comments · 3 min read

Bayesianism versus conservatism versus Goodhart

Stuart_Armstrong · Jul 16, 2021, 11:39 PM
15 points
2 comments · 6 min read

The Dark Miracle of Optics

Suspended Reason · Jun 24, 2020, 3:09 AM
27 points
5 comments · 8 min read

Can “Reward Economics” solve AI Alignment?

Q Home · Sep 7, 2022, 7:58 AM
3 points
15 comments · 18 min read

Oversight Leagues: The Training Game as a Feature

Paul Bricman · Sep 9, 2022, 10:08 AM
20 points
6 comments · 10 min read

Outer alignment and imitative amplification

evhub · Jan 10, 2020, 12:26 AM
24 points
11 comments · 9 min read

Scaling Laws for Reward Model Overoptimization

Oct 20, 2022, 12:20 AM
103 points
13 comments · 1 min read
(arxiv.org)

Resolutions to the Challenge of Resolving Forecasts

Davidmanheim · Mar 11, 2021, 7:08 PM
58 points
13 comments · 6 min read

Degamification

Nate Showell · Feb 19, 2023, 5:35 AM
23 points
2 comments · 2 min read