
Goodhart’s Law


Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, the proxy ceases to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes it produced (a reasonable proxy for productivity): the factory soon began turning out huge numbers of tiny shoes. Useless, but the numbers looked good.

Goodhart’s Law is of particular relevance to AI alignment. Suppose you have something which is generally a good proxy for “the stuff that humans care about”. It would be dangerous to have a powerful AI optimize for that proxy: in accordance with Goodhart’s Law, the proxy will break down under the optimization pressure.

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

- Regressional Goodhart: when selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
- Causal Goodhart: when there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
- Extremal Goodhart: worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
- Adversarial Goodhart: when you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.
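To make the regressional case concrete, here is a minimal simulation sketch (Python, standard library only; the Gaussian goal-plus-noise setup is an illustrative assumption, not taken from any of the posts below). Selecting the top scorers on a noisy proxy picks out items whose true value falls systematically short of what the proxy promised:

```python
# Minimal sketch of regressional Goodhart, under assumed Gaussian noise.
# true_value is the goal we actually care about; proxy is a noisy
# measurement of it. Hard selection on the proxy favors items whose
# noise term happens to be large, so the selected items' true value
# is systematically lower than their proxy score suggests.
import random

random.seed(0)
N = 100_000

true_values = [random.gauss(0, 1) for _ in range(N)]
proxies = [v + random.gauss(0, 1) for v in true_values]  # proxy = goal + noise

# Apply heavy optimization pressure: keep only the top 0.1% by proxy score.
top = sorted(range(N), key=lambda i: proxies[i], reverse=True)[: N // 1000]

avg_proxy = sum(proxies[i] for i in top) / len(top)
avg_true = sum(true_values[i] for i in top) / len(top)

print(f"mean proxy score of selected items: {avg_proxy:.2f}")
print(f"mean true value of selected items:  {avg_true:.2f}")
```

With these assumptions, the selected items’ mean true value comes out at roughly half their mean proxy score: the harder you select on the proxy, the worse it estimates the goal.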

See Also

Goodhart Taxonomy
Scott Garrabrant · 30 Dec 2017 16:38 UTC · 208 points · 34 comments · 10 min read

Classifying specification problems as variants of Goodhart’s Law
Vika · 19 Aug 2019 20:40 UTC · 72 points · 5 comments · 5 min read · 1 review

Specification gaming examples in AI
Vika · 3 Apr 2018 12:30 UTC · 46 points · 9 comments · 1 min read · 2 reviews

Everything I ever needed to know, I learned from World of Warcraft: Goodhart’s law
Said Achmiz · 3 May 2018 16:33 UTC · 37 points · 21 comments · 6 min read · 1 review · (blog.obormot.net)

Replacing Karma with Good Heart Tokens (Worth $1!)
1 Apr 2022 9:31 UTC · 224 points · 173 comments · 4 min read

Goodhart’s Law Causal Diagrams
11 Apr 2022 13:52 UTC · 34 points · 5 comments · 6 min read

Signaling isn’t about signaling, it’s about Goodhart
Valentine · 6 Jan 2022 18:49 UTC · 59 points · 31 comments · 9 min read

When is Goodhart catastrophic?
9 May 2023 3:59 UTC · 170 points · 27 comments · 8 min read

The Natural State is Goodhart
devansh · 20 Mar 2023 0:00 UTC · 59 points · 4 comments · 2 min read

How much do you believe your results?
Eric Neyman · 6 May 2023 20:31 UTC · 467 points · 14 comments · 15 min read · (ericneyman.wordpress.com)

Goodhart’s Curse and Limitations on AI Alignment
Gordon Seidoh Worley · 19 Aug 2019 7:57 UTC · 25 points · 18 comments · 10 min read

The Importance of Goodhart’s Law
blogospheroid · 13 Mar 2010 8:19 UTC · 116 points · 123 comments · 3 min read

[Question] How does Gradient Descent Interact with Goodhart?
Scott Garrabrant · 2 Feb 2019 0:14 UTC · 68 points · 19 comments · 4 min read

Introduction to Reducing Goodhart
Charlie Steiner · 26 Aug 2021 18:38 UTC · 48 points · 10 comments · 4 min read

Goodhart Taxonomy: Agreement
Ben Pace · 1 Jul 2018 3:50 UTC · 44 points · 4 comments · 7 min read

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect
RogerDearnaley · 26 Jan 2024 3:58 UTC · 16 points · 2 comments · 11 min read

Defeating Goodhart and the “closest unblocked strategy” problem
Stuart_Armstrong · 3 Apr 2019 14:46 UTC · 45 points · 15 comments · 6 min read

Using expected utility for Good(hart)
Stuart_Armstrong · 27 Aug 2018 3:32 UTC · 42 points · 5 comments · 4 min read

Does Bayes Beat Goodhart?
abramdemski · 3 Jun 2019 2:31 UTC · 48 points · 26 comments · 7 min read

New Paper Expanding on the Goodhart Taxonomy
Scott Garrabrant · 14 Mar 2018 9:01 UTC · 17 points · 4 comments · 1 min read · (arxiv.org)

Catastrophic Regressional Goodhart: Appendix
15 May 2023 0:10 UTC · 25 points · 1 comment · 9 min read

Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems
Alexandros · 10 May 2010 13:25 UTC · 56 points · 105 comments · 10 min read

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)
RogerDearnaley · 25 May 2023 9:26 UTC · 33 points · 3 comments · 15 min read

Don’t design agents which exploit adversarial inputs
18 Nov 2022 1:48 UTC · 70 points · 64 comments · 12 min read

Is Clickbait Destroying Our General Intelligence?
Eliezer Yudkowsky · 16 Nov 2018 23:06 UTC · 191 points · 65 comments · 5 min read · 2 reviews

Honest science is spirituality
pchvykov · 1 Jul 2024 20:33 UTC · −1 points · 10 comments · 4 min read

All I know is Goodhart
Stuart_Armstrong · 21 Oct 2019 12:12 UTC · 28 points · 23 comments · 3 min read

What does Optimization Mean, Again? (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 2)
Davidmanheim · 28 Jul 2019 9:30 UTC · 26 points · 7 comments · 4 min read

Constructing Goodhart
johnswentworth · 3 Feb 2019 21:59 UTC · 29 points · 10 comments · 3 min read

Bounding Goodhart’s Law
eric_langlois · 11 Jul 2018 0:46 UTC · 43 points · 2 comments · 5 min read

Reward hacking and Goodhart’s law by evolutionary algorithms
Jan_Kulveit · 30 Mar 2018 7:57 UTC · 18 points · 5 comments · 1 min read · (arxiv.org)

Non-Adversarial Goodhart and AI Risks
Davidmanheim · 27 Mar 2018 1:39 UTC · 22 points · 11 comments · 6 min read

(Some?) Possible Multi-Agent Goodhart Interactions
Davidmanheim · 22 Sep 2018 17:48 UTC · 20 points · 2 comments · 5 min read

Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 1)
Davidmanheim · 2 Jul 2019 15:36 UTC · 31 points · 5 comments · 4 min read

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?
Gordon Seidoh Worley · 26 Apr 2024 18:10 UTC · 11 points · 2 comments · 32 min read

Catastrophic Goodhart in RL with KL penalty
15 May 2024 0:58 UTC · 62 points · 10 comments · 7 min read

Principled Satisficing To Avoid Goodhart
JenniferRM · 16 Aug 2024 19:05 UTC · 45 points · 2 comments · 8 min read

The Dumbification of our smart screens
Itay Dreyfus · 4 Jul 2024 6:32 UTC · 18 points · 0 comments · 5 min read · (productidentity.co)

Embedded Agency (full-text version)
15 Nov 2018 19:49 UTC · 201 points · 17 comments · 54 min read

Robust Delegation
4 Nov 2018 16:38 UTC · 116 points · 10 comments · 1 min read

Optimization Amplifies
Scott Garrabrant · 27 Jun 2018 1:51 UTC · 114 points · 12 comments · 4 min read

Specification gaming: the flip side of AI ingenuity
6 May 2020 23:51 UTC · 66 points · 9 comments · 6 min read

Goodhart’s Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 25 Nov 2023 0:53 UTC · 27 points · 2 comments · 1 min read · (arxiv.org)

Noticing the Taste of Lotus
Valentine · 27 Apr 2018 20:05 UTC · 211 points · 81 comments · 3 min read · 3 reviews

Guarding Slack vs Substance
Raemon · 13 Dec 2017 20:58 UTC · 40 points · 6 comments · 6 min read

Humans are not automatically strategic
AnnaSalamon · 8 Sep 2010 7:02 UTC · 549 points · 277 comments · 4 min read

The Goodhart Game
John_Maxwell · 18 Nov 2019 23:22 UTC · 13 points · 5 comments · 5 min read

If I were a well-intentioned AI… III: Extremal Goodhart
Stuart_Armstrong · 28 Feb 2020 11:24 UTC · 22 points · 0 comments · 5 min read

Optimized for Something other than Winning or: How Cricket Resists Moloch and Goodhart’s Law
A.H. · 5 Jul 2023 12:33 UTC · 53 points · 26 comments · 4 min read

nostalgebraist: Recursive Goodhart’s Law
Kaj_Sotala · 26 Aug 2020 11:07 UTC · 53 points · 27 comments · 1 min read · (nostalgebraist.tumblr.com)

Markets are Anti-Inductive
Eliezer Yudkowsky · 26 Feb 2009 0:55 UTC · 88 points · 61 comments · 4 min read

Satisficers want to become maximisers
Stuart_Armstrong · 21 Oct 2011 16:27 UTC · 38 points · 70 comments · 1 min read

The Three Levels of Goodhart’s Curse
Scott Garrabrant · 30 Dec 2017 16:41 UTC · 7 points · 2 comments · 3 min read

How my school gamed the stats
Srdjan Miletic · 20 Feb 2021 19:23 UTC · 83 points · 26 comments · 4 min read

Bootstrapped Alignment
Gordon Seidoh Worley · 27 Feb 2021 15:46 UTC · 20 points · 12 comments · 2 min read

Competent Preferences
Charlie Steiner · 2 Sep 2021 14:26 UTC · 30 points · 2 comments · 6 min read

Goodhart Ethology
Charlie Steiner · 17 Sep 2021 17:31 UTC · 20 points · 4 comments · 14 min read

Models Modeling Models
Charlie Steiner · 2 Nov 2021 7:08 UTC · 23 points · 5 comments · 10 min read

My Overview of the AI Alignment Landscape: Threat Models
Neel Nanda · 25 Dec 2021 23:07 UTC · 52 points · 3 comments · 28 min read

[Intro to brain-like-AGI safety] 10. The alignment problem
Steven Byrnes · 30 Mar 2022 13:24 UTC · 48 points · 7 comments · 19 min read

Proxy misspecification and the capabilities vs. value learning race
Sam Marks · 16 May 2022 18:58 UTC · 23 points · 3 comments · 4 min read

Reducing Goodhart: Announcement, Executive Summary
Charlie Steiner · 20 Aug 2022 9:49 UTC · 16 points · 0 comments · 1 min read

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading
TurnTrout · 29 Nov 2022 6:23 UTC · 60 points · 42 comments · 15 min read

Don’t align agents to evaluations of plans
TurnTrout · 26 Nov 2022 21:16 UTC · 45 points · 49 comments · 18 min read

Soft optimization makes the value target bigger
Jeremy Gillen · 2 Jan 2023 16:06 UTC · 117 points · 20 comments · 12 min read

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?
DragonGod · 9 Feb 2023 13:36 UTC · 22 points · 42 comments · 2 min read

Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown · 11 Sep 2024 9:53 UTC · 5 points · 0 comments · 8 min read · (nonzerosum.games)

Bayesianism versus conservatism versus Goodhart
Stuart_Armstrong · 16 Jul 2021 23:39 UTC · 15 points · 2 comments · 6 min read

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors
Stuart_Armstrong · 19 Dec 2019 13:55 UTC · 24 points · 18 comments · 7 min read

The Ancient God Who Rules High School
lifelonglearner · 5 Apr 2017 18:55 UTC · 12 points · 113 comments · 1 min read · (medium.com)

Religion as Goodhart
Shmi · 8 Jul 2019 0:38 UTC · 21 points · 6 comments · 2 min read

The Dark Miracle of Optics
Suspended Reason · 24 Jun 2020 3:09 UTC · 27 points · 5 comments · 8 min read

Resolutions to the Challenge of Resolving Forecasts
Davidmanheim · 11 Mar 2021 19:08 UTC · 58 points · 13 comments · 5 min read

Goodhart’s Law in Reinforcement Learning
16 Oct 2023 0:54 UTC · 126 points · 22 comments · 7 min read

Extinction-level Goodhart’s Law as a Property of the Environment
21 Feb 2024 17:56 UTC · 23 points · 0 comments · 10 min read

Can “Reward Economics” solve AI Alignment?
Q Home · 7 Sep 2022 7:58 UTC · 3 points · 15 comments · 18 min read

Oversight Leagues: The Training Game as a Feature
Paul Bricman · 9 Sep 2022 10:08 UTC · 20 points · 6 comments · 10 min read

The reverse Goodhart problem
Stuart_Armstrong · 8 Jun 2021 15:48 UTC · 20 points · 22 comments · 1 min read

Dynamics Crucial to AI Risk Seem to Make for Complicated Models
21 Feb 2024 17:54 UTC · 19 points · 0 comments · 9 min read

Outer alignment and imitative amplification
evhub · 10 Jan 2020 0:26 UTC · 24 points · 11 comments · 9 min read

Scaling Laws for Reward Model Overoptimization
20 Oct 2022 0:20 UTC · 103 points · 13 comments · 1 min read · (arxiv.org)

The Paradox of Expert Opinion
Emrik · 26 Sep 2021 21:39 UTC · 12 points · 9 comments · 2 min read

Validator models: A simple approach to detecting goodharting
beren · 20 Feb 2023 21:32 UTC · 14 points · 1 comment · 4 min read

Thinking about maximization and corrigibility
James Payor · 21 Apr 2023 21:22 UTC · 63 points · 4 comments · 5 min read

When Can Optimization Be Done Safely?
StrivingForLegibility · 30 Dec 2023 1:24 UTC · 12 points · 0 comments · 3 min read

Extinction Risks from AI: Invisible to Science?
21 Feb 2024 18:07 UTC · 24 points · 7 comments · 1 min read · (arxiv.org)

Why Agent Foundations? An Overly Abstract Explanation
johnswentworth · 25 Mar 2022 23:17 UTC · 301 points · 56 comments · 8 min read · 1 review

Practical everyday human strategizing
akaTrickster · 27 Mar 2022 14:20 UTC · 6 points · 0 comments · 3 min read

Moral Mazes and Short Termism
Zvi · 2 Jun 2019 11:30 UTC · 74 points · 21 comments · 4 min read · (thezvi.wordpress.com)

The new dot com bubble is here: it’s called online advertising
Gordon Seidoh Worley · 18 Nov 2019 22:05 UTC · 50 points · 17 comments · 2 min read · (thecorrespondent.com)

How Doomed are Large Organizations?
Zvi · 21 Jan 2020 12:20 UTC · 81 points · 42 comments · 9 min read · (thezvi.wordpress.com)

When to use quantilization
RyanCarey · 5 Feb 2019 17:17 UTC · 65 points · 5 comments · 4 min read

[Aspiration-based designs] A. Damages from misaligned optimization – two more models
15 Jul 2024 14:08 UTC · 6 points · 0 comments · 9 min read

Leto among the Machines
Virgil Kurkjian · 30 Sep 2018 21:17 UTC · 57 points · 20 comments · 13 min read

Weak vs Quantitative Extinction-level Goodhart’s Law
21 Feb 2024 17:38 UTC · 27 points · 1 comment · 2 min read

The Lesson To Unlearn
Ben Pace · 8 Dec 2019 0:50 UTC · 37 points · 11 comments · 1 min read · (paulgraham.com)

Aldix and the Book of Life
ville · 1 Jan 2024 17:23 UTC · 1 point · 0 comments · 4 min read · (medium.com)

Goodhart’s Law and Emotions
Zero Contradictions · 7 Jul 2024 8:32 UTC · 1 point · 5 comments · 1 min read · (expandingrationality.substack.com)

“Designing agent incentives to avoid reward tampering”, DeepMind
gwern · 14 Aug 2019 16:57 UTC · 28 points · 15 comments · 1 min read · (medium.com)

Lotuses and Loot Boxes
Davidmanheim · 17 May 2018 0:21 UTC · 14 points · 2 comments · 4 min read

AISC team report: Soft-optimization, Bayes and Goodhart
27 Jun 2023 6:05 UTC · 37 points · 2 comments · 15 min read

Specification gaming examples in AI
Samuel Rødal · 10 Nov 2018 12:00 UTC · 24 points · 6 comments · 1 min read · (docs.google.com)

Superintelligence 12: Malignant failure modes
KatjaGrace · 2 Dec 2014 2:02 UTC · 15 points · 51 comments · 5 min read

Degamification
Nate Showell · 19 Feb 2023 5:35 UTC · 23 points · 2 comments · 2 min read