
Goodhart’s Law


Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, the proxy ceases to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes it produced (a reasonable proxy for productivity): the factory soon began turning out huge numbers of tiny shoes. Useless, but the numbers looked good.

Goodhart’s Law is of particular relevance to AI alignment. Suppose you have something which is generally a good proxy for “the stuff that humans care about”. It would be dangerous to have a powerful AI optimize for that proxy: in accordance with Goodhart’s Law, the proxy will break down under the optimization pressure.

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

- Regressional Goodhart: when selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
- Causal Goodhart: when there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
- Extremal Goodhart: worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
- Adversarial Goodhart: when you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.
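To make the regressional case concrete, here is a minimal simulation sketch (Python, standard library only; the Gaussian goal-plus-noise setup is an illustrative assumption, not taken from any of the posts below). Selecting the top scorers on a noisy proxy picks out items whose true value falls systematically short of what the proxy promised:

```python
# Minimal sketch of regressional Goodhart, under assumed Gaussian noise.
# true_value is the goal we actually care about; proxy is a noisy
# measurement of it. Hard selection on the proxy favors items whose
# noise term happens to be large, so the selected items' true value
# is systematically lower than their proxy score suggests.
import random

random.seed(0)
N = 100_000

true_values = [random.gauss(0, 1) for _ in range(N)]
proxies = [v + random.gauss(0, 1) for v in true_values]  # proxy = goal + noise

# Apply heavy optimization pressure: keep only the top 0.1% by proxy score.
top = sorted(range(N), key=lambda i: proxies[i], reverse=True)[: N // 1000]

avg_proxy = sum(proxies[i] for i in top) / len(top)
avg_true = sum(true_values[i] for i in top) / len(top)

print(f"mean proxy score of selected items: {avg_proxy:.2f}")
print(f"mean true value of selected items:  {avg_true:.2f}")
```

With these assumptions, the selected items’ mean true value comes out at roughly half their mean proxy score: the harder you select on the proxy, the worse it estimates the goal.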

See Also

Goodhart Taxonomy
Scott Garrabrant · 30 Dec 2017 16:38 UTC · 208 points · 34 comments · 10 min read

Classifying specification problems as variants of Goodhart’s Law
Vika · 19 Aug 2019 20:40 UTC · 72 points · 5 comments · 5 min read · 1 review

Specification gaming examples in AI
Vika · 3 Apr 2018 12:30 UTC · 46 points · 9 comments · 1 min read · 2 reviews

Everything I ever needed to know, I learned from World of Warcraft: Goodhart’s law
Said Achmiz · 3 May 2018 16:33 UTC · 37 points · 21 comments · 6 min read · 1 review · (blog.obormot.net)

Replacing Karma with Good Heart Tokens (Worth $1!)
1 Apr 2022 9:31 UTC · 224 points · 173 comments · 4 min read

Goodhart’s Law Causal Diagrams
11 Apr 2022 13:52 UTC · 34 points · 5 comments · 6 min read

Signaling isn’t about signaling, it’s about Goodhart
Valentine · 6 Jan 2022 18:49 UTC · 59 points · 31 comments · 9 min read

When is Goodhart catastrophic?
9 May 2023 3:59 UTC · 170 points · 27 comments · 8 min read

The Natural State is Goodhart
devansh · 20 Mar 2023 0:00 UTC · 59 points · 4 comments · 2 min read

How much do you believe your results?
Eric Neyman · 6 May 2023 20:31 UTC · 467 points · 14 comments · 15 min read · (ericneyman.wordpress.com)

Goodhart’s Curse and Limitations on AI Alignment
Gordon Seidoh Worley · 19 Aug 2019 7:57 UTC · 25 points · 18 comments · 10 min read

The Importance of Goodhart’s Law
blogospheroid · 13 Mar 2010 8:19 UTC · 116 points · 123 comments · 3 min read

[Question] How does Gradient Descent Interact with Goodhart?
Scott Garrabrant · 2 Feb 2019 0:14 UTC · 68 points · 19 comments · 4 min read

Introduction to Reducing Goodhart
Charlie Steiner · 26 Aug 2021 18:38 UTC · 48 points · 10 comments · 4 min read

Goodhart Taxonomy: Agreement
Ben Pace · 1 Jul 2018 3:50 UTC · 44 points · 4 comments · 7 min read

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect
RogerDearnaley · 26 Jan 2024 3:58 UTC · 16 points · 2 comments · 11 min read

Defeating Goodhart and the “closest unblocked strategy” problem
Stuart_Armstrong · 3 Apr 2019 14:46 UTC · 45 points · 15 comments · 6 min read

Using expected utility for Good(hart)
Stuart_Armstrong · 27 Aug 2018 3:32 UTC · 42 points · 5 comments · 4 min read

Does Bayes Beat Goodhart?
abramdemski · 3 Jun 2019 2:31 UTC · 48 points · 26 comments · 7 min read

New Paper Expanding on the Goodhart Taxonomy
Scott Garrabrant · 14 Mar 2018 9:01 UTC · 17 points · 4 comments · 1 min read · (arxiv.org)

Catastrophic Regressional Goodhart: Appendix
15 May 2023 0:10 UTC · 25 points · 1 comment · 9 min read

Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems
Alexandros · 10 May 2010 13:25 UTC · 56 points · 105 comments · 10 min read

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)
RogerDearnaley · 25 May 2023 9:26 UTC · 33 points · 3 comments · 15 min read

Don’t design agents which exploit adversarial inputs
18 Nov 2022 1:48 UTC · 70 points · 64 comments · 12 min read

Is Clickbait Destroying Our General Intelligence?
Eliezer Yudkowsky · 16 Nov 2018 23:06 UTC · 191 points · 65 comments · 5 min read · 2 reviews

Honest science is spirituality
pchvykov · 1 Jul 2024 20:33 UTC · −1 points · 10 comments · 4 min read

All I know is Goodhart
Stuart_Armstrong · 21 Oct 2019 12:12 UTC · 28 points · 23 comments · 3 min read

What does Optimization Mean, Again? (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 2)
Davidmanheim · 28 Jul 2019 9:30 UTC · 26 points · 7 comments · 4 min read

Constructing Goodhart
johnswentworth · 3 Feb 2019 21:59 UTC · 29 points · 10 comments · 3 min read

Bounding Goodhart’s Law
eric_langlois · 11 Jul 2018 0:46 UTC · 43 points · 2 comments · 5 min read

Reward hacking and Goodhart’s law by evolutionary algorithms
Jan_Kulveit · 30 Mar 2018 7:57 UTC · 18 points · 5 comments · 1 min read · (arxiv.org)

Non-Adversarial Goodhart and AI Risks
Davidmanheim · 27 Mar 2018 1:39 UTC · 22 points · 11 comments · 6 min read

(Some?) Possible Multi-Agent Goodhart Interactions
Davidmanheim · 22 Sep 2018 17:48 UTC · 20 points · 2 comments · 5 min read

Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 1)
Davidmanheim · 2 Jul 2019 15:36 UTC · 31 points · 5 comments · 4 min read

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?
Gordon Seidoh Worley · 26 Apr 2024 18:10 UTC · 11 points · 2 comments · 32 min read

Catastrophic Goodhart in RL with KL penalty
15 May 2024 0:58 UTC · 62 points · 10 comments · 7 min read

Principled Satisficing To Avoid Goodhart
JenniferRM · 16 Aug 2024 19:05 UTC · 45 points · 2 comments · 8 min read

The Dumbification of our smart screens
Itay Dreyfus · 4 Jul 2024 6:32 UTC · 18 points · 0 comments · 5 min read · (productidentity.co)

Embedded Agency (full-text version)
15 Nov 2018 19:49 UTC · 201 points · 17 comments · 54 min read

Robust Delegation
4 Nov 2018 16:38 UTC · 116 points · 10 comments · 1 min read

Optimization Amplifies
Scott Garrabrant · 27 Jun 2018 1:51 UTC · 114 points · 12 comments · 4 min read

Specification gaming: the flip side of AI ingenuity
6 May 2020 23:51 UTC · 66 points · 9 comments · 6 min read

Goodhart’s Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 25 Nov 2023 0:53 UTC · 27 points · 2 comments · 1 min read · (arxiv.org)

Noticing the Taste of Lotus
Valentine · 27 Apr 2018 20:05 UTC · 211 points · 81 comments · 3 min read · 3 reviews

Guarding Slack vs Substance
Raemon · 13 Dec 2017 20:58 UTC · 40 points · 6 comments · 6 min read

Humans are not automatically strategic
AnnaSalamon · 8 Sep 2010 7:02 UTC · 549 points · 277 comments · 4 min read

The Goodhart Game
John_Maxwell · 18 Nov 2019 23:22 UTC · 13 points · 5 comments · 5 min read

If I were a well-intentioned AI… III: Extremal Goodhart
Stuart_Armstrong · 28 Feb 2020 11:24 UTC · 22 points · 0 comments · 5 min read

Optimized for Something other than Winning or: How Cricket Resists Moloch and Goodhart’s Law
A.H. · 5 Jul 2023 12:33 UTC · 53 points · 26 comments · 4 min read

nostalgebraist: Recursive Goodhart’s Law
Kaj_Sotala · 26 Aug 2020 11:07 UTC · 53 points · 27 comments · 1 min read · (nostalgebraist.tumblr.com)

Markets are Anti-Inductive
Eliezer Yudkowsky · 26 Feb 2009 0:55 UTC · 88 points · 61 comments · 4 min read

Satisficers want to become maximisers
Stuart_Armstrong · 21 Oct 2011 16:27 UTC · 38 points · 70 comments · 1 min read

The Three Levels of Goodhart’s Curse
Scott Garrabrant · 30 Dec 2017 16:41 UTC · 7 points · 2 comments · 3 min read

How my school gamed the stats
Srdjan Miletic · 20 Feb 2021 19:23 UTC · 83 points · 26 comments · 4 min read

Bootstrapped Alignment
Gordon Seidoh Worley · 27 Feb 2021 15:46 UTC · 20 points · 12 comments · 2 min read

Competent Preferences
Charlie Steiner · 2 Sep 2021 14:26 UTC · 30 points · 2 comments · 6 min read

Goodhart Ethology
Charlie Steiner · 17 Sep 2021 17:31 UTC · 20 points · 4 comments · 14 min read

Models Modeling Models
Charlie Steiner · 2 Nov 2021 7:08 UTC · 23 points · 5 comments · 10 min read

My Overview of the AI Alignment Landscape: Threat Models
Neel Nanda · 25 Dec 2021 23:07 UTC · 52 points · 3 comments · 28 min read

[Intro to brain-like-AGI safety] 10. The alignment problem
Steven Byrnes · 30 Mar 2022 13:24 UTC · 48 points · 7 comments · 19 min read

Proxy misspecification and the capabilities vs. value learning race
Sam Marks · 16 May 2022 18:58 UTC · 23 points · 3 comments · 4 min read

Reducing Goodhart: Announcement, Executive Summary
Charlie Steiner · 20 Aug 2022 9:49 UTC · 16 points · 0 comments · 1 min read

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading
TurnTrout · 29 Nov 2022 6:23 UTC · 60 points · 42 comments · 15 min read

Don’t align agents to evaluations of plans
TurnTrout · 26 Nov 2022 21:16 UTC · 45 points · 49 comments · 18 min read

Soft optimization makes the value target bigger
Jeremy Gillen · 2 Jan 2023 16:06 UTC · 117 points · 20 comments · 12 min read

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?
DragonGod · 9 Feb 2023 13:36 UTC · 22 points · 42 comments · 2 min read

Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown · 11 Sep 2024 9:53 UTC · 5 points · 0 comments · 8 min read · (nonzerosum.games)

Bayesianism versus conservatism versus Goodhart
Stuart_Armstrong · 16 Jul 2021 23:39 UTC · 15 points · 2 comments · 6 min read

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors
Stuart_Armstrong · 19 Dec 2019 13:55 UTC · 24 points · 18 comments · 7 min read

The Ancient God Who Rules High School
lifelonglearner · 5 Apr 2017 18:55 UTC · 12 points · 113 comments · 1 min read · (medium.com)

Religion as Goodhart
Shmi · 8 Jul 2019 0:38 UTC · 21 points · 6 comments · 2 min read

The Dark Miracle of Optics
Suspended Reason · 24 Jun 2020 3:09 UTC · 27 points · 5 comments · 8 min read

Resolutions to the Challenge of Resolving Forecasts
Davidmanheim · 11 Mar 2021 19:08 UTC · 58 points · 13 comments · 5 min read

Goodhart’s Law in Reinforcement Learning
16 Oct 2023 0:54 UTC · 126 points · 22 comments · 7 min read

Extinction-level Goodhart’s Law as a Property of the Environment
21 Feb 2024 17:56 UTC · 23 points · 0 comments · 10 min read

Can “Reward Economics” solve AI Alignment?
Q Home · 7 Sep 2022 7:58 UTC · 3 points · 15 comments · 18 min read

Oversight Leagues: The Training Game as a Feature
Paul Bricman · 9 Sep 2022 10:08 UTC · 20 points · 6 comments · 10 min read

The reverse Goodhart problem
Stuart_Armstrong · 8 Jun 2021 15:48 UTC · 20 points · 22 comments · 1 min read

Dynamics Crucial to AI Risk Seem to Make for Complicated Models
21 Feb 2024 17:54 UTC · 19 points · 0 comments · 9 min read

Outer alignment and imitative amplification
evhub · 10 Jan 2020 0:26 UTC · 24 points · 11 comments · 9 min read

Scaling Laws for Reward Model Overoptimization
20 Oct 2022 0:20 UTC · 103 points · 13 comments · 1 min read · (arxiv.org)

The Paradox of Expert Opinion
Emrik · 26 Sep 2021 21:39 UTC · 12 points · 9 comments · 2 min read

Validator models: A simple approach to detecting goodharting
beren · 20 Feb 2023 21:32 UTC · 14 points · 1 comment · 4 min read

Thinking about maximization and corrigibility
James Payor · 21 Apr 2023 21:22 UTC · 63 points · 4 comments · 5 min read

When Can Optimization Be Done Safely?
StrivingForLegibility · 30 Dec 2023 1:24 UTC · 12 points · 0 comments · 3 min read

Extinction Risks from AI: Invisible to Science?
21 Feb 2024 18:07 UTC · 24 points · 7 comments · 1 min read · (arxiv.org)

Why Agent Foundations? An Overly Abstract Explanation
johnswentworth · 25 Mar 2022 23:17 UTC · 301 points · 56 comments · 8 min read · 1 review

Practical everyday human strategizing
akaTrickster · 27 Mar 2022 14:20 UTC · 6 points · 0 comments · 3 min read

Moral Mazes and Short Termism
Zvi · 2 Jun 2019 11:30 UTC · 74 points · 21 comments · 4 min read · (thezvi.wordpress.com)

The new dot com bubble is here: it’s called online advertising
Gordon Seidoh Worley · 18 Nov 2019 22:05 UTC · 50 points · 17 comments · 2 min read · (thecorrespondent.com)

How Doomed are Large Organizations?
Zvi · 21 Jan 2020 12:20 UTC · 81 points · 42 comments · 9 min read · (thezvi.wordpress.com)

When to use quantilization
RyanCarey · 5 Feb 2019 17:17 UTC · 65 points · 5 comments · 4 min read

[Aspiration-based designs] A. Damages from misaligned optimization – two more models
15 Jul 2024 14:08 UTC · 6 points · 0 comments · 9 min read

Leto among the Machines
Virgil Kurkjian · 30 Sep 2018 21:17 UTC · 57 points · 20 comments · 13 min read

Weak vs Quantitative Extinction-level Goodhart’s Law
21 Feb 2024 17:38 UTC · 27 points · 1 comment · 2 min read

The Lesson To Unlearn
Ben Pace · 8 Dec 2019 0:50 UTC · 37 points · 11 comments · 1 min read · (paulgraham.com)

Aldix and the Book of Life
ville · 1 Jan 2024 17:23 UTC · 1 point · 0 comments · 4 min read · (medium.com)

Goodhart’s Law and Emotions
Zero Contradictions · 7 Jul 2024 8:32 UTC · 1 point · 5 comments · 1 min read · (expandingrationality.substack.com)

“Designing agent incentives to avoid reward tampering”, DeepMind
gwern · 14 Aug 2019 16:57 UTC · 28 points · 15 comments · 1 min read · (medium.com)

Lotuses and Loot Boxes
Davidmanheim · 17 May 2018 0:21 UTC · 14 points · 2 comments · 4 min read

AISC team report: Soft-optimization, Bayes and Goodhart
27 Jun 2023 6:05 UTC · 37 points · 2 comments · 15 min read

Specification gaming examples in AI
Samuel Rødal · 10 Nov 2018 12:00 UTC · 24 points · 6 comments · 1 min read · (docs.google.com)

Superintelligence 12: Malignant failure modes
KatjaGrace · 2 Dec 2014 2:02 UTC · 15 points · 51 comments · 5 min read

Degamification
Nate Showell · 19 Feb 2023 5:35 UTC · 23 points · 2 comments · 2 min read