Summary:
Claim 1: Goodhart’s Law is true
“Any measure which becomes the target ceases to be a good measure”
Examples:
Any math test meant to identify the best students stops working after enough iterations (say, by the 10th): by then, people “study to be good at the test”
Sugar was a good proxy for healthy food in the ancestral environment, but not today
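The math-test example can be illustrated with a small simulation. This is only a sketch under invented assumptions: `skill`, `gaming`, and `gaming_strength` are hypothetical names, and the model (score = skill + test-specific preparation that is unrelated to skill) is a deliberately crude stand-in for "studying to the test", not something taken from the article.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def score_skill_correlation(gaming_strength, n_students=2000, seed=0):
    """Correlation between true skill and test score.

    Assumption (hypothetical model): each student's score is their skill
    plus `gaming_strength` times a test-specific preparation term that is
    independent of skill.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_students)]
    gaming = [rng.gauss(0, 1) for _ in range(n_students)]
    score = [s + gaming_strength * g for s, g in zip(skill, gaming)]
    return pearson(skill, score)


if __name__ == "__main__":
    for strength in (0.0, 1.0, 3.0):
        print(strength, round(score_skill_correlation(strength), 2))
```

In this toy model the score tracks skill perfectly while nobody optimizes for the test, and the correlation falls toward 1/sqrt(1 + gaming_strength^2) as test-specific preparation grows: the measure degrades once it becomes the target.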
Claim 2: If you want to condition yourself toward a certain behavior with some reward, this is possible, provided the delay between behavior and reward is short enough
Claim 3: Over time, we develop “taste”: judgments we cannot fully explain about whether some stimulus is likely to lead to progress toward our goals.
A “stimulus” can be as complex as “this specific hypothesis for how to investigate a disease”
Claim 4: Our brains condition us, often without us noticing
By this, the article means that dopamine spikes occur not at the low-level reward itself, but earlier, at points that reliably predict the reward.
Since the dopamine hit itself can “feel rewarding”, this is a certain type of conditioning towards the behavior that preceded it.
In other words, the brain gives a dopamine hit in the same way as the dog trainer produces the “click” before the “treat”.
We often don’t “notice” this since we don’t usually explicitly think about why something feels good.
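The shift of the "dopamine hit" from the reward to the earlier predictive point resembles what temporal-difference (TD) learning does with its prediction error. The sketch below uses that textbook model as an assumption; the article summarized here does not necessarily frame it this way, and `run_trials`, the five-step trial, and the learning rate are hypothetical choices for illustration.

```python
def run_trials(n_trials, n_steps=5, alpha=0.1, gamma=1.0, reward=1.0):
    """Simulate repeated cue -> ... -> reward trials with TD(0).

    Returns one (cue_error, reward_error) pair per trial:
    - cue_error: prediction error when the cue appears unexpectedly
      (the pre-cue baseline is assumed to have value 0),
    - reward_error: prediction error at the step that delivers the reward.
    """
    # values[t] estimates future reward from step t; values[n_steps] is the
    # terminal state and stays at 0.
    values = [0.0] * (n_steps + 1)
    history = []
    for _ in range(n_trials):
        # The cue cannot be predicted, so the error is just its learned value.
        cue_error = gamma * values[0] - 0.0
        reward_error = 0.0
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0
            delta = r + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            if t == n_steps - 1:
                reward_error = delta
        history.append((cue_error, reward_error))
    return history


if __name__ == "__main__":
    history = run_trials(500)
    print("first trial:", history[0])
    print("last trial: ", history[-1])
```

On the first trial the whole error sits at the reward; after many trials it sits at the cue and the reward itself produces almost none. In this model the cue plays the role of the trainer's “click”: the signal arrives before the “treat” and reinforces whatever preceded it.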
Conclusion: Your brain conditions you all the time toward proxy goals (“dopamine hits”), and Goodhart’s Law implies that this conditioning sometimes steers you away from your actual goals
E.g., if you get an “anti-dopamine hit” for seeing the number on your bathroom scale, then this may condition you toward never looking at that number again, rather than toward the high-level goal of losing weight