I agree this is a weird place to bring up Goodhart that requires extra justification. But I do think it makes sense here. (Though I also agree with Said elsethread that it matters a lot what we’re actually talking about – rabbits, widgets, companies, scientific papers and blogposts might all behave a bit differently)
The two main issues are:
it’s hard to directly incentivize results with fuzzy, unpredictable characteristics.
it’s hard to directly incentivize results over long timescales.
In short: preoccupation with “rewarding results”, in situations where you can’t actually reward results, can result in goodharting for all the usual reasons.
Two examples here are:
Scientific papers. Probably the closest to a directly relevant example before we start talking about blogposts in particular. My impression [epistemic status: relatively weak, based on anecdotes, but it seems like everyone I’ve heard talk about this roughly agreed with those anecdotes] is that the publish-or-perish mindset for academic output has been a pretty direct example of “we tried directly incentivizing results, and instead of getting more science we got shittier science.”
Founding Companies. There are ecosystems for founding and investing in companies (startups and otherwise), which are ultimately about a particular result (making money). But this requires very long time horizons, which many people a) literally can’t sustain because they don’t have the runway, and b) are often too risk averse to attempt if they had to risk only their own money.
The business venture world works because there’s been a lot of infrastructural effort put into enabling particular strategies (in the case of startups, the established ladder of seed funding, Series A, etc.; in the case of more traditional businesses, sometimes more straightforward loans).
The relation to goodhart is a bit weirder here, because yeah, overfocus on “known strategies” is also one of the pathologies that results in goodharting (e.g. everyone thinks social media is Hype, so everyone founds social media companies, but maybe by this point social media is overdone and you actually need to be looking for weirder things that people haven’t already saturated the market with).
But, the goodhart is instead “if you don’t put effort into maintaining the early stages of the strategy, despite many instances of that strategy failing… you just end up with less money.”
My sense [again, epistemic status fairly weak, based on things Paul Graham has said but that I haven’t heard explicitly argued against] is that venture capitalists make most of their money from the long tail of companies they invest in. Being willing to get “real results” requires being willing to tolerate lots of things that don’t pay off in results. And many of those companies in the long tail were startup ideas that didn’t sound like sure bets.
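A quick toy illustration of that long-tail point (this is my own made-up sketch, not anything from Paul Graham or from this thread; the distribution and numbers are invented purely to show the shape of the claim):

```python
import random

# Toy model (invented numbers): invest 1 unit in each of 100 companies,
# where returns are heavy-tailed -- most go to zero, a few return a large
# multiple -- and see how much of the total comes from the top handful.

random.seed(0)

NUM_INVESTMENTS = 100
CHECK_SIZE = 1.0  # units invested per company

def simulate_return():
    """Most companies return ~nothing; a rare few return a large multiple."""
    r = random.random()
    if r < 0.70:
        return 0.0                       # total loss
    elif r < 0.95:
        return random.uniform(0.5, 3.0)  # modest outcome
    else:
        return random.uniform(10, 100)   # rare outlier

returns = sorted((simulate_return() * CHECK_SIZE for _ in range(NUM_INVESTMENTS)),
                 reverse=True)
total = sum(returns)
top_5 = sum(returns[:5])

print(f"Total return: {total:.1f} on {NUM_INVESTMENTS} units invested")
print(f"Share of return from top 5 companies: {top_5 / total:.0%}")
```

With these assumed numbers, the top few companies typically account for the large majority of total returns, which is the portfolio shape that “tolerate lots of things that don’t pay off” is betting on.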
There is some sense in which “directly rewarding results” is of course the best way to avoid goodharting, but since we don’t actually have access to “direct results that actually represent the real thing” to reward, the impulse to directly reward results can often result in rewarding not-actually-results.
Sure, that all makes sense, but at least on LW it seems like we ought to insist on saying “rewarding results” when we mean rewarding results, and “deceiving ourselves into thinking we’re rewarding results” when we mean deceiving ourselves into thinking we’re rewarding results.
That makes sense, although I’m not actually sure either “rewarding results” or “deceiving ourselves into thinking we’re rewarding results” quite capture what’s going on here.
Like, I do think it’s possible to reward individual good things (whether blogposts or scientific papers) when you find them. The question is how this shapes the overall system. When you expect “good/real results” to be few and far between, the process of “only reward things that are obviously good and/or great” might technically be rewarding results, while still outputting fewer results on average than if you had rewarded people for following overall strategies like “pursue things you’re earnestly curious about”, and given people positive rewards for incremental steps along the way.
(Seems good to be precise about language here, but I’m not in fact sure how to word this optimally. Meanwhile, earlier parts of the conversation were more explicitly about how “reward final results, and only final results” just isn’t the strategy used in most of the business world.)