The problem comes when one tries to pour a lot of money into that sort of approach
It seems to me that the Goodhart effect is actually stronger if you’re granting less money.
Suppose that we have a population of people who are keen to work on AI safety. Suppose every time a person from that population gets an application for funding rejected, they lose a bit of the idealism which initially drew them to the area and they start having a few more cynical thoughts like “my guess is that grantmakers want to fund X, maybe I should try to be more like X even though I don’t personally think X is a great idea.”
In that case, the level of Goodharting seems to be pretty much directly proportional to the number of rejections—and the less funding available, the greater the quantity of rejections.
On the other hand, if the United Nations got together tomorrow and decided to fund a worldwide UBI, there’d be no optimization pressure at all, and people would just do whatever seemed best to them personally.
EDIT: This appears to be a concrete example of what I’m describing
It seems to me that the Goodhart effect is actually stronger if you’re granting less money.
Suppose that we have a population of people who are keen to work on AI safety. Suppose every time a person from that population gets an application for funding rejected, they lose a bit of the idealism which initially drew them to the area and they start having a few more cynical thoughts like “my guess is that grantmakers want to fund X, maybe I should try to be more like X even though I don’t personally think X is a great idea.”
In that case, the level of Goodharting seems to be pretty much directly proportional to the number of rejections—and the less funding available, the greater the quantity of rejections.
On the other hand, if the United Nations got together tomorrow and decided to fund a worldwide UBI, there’d be no optimization pressure at all, and people would just do whatever seemed best to them personally.
EDIT: This appears to be a concrete example of what I’m describing