I talked about this in terms of “underspecified goals”: often the true goal doesn’t clearly exist, and it may not even be coherent. Until that’s fixed, the problem isn’t really Goodhart, it’s just sucking at deciding what you want.
I’m thinking of a young kid in a candy store who has $1, wants everything, and can’t get it. What metric for choosing what to purchase will make them happy? Answer: There isn’t one. What they want is too unclear for them to be happy. So I can tell you in advance that, no matter what they pick now, they’re going to have a tantrum later about wishing they had picked something else. That’s not because they picked the wrong goal, it’s because their desires aren’t coherent.