In this situation, Goodhart is basically open-loop optimization. An EE analogy would be a high-gain op amp with no feedback network. The result is predictable: you get optimized right out of the linear region and into saturation.
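To make the analogy concrete, here is a toy numerical sketch. The gain, rail voltage, and feedback fraction are made-up illustrative numbers, not any real part’s datasheet:

```python
def op_amp(v_plus, v_minus, gain=1e5, v_rail=15.0):
    """Idealized op amp: huge differential gain, output clipped at the rails."""
    v_out = gain * (v_plus - v_minus)
    return max(-v_rail, min(v_rail, v_out))

def closed_loop(v_in, beta=0.1, gain=1e5, v_rail=15.0):
    """Non-inverting amp: a feedback network returns beta * v_out to the
    inverting input. Solving v_out = gain * (v_in - beta * v_out) gives the
    familiar closed-loop gain gain / (1 + gain * beta), roughly 1 / beta."""
    v_out = gain * v_in / (1.0 + gain * beta)
    return max(-v_rail, min(v_rail, v_out))

for v_in in (0.001, 0.5, 1.0):
    print(f"Vin={v_in:6.3f} V  open-loop={op_amp(v_in, 0.0):6.2f} V  "
          f"closed-loop={closed_loop(v_in):6.2f} V")
```

Even a 1 mV input pegs the open-loop output at the 15 V rail, while the closed-loop version tracks the input at a sane gain of about 1/beta = 10. Feedback is what keeps the system in the regime where the signal still means something.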
You can’t explicitly optimize for something you don’t know, and you don’t know what you really want. You might think you do, but, as usual, beware of what you wish for. I don’t know whether an AI can form a reasonable terminal goal to optimize, but humans surely cannot. Given that some 90% of our brain/mind is not available to introspection, all we have to go by is a vague feeling of “this feels right” or “this is fishy, but I can’t put my finger on why”. That’s why cautious iteration with periodic feedback is so essential, and why open-loop optimization is bound to take you to all the wrong places.