The problem is that “Goodhart’s curse” is an informal statement. It doesn’t literally say “for all |u − v| > 0, optimization for v leads to the oblivion of u”. When we talk about “small differences”, we mean differences in the space of all possible minds, where the difference between two humans is practically nonexistent. If you, say, find a subset of utility functions V such that |v − u| < 10^(-32) utilon for all v in V, where u is humanity’s utility function, you should implement it in a Sovereign right now: yes, we lose some utility, but we have a time limit for solving alignment. The problem of alignment is that we can’t specify a V with such characteristics. We can’t even specify a V such that corr(u, v) > 0.5.
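A minimal sketch of why a tiny difference is tolerable, assuming |v − u| means the largest difference between v and u over all outcomes (a sup-norm reading, which is my assumption). If v never differs from u by more than ε, then the option x_v that maximizes v loses at most 2ε of u-utility, because u(x_v) ≥ v(x_v) − ε ≥ v(x_u) − ε ≥ u(x_u) − 2ε, where x_u is the option that maximizes u. The toy utilities, `options`, `eps` and `regret_of_proxy` below are hypothetical illustrations, not anything from the original argument.

```python
import random


def regret_of_proxy(u, v, options):
    """Utility lost (measured by u) when we pick the option that maximizes the proxy v."""
    best_u = max(u(x) for x in options)
    chosen = max(options, key=v)  # what an optimizer of v would pick
    return best_u - u(chosen)


random.seed(0)
options = list(range(1000))

# Hypothetical "true" utility over 1000 outcomes.
true_utility = {x: random.gauss(0, 1) for x in options}

# Hypothetical proxy error, bounded by eps at every outcome (assumed sup-norm bound).
eps = 1e-3
noise = {x: random.uniform(-eps, eps) for x in options}


def u(x):
    return true_utility[x]


def v(x):
    return true_utility[x] + noise[x]


loss = regret_of_proxy(u, v, options)
print(loss <= 2 * eps)  # True: the 2*eps bound holds for any noise within [-eps, eps]
```

The bound only needs the pointwise guarantee, which is exactly the kind of guarantee we currently can’t get; a mere correlation such as corr(u, v) > 0.5 gives no comparable bound on the loss.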