I think it may be enlightening to consider what happens if you get very lucky and your first Model is perfectly accurate. I think you end up solving the same problem as you would without the process proposed here and in the previous article, and running into the same problems.
It’s possible that doing everything via an inaccurate model helps (by reducing overfitting, in some sense), but I don’t think that’s obvious, since it will also increase underfitting, in the same handwavy sense. My guess is that it will help in some cases and hurt in others, and that it will be difficult to guess ahead of time which.
Overfitting is the same thing as Goodhart’s law. If using an inaccurate model helps, it’s because not trying too hard is necessary to avoid Goodharting yourself.
That’s … pretty much what I thought I was saying. Was I unclear, or have I misunderstood you somehow?
Ah, I thought “in some sense” meant you weren’t sure if you were using the metaphor correctly.
I do not think that overfitting is “the same thing as” Goodhart’s law; Goodhart’s law is broader. One of the mechanisms by which it works is similar to overfitting, but there is more to it than that. In particular, the standard examples of Goodhart’s law involve adversaries actively trying to break your proxy, a failure mode that does not show up in overfitting. See also https://agentfoundations.org/item?id=1621
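To make the overfitting reading concrete, here is a minimal sketch in Python; the setup (a polynomial fit to a noisy sinusoid) and names like true_measure are purely illustrative, not anything from the thread. Training error plays the role of the proxy, and error against the underlying function plays the role of the true measure: pushing the proxy ever lower eventually makes the true measure worse (the overfitting-flavored Goodhart failure), while too little capacity underfits instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_measure(x):
    # The thing we actually care about; stands in for the real Measure.
    return np.sin(np.pi * x)

# A small noisy sample: the proxy we can actually optimize against.
x_train = np.sort(rng.uniform(-1, 1, 10))
y_train = true_measure(x_train) + rng.normal(0, 0.2, x_train.size)

# A dense grid standing in for the true measure.
x_test = np.linspace(-1, 1, 500)
y_test = true_measure(x_test)

for degree in (1, 3, 6, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # optimize the proxy
    proxy_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    true_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: proxy MSE {proxy_mse:.4f}, true MSE {true_mse:.4f}")
```

Run as-is, the proxy MSE should fall monotonically with degree while the true MSE is roughly U-shaped: degree 1 underfits, and degree 9 (enough to interpolate all ten training points) should drive the proxy to essentially zero while the true error climbs.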
If your model is perfectly accurate, yep, this process is not needed.
I’d argue, though, that if you had a perfect Model and a Model of the Measure, you wouldn’t need the real Measure either. The Measure is just something to help you search: you would just create the good life, or an AI, or whatever hard-to-define thing you have a definition for.
My point isn’t that if the model is perfectly accurate, the process isn’t needed.
My point is that if the model is perfectly accurate, the process doesn’t work (at least in the difficult cases); and since the process involves trying to improve the model all the time, it’s liable to push itself into a situation where it doesn’t work.
In other words, I don’t see that this process really saves you from Goodhart’s law.