It doesn’t look circular to me? I’m not assuming that we get Goodhart, just that properties that result in very high X seem like they would be things like “very rhetorically persuasive” or “tricks the human into typing a very large number into the rating box” that won’t affect V much, rather than properties with very high magnitude towards both X and V. I believe the analogous claim less for V, so we’ll probably have to replace independence with this assumption.
I think you’re splitting hairs. We prove Goodhart follows from certain assumptions, and I’ve given some justification for the assumptions as well as their limitations, so you could equally say that we “prove” or “show an example”. If by circular you mean we proved something about independent X and V because this was easier than working with more realistic assumptions, we’re guilty! The proof was a huge pain and we wanted to publish rather than complicate it further, partly to get feedback like yours. But I do have some intuition that the result is useful, partly because things are sometimes approximately independent, and partly because the basic reasons behind the proof extend to other cases.