What you say is true, but it makes the problem less bad by applying weaker optimization pressure rather than actually eliminating it. Weak Goodharting is still Goodharting, and it will still, eventually, subtly screw you up.
I think all self-improvement is subject to Goodharting, even the type you recommend.
The best things we can do about that:
Be nimble and self-aware. Adjust your processes to notice when you’re harming yourself.
Be thoughtful in how you measure success.
I do not think this is actually a contradiction to your post, but, at least for me, it seems like a more actionable framing of the issue.
I think all self-improvement is subject to Goodharting, even the type you recommend.
In particular, I’d worry that “not Goodharting yourself” is Goodharting yourself. Dunno that I have this very coherently, but things that feel like hooks:
Selling nonapples.
Don’t try to “get better at rationality”; rationality needs to have a goal outside itself.
“How are you doing at your goals?” “Well, I stopped measuring things that weren’t perfect metrics for them.”