I think what you call grader-optimization is straightforwardly about how an optimized target diverges from the (unmeasured) true goal, which is adversarial Goodhart (as defined in the paper, especially in how we defined Campbell's Law, not the definition in the LW post).
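For concreteness, here is a minimal toy sketch of the divergence pattern I mean (my own illustration, not from either paper): an optimizer selecting on a measured proxy grade drifts away from the unmeasured true goal as optimization pressure increases.

```python
# Toy sketch (illustrative only): selecting on a measured proxy diverges
# from the unmeasured true goal under stronger optimization pressure.
import random

random.seed(0)

def true_goal(x: float) -> float:
    # The unmeasured goal: best outcomes are near x = 1.
    return -(x - 1.0) ** 2

def proxy_grade(x: float) -> float:
    # The grader's measure: correlates with the true goal for small x,
    # but keeps rewarding larger x where the true goal collapses.
    return x

candidates = [random.uniform(-3, 10) for _ in range(1000)]

# Mild selection pressure: proxy and true goal mostly agree.
mild = max(random.sample(candidates, 5), key=proxy_grade)
# Strong selection pressure: the argmax of the proxy lands far from the goal.
strong = max(candidates, key=proxy_grade)

for label, x in [("mild optimization", mild), ("strong optimization", strong)]:
    print(f"{label}: x={x:.2f} proxy={proxy_grade(x):.2f} true={true_goal(x):.2f}")
```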
And the second paper's taxonomy, in failure mode 3, lays out how different forms of adversarial optimization in a multi-agent scenario relate to Goodhart's law, covering both the goal-poisoning and optimization-theft cases; both seem relevant to the questions you raised about grader-optimization.