If we assume that the questions are designed so that a student can answer them on first exposure if and only if they deeply understand the material, then the problem of identifying graders reduces to the much easier problem of identifying people who can discriminate between valid and invalid answers. I’m told that being able to discriminate between valid and invalid responses is a necessary condition for subject expertise, so any relevant expert will do. One way to demonstrate expertise is by building something that requires it. To take an extreme example, I’m confident that Grigori Perelman understands topology because he proved the Poincaré conjecture, and, for similar reasons, I’m (mostly) confident that PhDs are experts. If we have well-designed tests, we can define the set of people qualified to grade tests recursively as “has built something requiring expertise or has passed a well-designed test graded by someone already in this set.”
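The definition of the grader set is recursive, so it may help to see it computed. Here is a minimal sketch in Python, assuming we already have a list of people who have built something requiring expertise and a record of who passed a test graded by whom. The function name `qualified_graders`, its parameters, and the sample names are all hypothetical and used only for illustration.

```python
def qualified_graders(builders, passed_tests):
    """Return the smallest set satisfying the recursive definition.

    builders: people who have built something requiring expertise
        (the base case; an assumed input).
    passed_tests: (student, grader) pairs, each meaning `student` passed
        a well-designed test graded by `grader` (also an assumed input).
    """
    passed_tests = list(passed_tests)  # allow repeated passes over the data
    qualified = set(builders)
    changed = True
    # Keep sweeping until a full pass adds nobody new (a fixed point).
    while changed:
        changed = False
        for student, grader in passed_tests:
            if grader in qualified and student not in qualified:
                qualified.add(student)
                changed = True
    return qualified


builders = {"Perelman"}
passed_tests = [
    ("Bob", "Alice"),       # Bob qualifies only once Alice does
    ("Alice", "Perelman"),  # Alice is certified by a builder
    ("Carol", "Dave"),      # Carol and Dave only certify each other
    ("Dave", "Carol"),
]
graders = qualified_graders(builders, passed_tests)
```

In this sketch Alice and Bob end up qualified because their chain of graders traces back to someone who built something, while Carol and Dave do not: certifying each other is not enough without an anchor in the base case.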