Same way the driver’s-ed test or the citizenship test given to immigrants manages it? Or perhaps you think they don’t … I find it unlikely this design problem should be simply dismissed as unsolvable, but it certainly needs to be borne in mind … point, I guess.
The driver’s-ed test and, to a certain extent, the citizenship test have different incentives than a voting test. In particular, with a voting test the incentive is to turn it into a test of whether the person agrees with the test writers’ political beliefs.
I have to admit, I’m just assuming you would arrange better incentives for the designers. Say, have independent reviews and connect them to salary, or only recruit those with a strong desire for neutrality (and give them access to domain experts). Then again, I have no idea if the incentives actually align for the creators of other tests … everyone is crazy and the world is mad, etc, etc.
I have to admit, I’m just assuming you would arrange better incentives for the designers.
You seem to be massively underestimating how hard this is. You can’t simply wave this problem away by invoking words like “independent”, “neutrality”, and “domain expert” as if they’re some kind of magic spell.
… I wasn’t. I was sketching out, off the top of my head, the basic precautions I would take on attempting something like this. You seem to be estimating the difficulty—the impossibility—on the basis of a model where you take no precautions whatsoever.
And how exactly do you propose doing testing in a way that doesn’t run into the problems with Goodhart’s law I mentioned here?