It’s AI-based, so my guess is that it uses a lot of somewhat superficial correlates that could be gamed. I expect that if it went mainstream it would be Goodharted.
I expect Goodhart would hit particularly hard under the kind of usage I guess you are implying, which is searching for a few very well-selected people. A selective search is a strong optimization, and so it Goodharts more.
A more concrete example I have in mind, which may apply to the technology right now: there are people who are good at lying to themselves.
That’s not really the kind of usage I was thinking of; I was thinking of screening out low-honesty candidates from a pool who had already qualified to join a high-trust system (such systems do not currently exist for any high-stakes matter). Large amounts of sensor data (particularly from people lying and telling the truth during different kinds of interviews) will probably be necessary, and the data will need to focus on specific indicators of lying, e.g. discomfort, heart-rate changes, or activity in certain parts of the brain; extremely low false-positive and false-negative rates probably won’t be feasible.
Also, hopefully people would naturally set up multiple different tests for redundancy, each of which would have to be Goodharted separately, and each false negative (a case of a uniquely bad person being revealed as bad only after passing the screening) would be added to the training data. Periodically re-testing people for the concealed emergence of low-trust tendencies would further facilitate this. Sadly, whenever a person slips through the cracks, they will know they got away with it and will keep doing it.
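To make the redundancy point concrete, here is a toy back-of-the-envelope sketch (not anything from the original discussion): if the tests fail *independently*, a liar's chance of passing all of them shrinks geometrically with the number of tests. The independence assumption is exactly what Goodharting would break — someone gaming one superficial correlate may game several at once — so real-world gains would be smaller.

```python
def combined_false_negative_rate(per_test_fn_rate: float, num_tests: int) -> float:
    """Probability that a deceptive candidate passes every test,
    under the (strong) assumption that each test misses liars
    independently with the same per-test false-negative rate."""
    return per_test_fn_rate ** num_tests

# Example: three redundant tests, each missing 10% of liars.
# Under independence, only ~0.1% of liars pass all three.
print(combined_false_negative_rate(0.10, 3))
```

If Goodharting correlates the failures, the combined rate sits somewhere between this optimistic product and the single-test rate of 10%.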