There are several extra features to consider. Firstly, even if you only test, that doesn’t mean the skills weren’t trained. Suppose there are lots of smart kids who really want to be astronauts, and that NASA puts its selection criteria somewhere easily available. The kids then study the skills they think they need to pass the selection. Any time there is any reason to think that skills X, Y and Z are a good combination, there will be more people with all of these skills than chance predicts.
There is also the dark side: Goodhart’s curse. It is hard to select over a large number of people without selecting for lying sociopaths who are gaming your selection criteria.
Great comment. These were both things I thought about putting in the post, but they didn’t quite fit.
Goodhart, in particular, is a huge reason to avoid relying on many bits of selection, even aside from the exponential problem. Of course we also have to be careful of Goodhart when designing training programs, but at least there we have more elbow room to iterate and examine the results, and less incentive for the trainees to hack the process.
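To make the "many bits of selection" point concrete, here is a toy simulation. It is only a sketch under made-up assumptions, not anything from the post: true skill is standard normal, a small fraction of candidates (`gamer_rate`) game the test, and gaming adds a fixed `gaming_boost` to the measured score without changing real skill. The function name and all parameter values are hypothetical.

```python
import math
import random


def gamer_share_among_selected(population, n_selected,
                               gamer_rate=0.01, gaming_boost=2.0, seed=0):
    """Fraction of the top `n_selected` scorers who gamed the test.

    Toy model (all assumptions, chosen for illustration only):
    - true skill ~ Normal(0, 1)
    - each candidate games the test with probability `gamer_rate`
    - gaming adds `gaming_boost` to the measured score, not to real skill
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(population):
        is_gamer = rng.random() < gamer_rate
        true_skill = rng.gauss(0.0, 1.0)
        measured = true_skill + (gaming_boost if is_gamer else 0.0)
        candidates.append((measured, is_gamer))

    # Select purely on the measured score, as a test-only process would.
    candidates.sort(key=lambda c: c[0], reverse=True)
    selected = candidates[:n_selected]
    return sum(1 for _, is_gamer in selected if is_gamer) / n_selected


if __name__ == "__main__":
    n_selected = 20
    for population in (200, 20_000, 200_000):
        bits = math.log2(population / n_selected)
        share = gamer_share_among_selected(population, n_selected)
        print(f"{bits:5.1f} bits of selection: "
              f"{share:.0%} of those selected gamed the test")
```

In this model the share of gamers among those selected grows as the pool gets larger relative to the number of slots, because the score cutoff moves further into the tail, where a fixed boost matters most. The exact numbers depend entirely on the assumed `gamer_rate` and `gaming_boost`, so treat it as an illustration of the direction of the effect rather than an estimate of its size.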