One hypothesis for why current hiring practices seem not-very-good: there’s usually no feedback mechanism. There are sometimes obvious cases where a hire ended up being really good or really bad, but there’s no fine-grained way to measure how someone is doing—let alone how much value they add to the organization.
Any prediction market proposal to fix hiring first needs to solve that problem. You need a metric for performance, so you have a ground truth to use for determining bet pay-offs. And to work in practice, that metric also needs to get around Goodhart’s Law somehow. (See here for a mathy explanation of roughly this problem.)
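To make that Goodhart worry concrete, here is a minimal toy sketch (the model and all the numbers are my own illustrative assumptions, not from the linked explanation): the more a score can be gamed, the less selecting on it selects for true value.

```python
# Toy Goodhart simulation: a hiring metric that mixes true value with
# "gaming skill". All names and numbers here are illustrative assumptions.
import random

random.seed(0)

# Each employee has a true contribution and a skill at looking good on metrics.
employees = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]

def top_true_value(gaming_weight, k=100):
    """Average true value of the top k when ranked by score = value + w * gaming."""
    ranked = sorted(employees, key=lambda e: e[0] + gaming_weight * e[1], reverse=True)
    return sum(e[0] for e in ranked[:k]) / k

# Before anyone optimizes against the metric, gaming barely matters.
print(f"top-100 true value, weak gaming (w=0.1): {top_true_value(0.1):+.2f}")
# Once payoffs depend on the score, effort pours into gaming it.
print(f"top-100 true value, heavy gaming (w=3.0): {top_true_value(3.0):+.2f}")
# The harder the score is gamed, the less selecting on it selects for value.
```

A payoff rule for hiring bets would sit on top of exactly this kind of score, so it inherits the problem.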
Now for the flip side: if we had an accurate, Goodhart-proof metric for employee performance, then we probably wouldn’t need a fancy prediction market to utilize it. Don’t get me wrong, a prediction market would be a very fast and efficient way to incorporate all the relevant info. But even a traditional HR department can probably figure out what they need to do in order to improve their metric, once they have a metric to improve.
> if we had an accurate, Goodhart-proof metric for employee performance
At least for programmers, above some low threshold of skill, performance is very situational. Put a good programmer in a soul-sucking dead-end role, and you’ll see poor performance; give a mediocre programmer a chance to do interesting and impactful work, and you’ll see good performance. Measuring performance is also demotivating, as Deming explained:
> The idea of merit rating is alluring. The sound of the words captivates the imagination: pay for what you get; get what you pay for; motivate people to do their best, for their own good. The effect is exactly the opposite of what the words promise. Everyone propels himself forward, or tries to, for his own good, on his own life preserver. The organization is the loser.
So at least from my corner, I think companies should avoid measuring their employees. Whatever problem you’re trying to solve with that, there’s probably a systemic fix that would work better.
For better or worse, companies like Google or Amazon do have internal metrics for hiring. Problems come when, as at Amazon, the available hiring data gets fed to a neural net and the neural net starts to discriminate in an illegal way.

This is a good point.

I want to point out one difficult-to-metricize feature of performance that this interacts with: how people work with the team. Lacking a prediction market or similar, there is no mechanism for the team to weigh in on the subject beforehand, so there isn’t even anything to loop back to; I doubt it is possible to design a metric for this that you could evaluate in advance. For example, dating services have worked very hard to figure out metrics for relationships, with limited success. How much greater is the challenge when there are multiple people involved, all under performance pressure?
Still, to begin with it seems plausible to use some common-sense measures: not terminated for cause; team output goes up; doesn’t leave for another local job within a year. These would differ by industry, but hiring in general seems like a good match for the reasoned rule approach.
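For concreteness, a reasoned rule can be sketched in a few lines: pick a handful of common-sense attributes in advance, standardize each across the candidate pool, and sum them with equal weights rather than trusting holistic judgment. The attributes and scores below are invented purely for illustration:

```python
# A sketch of a "reasoned rule": score candidates on a few pre-chosen
# attributes, standardized and summed with equal weights. The attributes
# and numbers below are hypothetical, purely for illustration.
from statistics import mean, stdev

candidates = {
    "A": {"work_sample": 7.0, "structured_interview": 6.0, "references": 8.0},
    "B": {"work_sample": 9.0, "structured_interview": 5.0, "references": 6.0},
    "C": {"work_sample": 6.0, "structured_interview": 8.0, "references": 7.0},
}
attributes = ["work_sample", "structured_interview", "references"]

def standardized(attr):
    """Z-score one attribute across the candidate pool."""
    vals = [c[attr] for c in candidates.values()]
    mu, sigma = mean(vals), stdev(vals)
    return {name: (c[attr] - mu) / sigma for name, c in candidates.items()}

z = {attr: standardized(attr) for attr in attributes}
scores = {name: sum(z[attr][name] for attr in attributes) for name in candidates}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"candidate {name}: {score:+.2f}")
```

Part of the appeal is that an equal-weight rule is transparent and hard to argue with after the fact, which suits a domain with feedback loops as weak as hiring’s.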
An old post of Eliezer’s, “Selecting Rationalist Groups”, seems highly relevant. Some quotes:

> GreyThumb.blog offered an interesting comparison of poor animal breeding practices and the fall of Enron, which I previously posted on in some detail. The essential theme was that individual selection on chickens for the chicken in each generation who laid the most eggs, produced highly competitive chickens—the most dominant chickens that pecked their way to the top of the pecking order at the expense of other chickens. The chickens subjected to this individual selection for egg-laying prowess needed their beaks clipped, or housing in individual cages, or they would peck each other to death.

> Which is to say: individual selection is selecting on the wrong criterion, because what the farmer actually wants is high egg production from groups of chickens.

> An institution’s performance is the sum of its groups more directly than it is the sum of its individuals—though of course there are interactions between groups as well. Find people who, in general, seem to have a statistical tendency to belong to high-performing groups—these are the ones who contribute much to the group, who are persuasive with good arguments.
I’ve been thinking about the statistical tendency to belong to high-performing groups, and I keep running up against a dearth-of-data problem. Information about the performance of top-level organizations is easy to come by, and individuals make efforts to communicate information about their own performance, but teams, which are the unit of action, don’t seem very legible to third parties.
I wonder how much of a motive this is for acquiring startups instead of driving innovation internally; a startup often just is the unit of action, and it has already gone through several rounds of assessment by investors.
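The dynamic in the quoted chicken example is easy to reproduce in a toy simulation. A minimal sketch, with an invented fitness function and invented numbers (none of this is from Eliezer’s post or the GreyThumb analysis): aggression raises a hen’s own egg count a little while costing her cage-mates more in total, so breeding the best individual layers drives output down, while breeding from the best cages drives it up.

```python
# Toy model of individual vs. group selection on hens. All dynamics and
# numbers are invented assumptions, just to show the mechanism in the quote.
import random

random.seed(0)

CAGE, CAGES, GENERATIONS = 8, 50, 20

def eggs(agg, cage):
    """Eggs for one hen: her own aggression wins her resources (+0.5 each),
    but the aggression of cage-mates costs her more in total (-0.8)."""
    others = (sum(cage) - agg) / (CAGE - 1)
    return 1.0 + 0.5 * agg - 0.8 * others

def child(agg):
    # Aggression is heritable with mutation noise; it can't go below zero.
    return max(0.0, agg + random.gauss(0, 0.1))

def mean_output(pop):
    return sum(eggs(a, cage) for cage in pop for a in cage) / (CAGES * CAGE)

def evolve(group_selection):
    pop = [[1.0] * CAGE for _ in range(CAGES)]
    for _ in range(GENERATIONS):
        if group_selection:
            # Breed from every hen in the half of cages with the most total eggs.
            best = sorted(pop, key=lambda c: sum(eggs(a, c) for a in c), reverse=True)
            parents = [a for cage in best[: CAGES // 2] for a in cage]
        else:
            # Breed from the individually best layers, ignoring cage totals.
            hens = sorted(((eggs(a, cage), a) for cage in pop for a in cage), reverse=True)
            parents = [a for _, a in hens[: CAGES * CAGE // 2]]
        pop = [[child(random.choice(parents)) for _ in range(CAGE)] for _ in range(CAGES)]
    return mean_output(pop)

print(f"individual selection: mean eggs per hen = {evolve(False):.2f}")
print(f"group selection:      mean eggs per hen = {evolve(True):.2f}")
```

In this toy version, individual selection steadily breeds aggression up and mean output down, while group selection breeds aggression out, which is the farmer-wants-groups point from the quote.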