In a recent episode of The Filan
Cabinet,
Oliver Habryka elaborated on a very interesting social pattern: If you
have a community with high status people, and try to introduce clearer
metrics of performance into that community, high status individuals in
the community will strongly resist those metrics because they have an
asymmetric downside: If they perform well on the metric, they stay in
their position, but if they perform poorly, they might lose status. Since
they are at least a little bit unsure about their performance on the
metric relative to others, they can only lose.
Daniel Filan: So let’s go back to what you think on your bad days. So
you mentioned that you had this sense that lots of things in the world
were, I don’t know, trying to distract you from things that are true or
important. And that LessWrong did that somewhat less.
Oliver Habryka: Yeah.
Daniel Filan: Can you kind of flesh that out? What kinds of things
are you thinking of?
Oliver Habryka: I mean, the central dimension that I would often
think about here is reputation management. As an example, the medical
profession, which, you know, generally has the primary job of helping
you with your medical problems and trying to heal you of diseases and
various other things, also, at the same time, seems to have a very strong
norm of mutual reputation protection. Where, if you try to run a study
trying to figure out which doctors in the hospital are better or worse
than other doctors in the hospital, quite quickly, the hospital will
close its ranks and be like, “Sorry, we cannot gather data on [which
doctors are better than the other doctors in this hospital].” Because
that would, like, threaten the reputation arrangement we have. This would
introduce additional data that might cause some of us to be judged and
some others of us to not be judged.
And my sense is the way that usually looks like from the inside is
an actual intentional blinding to performance metrics in order to both
maintain a sense of social peace, and often the case because… A very
common pattern here [is] something like, you have a status hierarchy
within a community or a local institution like a hospital. And generally,
that status hierarchy, because of the way it works, has leadership
of the status hierarchy be opposed to all changes to the status
hierarchy. Because the current leadership is at the top of the status
hierarchy, and so almost anything that we introduce into the system that
involves changes to that hierarchy is a threat, and there isn’t much to
be gained, [at least in] the zero-sum status conflict that is present.
And so my sense is, when you try to run these studies about comparative
doctor performance, what happens is more that there’s an existing status
hierarchy, and lots of people feel a sense of uneasiness and a sense
of wanting to protect the status quo, and therefore they push back on
gathering relevant data here. And from the inside this often looks like
an aversion to trying to understand what are actually the things that
cause different doctors to be better than other doctors. Which is crazy,
if you’re, like, what is the primary job of a good medical institution
and a good medical profession, it would be figuring out what makes people
be better doctors and worse doctors. But [there are] all of the social
dynamics that tend to be present in lots of different institutions that
make it so that looking at relative performance [metrics] becomes a
quite taboo topic and a topic that is quite scary.
So that’s one way [in which] I think many places try to actively… Many
groups of people, when they try to orient and gather around a certain
purpose, actually [have a harder time] or get blinded or in some sense
get integrated into a hierarchy that makes it harder for them to look
at a thing that they were originally interested in when joining the
institution.
This dynamic appears to me to explain many dysfunctions that currently
occur, and important enough to collect some examples where this pattern
occurred/occurs and where it was broken.
Examples of the Dynamic
Ignaz Semmelweis and Hospitals
Ignaz Semmelweis
studied the maternal mortality rate due to puerperal fever in two
different hospitals in the Vienna of the mid 19th century. Finding a
stark difference in mortality rates between the two clinics (10% and
4%), he pursued different theories, finally concluding that the medical
students and doctors in the first clinic did not wash their hands even
after autopsies, while the midwives in the second clinic did not have
contact with corpses. He instituted a policy of first desinfecting hands
with a chlorinated solution and later simple handwashing, leading to
drastic declines in the mortality rate.
However, Semmelweis was derided for his ideas, and they were dismissed
as out of line with the then common four humors theory. Additionally, Wikipedia states that
As a result, his ideas were rejected by the medical community. Other,
more subtle, factors may also have played a role. Some doctors, for
instance, were offended at the suggestion that they should wash their
hands, feeling that their social status as gentlemen was inconsistent with
the idea that their hands could be unclean.
And:
János Diescher was appointed Semmelweis’s successor at the Pest
University maternity clinic. Immediately, mortality rates increased
sixfold to 6%, but the physicians of Budapest said nothing; there were
no inquiries and no protests.
It is quite surprising to me that we still know these numbers.
Intellectuals Resist IQ
Another example is that (arguendo) many intellectuals
place an absurdly high amount of rigor on any attempts to
develop tests for g, and the widespread isolated demands for
rigor
placed on IQ tests, despite their predictive predictive
value.
The hypothesis that many intellectuals fear that widespread reliable
testing of cognitive ability could hurt their position in status
hierarchies where intellectual performance is crucial retrodicts this.
FTFP and Two-Party Systems
A more esoteric example is the baffling prevalence of
first-past-the-post
voting systems, despite the large amount of wasted
votes
it incurs, the pressure towards two-party systems via
Duverger’s law
(and probably thereby creating more polarization),
and the large number of voting method criteria it
violates.
Here, again, the groups with large amounts of power have an incentive to
prevent better methods of measurement coming into place: in a two-party
system, both parties don’t want additional competitors (i.e. third
parties) to have a better shot at coming into power, which would be the
case with better voting methods—so both parties don’t undertake any
steps to switch to a better system (and might even attempt to hinder or
stop such efforts).
If this is true, then it’s a display that high status actors not only
resist quantification of performance, but also improvements in the
existing metrics for performance.
One reason for this is the at best shaky legal
ground
such markets stand on in the US, but they also threaten the
reputations of pundits and other information sources who often do
not display high accuracy in their predictions. See also Hanson
2023
A partner at a prominent US-based global management consultant:
“The objective truth should never be more than optional input to
any structural narrative in a social system.”
Software Development Effort Estimation
(epistemic status: The information in this section is from personal
communication with a person with ~35 years of industry experience in
software, but I haven’t checked the claims in detail yet.)
It is common for big projects to take more effort and time to complete
than initially estimated. For software projects, there exist several
companies who act as consultants to estimate the effort and time
required for completing software projects. However, none of those
companies publish track records of their forecasting ability (in
line with the surprising general lack of customers demanding track
records
of performance).
This is a somewhat noncentral case, because the development effort
estimation companies and consultancies are probably not high status
relative to their customers, but perhaps the most well-regarded of such
companies have found a way to coordinate around making track records
low status.
The Rationality Community
Left as an exercise to the reader.
Transitions to Clearer Quantification
If one accepts that this dynamic is surprisingly common, one might be
interested in how to avoid it (or transition to a regime with stronger
quantification for performance).
One such example could be the emergence of Mixed Martial
Arts through the
Gracie challenge
in the 1920s and later the
UFC:
The ability to hold fights in relatively short tournaments with clear win
conditions seems to have enabled the establishment of a parallel status
hierarchy, which outperformed other systems whenever the two came in
contact. I have not dug into the details of this, but this
could be an interesting case study for bringing more meritocracy into
existing sklerotic status hierarchies.
High Status Eschews Quantification of Performance
In a recent episode of The Filan Cabinet, Oliver Habryka elaborated on a very interesting social pattern: If you have a community with high status people, and try to introduce clearer metrics of performance into that community, high status individuals in the community will strongly resist those metrics because they have an asymmetric downside: If they perform well on the metric, they stay in their position, but if they perform poorly, they might lose status. Since they are at least a little bit unsure about their performance on the metric relative to others, they can only lose.
— Oliver Habryka & Daniel Filan, “The Filan Cabinet Podcast with Oliver Habryka—Transcript”, 2023
This dynamic appears to me to explain many dysfunctions that currently occur, and important enough to collect some examples where this pattern occurred/occurs and where it was broken.
Examples of the Dynamic
Ignaz Semmelweis and Hospitals
Ignaz Semmelweis studied the maternal mortality rate due to puerperal fever in two different hospitals in the Vienna of the mid 19th century. Finding a stark difference in mortality rates between the two clinics (10% and 4%), he pursued different theories, finally concluding that the medical students and doctors in the first clinic did not wash their hands even after autopsies, while the midwives in the second clinic did not have contact with corpses. He instituted a policy of first desinfecting hands with a chlorinated solution and later simple handwashing, leading to drastic declines in the mortality rate.
However, Semmelweis was derided for his ideas, and they were dismissed as out of line with the then common four humors theory. Additionally, Wikipedia states that
And:
It is quite surprising to me that we still know these numbers.
Intellectuals Resist IQ
Another example is that (arguendo) many intellectuals place an absurdly high amount of rigor on any attempts to develop tests for g, and the widespread isolated demands for rigor placed on IQ tests, despite their predictive predictive value.
The hypothesis that many intellectuals fear that widespread reliable testing of cognitive ability could hurt their position in status hierarchies where intellectual performance is crucial retrodicts this.
FTFP and Two-Party Systems
A more esoteric example is the baffling prevalence of first-past-the-post voting systems, despite the large amount of wasted votes it incurs, the pressure towards two-party systems via Duverger’s law (and probably thereby creating more polarization), and the large number of voting method criteria it violates.
Here, again, the groups with large amounts of power have an incentive to prevent better methods of measurement coming into place: in a two-party system, both parties don’t want additional competitors (i.e. third parties) to have a better shot at coming into power, which would be the case with better voting methods—so both parties don’t undertake any steps to switch to a better system (and might even attempt to hinder or stop such efforts).
If this is true, then it’s a display that high status actors not only resist quantification of performance, but also improvements in the existing metrics for performance.
Prediction Markets
Prediction markets offer to be a reliable mechanism for aggregating information about future events, yet they have been adopted in neither governments nor companies.
One reason for this is the at best shaky legal ground such markets stand on in the US, but they also threaten the reputations of pundits and other information sources who often do not display high accuracy in their predictions. See also Hanson 2023
Software Development Effort Estimation
(epistemic status: The information in this section is from personal communication with a person with ~35 years of industry experience in software, but I haven’t checked the claims in detail yet.)
It is common for big projects to take more effort and time to complete than initially estimated. For software projects, there exist several companies who act as consultants to estimate the effort and time required for completing software projects. However, none of those companies publish track records of their forecasting ability (in line with the surprising general lack of customers demanding track records of performance).
This is a somewhat noncentral case, because the development effort estimation companies and consultancies are probably not high status relative to their customers, but perhaps the most well-regarded of such companies have found a way to coordinate around making track records low status.
The Rationality Community
Left as an exercise to the reader.
Transitions to Clearer Quantification
If one accepts that this dynamic is surprisingly common, one might be interested in how to avoid it (or transition to a regime with stronger quantification for performance).
One such example could be the emergence of Mixed Martial Arts through the Gracie challenge in the 1920s and later the UFC: The ability to hold fights in relatively short tournaments with clear win conditions seems to have enabled the establishment of a parallel status hierarchy, which outperformed other systems whenever the two came in contact. I have not dug into the details of this, but this could be an interesting case study for bringing more meritocracy into existing sklerotic status hierarchies.