Comments on Power Law Distribution of Individual Impact
I had a discussion online yesterday, stemming from whether you should expect to be able to identify the individuals who will most shape the long-term future of humanity. It was in a thread about whether CEA should have staff work on this full time, and I was expecting boring comments that just expressed a political opinion about what CEA should do. However, Jan Kulveit offered some concrete models for me to disagree with, and I had a fun exchange and appreciated the chance to make explicit some of my models in this area.
With permission of all involved, I have reproduced the exchange below.
Jan:
I would also be worried. Homophily is one of the best predictors of links in social networks, and factors like being a member of the same social group, having similar education, opinions, etc. are known to bias selection processes toward selecting similar people. This risks making the core of the movement more self-encapsulated than it is, which is a shift in a bad direction.
I would also be worried about 80k hours shifting further toward individual coaching; there is currently a bit of an overemphasis on the “individual” approach and too little on “creating systems”.
It also seems a lot of this would benefit from knowledge from the fields of “science of success”, general scientometry, network science, etc. E.g. when I read concepts like “next Peter Singer” or a lot of thinking along the lines of “most of the value is created by just a few people”, I’m worried. While such thinking is intuitively appealing, it can be quite superficial. E.g., a toy model: imagine a landscape with gold scattered in power-law sized deposits, and prospectors walking randomly and randomly discovering deposits of gold. What you observe is that the value of gold collected by prospectors is also power-law distributed. But obviously attempts to emulate “the best” or find the “next best” would be futile. It seems an open question (worth studying) how much some specific knowledge landscape resembles this model, or how big a part of the success is attributable to luck.
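Jan’s prospector model is concrete enough to simulate. The sketch below is a minimal version under assumed parameters (a 200×200 wrap-around grid, 5000 deposits with Pareto-distributed sizes of shape 1.5, 1000 prospectors taking 200 random-walk steps; none of these numbers come from the comment). Every prospector is identical and moves at random, yet the gold they end up holding is concentrated in a few hands.

```python
import random

def simulate_prospectors(n_prospectors=1000, n_deposits=5000, steps=200,
                         grid=200, alpha=1.5, seed=0):
    """Gold collected by identical, randomly walking prospectors.

    All parameters are hypothetical. Deposits sit on random cells of a
    grid x grid torus with Pareto(alpha) sizes; a prospector who steps onto
    an unclaimed deposit takes all of it.
    """
    rng = random.Random(seed)
    deposits = {}
    for _ in range(n_deposits):
        cell = (rng.randrange(grid), rng.randrange(grid))
        deposits[cell] = deposits.get(cell, 0.0) + rng.paretovariate(alpha)
    positions = [(rng.randrange(grid), rng.randrange(grid))
                 for _ in range(n_prospectors)]
    gold = [0.0] * n_prospectors
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(steps):
        for i, (x, y) in enumerate(positions):
            dx, dy = rng.choice(moves)
            cell = ((x + dx) % grid, (y + dy) % grid)  # wrap at the edges
            positions[i] = cell
            gold[i] += deposits.pop(cell, 0.0)  # claim the deposit, if any
    return gold

gold = sorted(simulate_prospectors(), reverse=True)
top_share = sum(gold[:len(gold) // 100]) / sum(gold)
print(f"top 1% of identical prospectors hold {top_share:.0%} of the gold")
```

Nothing distinguishes the top prospectors in this sketch except where their walks happened to go, which is the sense in which emulating “the best” would be futile under this model.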
Ben (me):
That’s a nice toy model, thanks for being so clear :-)
But it’s definitely wrong. If you look at Bostrom on AI or Einstein on Relativity or Feynman on Quantum Mechanics, you don’t see people who are roughly as competent as their peers, just being lucky in which part of the research space was divvied up and given to them. You tend to see people with rare and useful thinking processes having multiple important insights about their field in succession, getting many things right that their peers didn’t, not just one as your model would predict (if being right were random luck). Bostrom has looked into half a dozen sci-fi-looking areas that others had looked at, to figure out which were important, before concluding with xrisk and AI, and he looked into areas and asked questions that were on nobody’s radar. Feynman made breakthroughs in many different subfields, and his success looked like being very good at fundamentals like being concrete and noticing his confusion. I know less about Einstein, but as I understand it, getting to Relativity required a long chain of reasoning that was unclear to his contemporaries. “How would I design the universe if I were god” was probably not a standard tool that was handed out to many physicists to try.
You may respond “sure, these people came up with lots of good ideas that their contemporaries wouldn’t have, but this was probably due to them using the right heuristics, which you can think of as having been handed out randomly in grad school to all the different researchers, so it still is random just on the level of cognitive processes”.
To this I’d say: you’re right, looking at people’s general cognitive processes is really important, but I think I can do much better than random chance at predicting which cognitive processes will produce valuable insights. I’ll point to Superforecasting and Rationality: AI to Zombies as books with many insights into which cognitive processes are more likely than others to find novel and important truths.
In sum: I think the people who’ve had the most positive impact in history are power law distributed because of their rare and valuable cognitive processes, not just random luck, and that these processes can be learned from and can guide my search for people who will (in future) have massive impact.
Jan:
Obviously the toy model is wrong in describing reality: it’s one end of the possible spectrum, where you have complete randomness. At the other end you have another toy model: results in a field are neatly ordered by cognitive difficulty, and the best person at any given time picks all the available fruit. My actual claims, roughly, are:
reality is somewhere in between
it is field-dependent
even in fields more toward the random end, there actually would be differences like different speeds of travel among prospectors
It is quite unclear to me where on this scale the relevant fields are.
I believe your conclusion, that the power law distribution is entirely due to the properties of people’s cognitive processes and not to the randomness of the field, is not supported by the scientometric data for many research fields.
Thanks for a good preemptive answer :) Yes, if you are good enough at identifying the “golden” cognitive processes. While it is clear you would be better than random chance, it is very unclear to me how good you would be. *
I think it’s worth digging into an example in detail: if you look at early Einstein, you actually see someone with unusually developed geometric thinking and the very lucky heuristic of interpreting what the equations say as the actual reality. Famously, the special relativity transformations were first written down by Poincaré. “All” that needed to be done was to take them seriously. General relativity is a different story, but by that point Einstein was already famous and possibly one of the few brave enough to attack the problem.
Continuing with the same example, I would be extremely doubtful that Einstein would have been picked, before he became famous, by a selection process similar to what CEA or 80k hours will probably be running. 2nd grade patent clerk? Unimpressive. Well connected? No. Unusual geometric imagination? I’m not aware of any LessWrong sequence which would lead to picking this as that important :) Lucky heuristic? Pure gold, in hindsight.
(*) In the end you can treat this as an optimization problem, depending on how good your superior-cognitive-process selection ability is. Let’s take a practical example: you have 1000 applicants. If your selection ability is great enough, you should take 20 for individual support. But maybe it’s just good, and then you may get better expected utility if you are able to reach 100 potentially great people in workshops. Maybe you are much better than chance, but not really good… then maybe you should create an online course taking in 400 participants.
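Jan’s optimization can be sketched as a small Monte Carlo experiment. Everything in it is a hypothetical assumption rather than data: 20 of the 1000 applicants are “potentially great”, our assessment of each applicant is their true status plus Gaussian noise, and the value of supporting one great person falls with cohort size (the `BOOST` table: 1.0 for individual support of 20, 0.6 in 100-person workshops, 0.3 in a 400-person online course). In this sketch the best cohort size should come out as 20, 100 and 400 as the assessment gets noisier, which is the shape of the trade-off described above.

```python
import random

# Hypothetical value of supporting ONE potentially great person at each scale:
# individual support (20 people), workshops (100), an online course (400).
BOOST = {20: 1.0, 100: 0.6, 400: 0.3}

def expected_value(k, noise_sd, n_applicants=1000, n_great=20,
                   trials=100, seed=0):
    """Average value of supporting the top-k applicants by a noisy score.

    Assumption: applicants 0..n_great-1 are 'potentially great' (true
    score 1), the rest are not (true score 0), and our assessment adds
    N(0, noise_sd) noise to the true score.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = {i: (1.0 if i < n_great else 0.0) + rng.gauss(0.0, noise_sd)
                 for i in range(n_applicants)}
        chosen = sorted(noisy, key=noisy.get, reverse=True)[:k]
        hits = sum(1 for i in chosen if i < n_great)
        total += hits * BOOST[k]
    return total / trials

def best_cohort(noise_sd):
    """Cohort size in BOOST with the highest expected value."""
    return max(BOOST, key=lambda k: expected_value(k, noise_sd))

for label, sd in [("great", 0.1), ("good", 0.6), ("better than chance", 2.0)]:
    print(f"{label} selection (noise sd {sd}): support {best_cohort(sd)} people")
```

Greg’s later point about convexity can be explored by editing `BOOST`: a large enough value for intensive support keeps the 20-person option best even when the assessment is noisy.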
Ben (me):
Examples are totally worth digging into! Yeah, I actually find myself surprised and slightly confused by the situation with Einstein, and do make the active prediction that he had some strong connections in physics (e.g. at some point had a really great physics teacher who’d done some research). In general I think Ramanujan-like stories of geniuses appearing from nowhere are not the typical example of great thinkers / people who significantly change the world. If I’m right I should be able to tell such stories about the others, and in general I do think that great people tend to get networked together, and that the thinking patterns of the greatest people are noticed by other good people before they do their seminal work, cf. Bell Labs (Shannon/Feynman/Turing etc), Paypal Mafia (Thiel/Musk/Hoffman/Nosek etc), SL4 (Hanson/Bostrom/Yudkowsky/Legg etc), and maybe the Republic of Letters during the Enlightenment? But I do want to spend more time digging into some of those.
To approach from the other end, what heuristics might I use to find people who in the future will create massive amounts of value that others miss? One example heuristic that Y Combinator uses to determine in advance who is likely to find novel, deep mines of value that others have missed is whether the individuals regularly build things to fix problems in their life (e.g. Zuckerberg built lots of simple online tools to help his fellow students study while at college).
Some heuristics I use to tell whether people are good at figuring out what’s true, and at making plans based on it, include:
Does the person, in conversation, regularly take long silent pauses to organise their thoughts, find good analogies, analyse your argument, etc.? Many people I talk to treat silence as a significant cost, due to social awkwardness, and do not make the trade-off toward figuring out what’s true. I always trust more the people I talk to who make these small trade-offs in favour of truth over social cost.
Does the person have a history of executing long-term plans that weren’t incentivised by their local environment? Did they decide a personal project (not, like, getting a degree) was worth putting 2 years into, and then put 2 years into it?
When I ask about a non-standard belief they have, can they give me a straightforward model with a few variables and simple relations that they use to understand the topic we’re discussing? In general, how transparent are their models to themselves, and are the models generally simple and backed by lots of little pieces of concrete evidence?
Are they good at finding genuine insights in the thinking of people who they believe are totally wrong?
My general thought is that there isn’t actually a lot of optimisation put into this, especially in areas that don’t have institutions built around them exactly. For example, academia will probably notice you if you’re very skilled in one discipline and compete directly in it, but it’s very hard to be noticed if you’re interdisciplinary (e.g. Robin Hanson’s book sitting between neuroscience and economics) or if you’re not competing along even just one or two of the dimensions it optimises for (e.g. MIRI researchers don’t optimise for publishing basically at all, so when they make big breakthroughs in decision theory and logical induction it doesn’t get them much notice from standard academia). So even our best institutions at noticing great thinkers with genuine and valuable insights seem to fail on some of the examples that matter most. I think there is lots of low-hanging fruit I can pick up in terms of figuring out who thinks well and will be able to find and mine deep sources of value.
Edit: Removed Bostrom as an example at the end, because I can’t figure out whether his success in academia, despite going through something of a non-standard path, is evidence for or against academia’s ability to figure out whose cognitive processes are best at figuring out what’s surprising+true+useful. I have the sense that he had to push against the standard incentive gradients a lot, but I might just be wrong, and Bostrom is one of academia’s success stories this generation. He doesn’t look like he just rose to the top of a well-defined field, though; it looks like he kept having to pick which topics were important and then find some route to publishing on them, as opposed to the other way round.
Greg Lewis subsequently also responded to Jan’s comment:
I share your caution on the difficulty of ‘picking high impact people well’. Besides the risk of over-fitting on anecdata we happen to latch on to, the past may simply prove underpowered for forward prediction: I’m not sure any system could reliably ‘pick up’ Einstein or Ramanujan, and I wonder how much ‘thinking tools’ etc. are just epiphenomena of IQ.
That said, fairly boring metrics are fairly predictive. People who do exceptionally well at school tend to do well at university, those who excel at university have a better chance of exceptional professional success, and so on and so forth. SPARC (a program aimed at extraordinarily mathematically able youth) seems a neat example. I accept none of these supply an easy model for ‘talent scouting’ intra-EA, but they suggest one can do much better than chance.
Optimal selectivity also depends on the size of boost you give to people, even if they are imperfectly selected. It’s plausible this relationship could be convex over the ‘one-to-one mentoring to webpage’ range, and so you might have to gamble on something intensive even in expectation of you failing to identify most or nearly all of the potentially great people.
(Aside: Although it is tricky to put human ability on a cardinal scale, normal-distribution properties for things like working memory suggest cognitive ability (however cashed out) isn’t power law distributed. One explanation of how this could drive power-law distributions in some fields would be a Matthew effect: being marginally better than competing scientists lets one take the majority of the great new discoveries. This may suggest that more neglected areas, or those where the crucial consideration is whether/when something is discovered rather than who discovers it (compare a malaria vaccine to an AGI), are those where the premium on really exceptional talent is smaller.)
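The mechanism in Greg’s aside can be illustrated with a winner-take-all sketch, a simplified stand-in for a Matthew effect with hypothetical numbers: abilities are drawn from a standard normal, each of 5000 discoveries is contested by 20 randomly chosen scientists out of 1000, and the most able contender gets it. Ability barely has a tail, but discovery counts do; in this setup the most able 5% should take roughly two thirds of the discoveries.

```python
import random

def discovery_counts(n_scientists=1000, n_discoveries=5000, contenders=20, seed=0):
    """Winner-take-all discoveries among scientists with N(0, 1) ability."""
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n_scientists)]
    counts = [0] * n_scientists
    for _ in range(n_discoveries):
        rivals = rng.sample(range(n_scientists), contenders)
        winner = max(rivals, key=ability.__getitem__)  # most able contender wins
        counts[winner] += 1
    return ability, counts

ability, counts = discovery_counts()
by_ability = sorted(range(len(ability)), key=ability.__getitem__, reverse=True)
top = by_ability[:len(by_ability) // 20]  # most able 5% of scientists
share = sum(counts[i] for i in top) / sum(counts)
print(f"most able 5% make {share:.0%} of discoveries")
```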
Jan’s last response to me:
For scientific publishing, I looked into the latest available paper[1], and apparently the data are best fitted by a model where the impact of a scientific paper is predicted by Q·p, where p is the “intrinsic value” of the project and Q is a parameter capturing the cognitive ability of the researcher. Notably, Q is independent of the total number of papers written by the scientist, and Q and p are also independent. Translating into the language of digging for gold, the prospectors differ in their speed and ability to extract gold from the deposits (Q). The gold in the deposits actually is randomly distributed. To extract exceptional value, you have to both have high Q and be very lucky. What is encouraging for talent selection is that Q seems relatively stable over a career and can be usefully estimated after ~20 publications. I would guess you can predict even with less data, but the correct “formula” would have to try to disentangle the interestingness of the problems the person is working on from the interestingness of the results.
(As a side note, I was wrong in guessing this is strongly field-dependent, as the model seems stable across several disciplines, time periods, and many other parameters.)
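The Q-model has a convenient property for estimation: since impact c = Q·p, log c = log Q + log p, so averaging the log impacts of a scientist’s papers and subtracting the mean of log p gives an estimate of log Q whose error shrinks with the number of papers. The sketch below is one natural estimator under that model, not necessarily the paper’s procedure, and it assumes log p ~ N(0, 1) purely for illustration (the paper’s fitted values are not used).

```python
import math
import random
import statistics

# Hypothetical parameters of the 'luck' term: log p ~ N(MU_P, SIGMA_P).
MU_P, SIGMA_P = 0.0, 1.0

def estimate_q(impacts, mu_p=MU_P):
    """Estimate Q from paper impacts c = Q * p, using mean(log c) = log Q + mu_p."""
    return math.exp(statistics.fmean([math.log(c) for c in impacts]) - mu_p)

def typical_error(true_q, n_papers, trials=2000, seed=0):
    """Median relative error of estimate_q after n_papers simulated papers."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        impacts = [true_q * rng.lognormvariate(MU_P, SIGMA_P) for _ in range(n_papers)]
        errors.append(abs(estimate_q(impacts) / true_q - 1))
    return statistics.median(errors)

for n in (5, 20, 80):
    print(f"{n:>2} papers: typical error {typical_error(3.0, n):.0%}")
```

How many papers are “enough” depends entirely on the assumed spread of log p; a wider luck term pushes the useful estimate further into a career.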
Interesting heuristics about people :)
I agree the problem is somewhat different in areas that are not so established/institutionalized, where you don’t have clear dimensions of competition, or the well-measurable dimensions are not well aligned with what is important. Looks like another understudied area.
[1] Sinatra et al., “Quantifying the evolution of individual scientific impact”, Science, http://www.sciencesuccess.org/uploads/1/5/5/4/15543620/science_quantifying_aaf5239_sinatra.pd
Further thoughts, after discussion with Oli Habryka, on a model of an individual’s expected future impact:
1st order factor
Past experience of building substantial and valuable products
If someone has already done lots of the thing you’re measuring, then this is the best evidence for future success at it too
2nd order factor
IQ
This is super powerful due to the positive manifold in psychometrics, where all variables of competence correlate positively.
However, my current community is selected fairly strongly on this (everyone is something like 2 SD above average, STEM students, etc.), and because the tails come apart [EDIT: also known as regressional Goodhart], IQ only captures something like 25% of the variance here rather than the global ~70%. So it’s not vastly more important than some of the 3rd order factors.
3rd order factors
Conscientiousness (on Big Five)
Contrarian-ness
i.e. ability to not follow local incentives toward social conformity
4th order factor
Openness (on Big Five)
The two 3rd-order factors are interesting because they seem to anti-correlate. Conscientiousness often looks like ‘do you follow orders’ and contrarian-ness… looks like the opposite. But getting both is awesome—it’s the standard Thiel-recommendation of finding someone who is great at seemingly contradictory things.
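The range-restriction point under the 2nd order factor, that IQ explains less variance inside an already-selected community, can be checked with a quick simulation. It assumes, purely for illustration, that IQ and the outcome are bivariate normal with 70% shared variance in the general population, and that the community is everyone above +2 SD on IQ. Under those assumptions the shared variance inside the group comes out at roughly a fifth, close to the “like 25%” guess.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def variance_explained(r2_population=0.7, cutoff=2.0, n=200_000, seed=0):
    """Shared variance (r^2) overall and among people above `cutoff` SD of IQ."""
    rng = random.Random(seed)
    r = math.sqrt(r2_population)
    iq = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Outcome = r * IQ + independent noise, scaled so the outcome has unit variance.
    outcome = [r * z + math.sqrt(1 - r * r) * rng.gauss(0.0, 1.0) for z in iq]
    selected = [(z, y) for z, y in zip(iq, outcome) if z > cutoff]
    sel_iq, sel_outcome = zip(*selected)
    return pearson(iq, outcome) ** 2, pearson(sel_iq, sel_outcome) ** 2

overall, within = variance_explained()
print(f"IQ explains {overall:.0%} of variance overall, {within:.0%} above +2 SD")
```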
Here are the four heuristics I mentioned in the post, and which factors they measure:
Does the person have long (>1 minute) silent pauses for thinking in their conversations?
3rd order factors: Contrarian-ness, and to a lesser extent, conscientiousness
Have they executed long-term plans not incentivised by local environment?
1st order: Past experience of building substantial and valuable products
3rd order: contrarian-ness and conscientiousness
Contrarian beliefs form simple, communicable, predictive models with a few moving parts
2nd order: IQ
3rd order: contrarian-ness
Finding insights in those they disagree with
2nd order: IQ
4th order: Openness
Have you considered/do you know more about RQ?
“Professor Stanovich and colleagues had large samples of subjects (usually several hundred) complete judgment tests like the Linda problem, as well as an I.Q. test. The major finding was that irrationality — or what Professor Stanovich called “dysrationalia” — correlates relatively weakly with I.Q.
[...]
Based on this evidence, Professor Stanovich and colleagues have introduced the concept of the rationality quotient, or R.Q. If an I.Q. test measures something like raw intellectual horsepower (abstract reasoning and verbal ability), a test of R.Q. would measure the propensity for reflective thought — stepping back from your own thinking and correcting its faulty tendencies.
There is also now evidence that rationality, unlike intelligence, can be improved through training. [...]”
https://www.nytimes.com/2016/09/18/opinion/sunday/the-difference-between-rationality-and-intelligence.html
Actually, I think this claim is wrong:
RQ is predicted pretty well by IQ, correlating at 0.695 (according to Stuart Ritchie’s book review that I read), and it seems plausible that the rest of the variance is noise. IQ correlates positively with all important factors, and often heavily (google the ‘positive manifold’ for more info), which is why I put it so high on my list.
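Two bits of arithmetic help frame the 0.695 figure: squaring it gives the share of variance IQ and RQ have in common (about 48%), and Spearman’s correction for attenuation shows how much of the remainder could be measurement noise. The test reliabilities below are hypothetical placeholders, not values from Stanovich’s work or the review.

```python
import math

r_observed = 0.695  # IQ-RQ correlation quoted above

def disattenuate(r, reliability_x, reliability_y):
    """Spearman's correction for attenuation: r / sqrt(rel_x * rel_y)."""
    return r / math.sqrt(reliability_x * reliability_y)

print(f"shared variance: {r_observed ** 2:.0%}")
# Hypothetical reliabilities for the IQ and RQ measures.
for rel_iq, rel_rq in [(0.9, 0.8), (0.9, 0.6)]:
    r_true = disattenuate(r_observed, rel_iq, rel_rq)
    print(f"reliabilities {rel_iq}/{rel_rq}: corrected r = {r_true:.2f}")
```

With a fairly unreliable RQ battery (the second case) the corrected correlation is close to 1, which is the situation in which “the rest of the variance is noise” would hold; with a reliable battery, a real residual factor would remain.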
I conjecture that the reason why Stanovich’s research isn’t very useful is that he tried to find some factor that was as broadly applicable to the population as IQ is. However, his assumption that IQ is missing something massive was just wrong, and so he just ended up with another measure of IQ. What would’ve been more useful would’ve been to try to find some factor that predicts success after conditioning on IQ; for example, Tetlock’s work is about figuring out how the very best people think differently from everyone else, and so his work comes out with great insights about forecasting, Bayesianism and model-building.
Added: I used to be a big fan of Stanovich’s work, but when I discovered that RQ correlated with IQ at 0.7… well, that’s what caused me to realise that in fact IQ is a super great predictor of important cognitive properties. And then I read the history of IQ research, which is essentially people trying to prove as hard as they can that there are important metrics of success that don’t correlate with IQ, and then failing to do so.
Hm this is an update… I’ll have to think more about it. (The “added” section actually provided most of the force (~75%) behind my update. It’s great that you provided causal reasons for your beliefs.)
I appreciate the feedback! Very useful to know that sort of thing.
Almost all scales in psychometrics are normalized, and the ones that are not normalized usually show very lopsided distributions. An interesting illustration here is the original Stanford-Binet IQ test scale, which just gave children a set of questions, and then divided the resulting score by the average for children of that age (and then multiplied it by 100), and which had very wide distributions with the 90th percentile of scores or so being a factor of 15 apart.
I don’t know which working memory scale Greg is referring to here, but I would be quite surprised if that scale isn’t manually normalized, and would expect various forms of working memory measures to vary drastically between different people. As an example, the digit span distribution in this paper is clearly log-normally distributed (or some similar distribution), but definitely not normally distributed:
https://www.researchgate.net/figure/Figure-Distribution-of-digit-numbers-in-the-backward-digit-span-test_7664779
I’m aware of normalisation, hence I chose things which have some sort of ‘natural cardinal scale’ (i.e. ‘how many Raven’s do you get right’ doesn’t really work, but ‘how many things can you keep in mind at once’ is better, albeit imperfect).
Not all skew entails a log-normal (or some similar—assumedly heavy tailed) distribution. This applies to your graph for digit span you cite here. The mean of the data is around 5, and the SD is around 2. Having ~11% at +1SD (7) and about 3% at +2SD (9) is a lot closer to normal distribution land (or, given this is count data, a pretty well-behaved poisson/slightly overdispersed binomial) than a hypothetical log normal. Given log normality, one should expect a dramatically higher maximum score when you increase the sample size from 78 in the cited study to 2400 or so. Yet in the standardization sample of the WAIS III of this size no individual had greater than 9 in forward digit span (and no one higher than 8 in reverse). (This is, I assume, the foundation for the famous ‘7 plus or minus 2’ claim.)
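The sample-size argument can be made concrete: give a normal and a log-normal distribution the same mean (5) and SD (2) as the cited digit-span data, and compare how the typical maximum grows from 78 to 2400 participants. This is a continuous approximation that ignores the whole-number scoring and any test ceiling; it only illustrates how much faster a log-normal maximum climbs with sample size.

```python
import math
import random
import statistics

MEAN, SD = 5.0, 2.0  # rough figures read off the cited digit-span study

# Log-normal parameters chosen so its mean and SD match the normal's.
SIGMA = math.sqrt(math.log(1 + (SD / MEAN) ** 2))
MU = math.log(MEAN) - SIGMA ** 2 / 2

def normal_draw(rng):
    return rng.gauss(MEAN, SD)

def lognormal_draw(rng):
    return rng.lognormvariate(MU, SIGMA)

def typical_max(draw, n, reps=200, seed=0):
    """Median, over reps simulated samples of size n, of the sample maximum."""
    rng = random.Random(seed)
    return statistics.median(max(draw(rng) for _ in range(n)) for _ in range(reps))

growth = {}
for name, draw in [("normal", normal_draw), ("log-normal", lognormal_draw)]:
    small, large = typical_max(draw, 78), typical_max(draw, 2400)
    growth[name] = large - small
    print(f"{name}: max ~{small:.1f} at n=78, ~{large:.1f} at n=2400")
```

Both versions overshoot the WAIS III maximum of 9 at n=2400, which fits the later observation in the thread that the test stops at 9.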
http://www.sciencedirect.com/science/article/pii/S0887617701001767#TBL2
A lot turns on ‘vary dramatically’, but I don’t think this qualifies on most commonsense uses of the phrase. I’d take reaction time data to be similar: although there is a ‘long tail’, this is a long tail of worse performance, and the tail isn’t that long. So I don’t buy claims I occasionally see made along the lines of ‘Einstein was just miles smarter than a merely average physicist’.
Huh, I notice that I am confused about no one in the sample having a larger digit span than 9. Do we know whether they didn’t just stop measuring after 9?
This random blogpost suggests that they stop at 9: https://pumpkinperson.com/2015/11/19/the-iq-of-daniel-seligman-part-5-digit-span-subtest/
I was unaware of the range restriction, which could well compress the SD. That said, if you take the ‘9’ scorers as ‘9 or more’, then you get something like this (using the 20-25 age group):
Mean value is around 7 (6.8), and 7% get 9 or more, suggesting 9 is at or around +1.5SD assuming normality, so when you get a sample size in the thousands, you should start seeing scores at 11 or so (+3SD). I wouldn’t be startled to find Ben has this level of ability. But scores at (say) 15 or higher (+6SD) should only be seen extraordinarily rarely.
If you use log-normal assumptions, you should expect something like this: if +1.5SD is 2 above the mean, +3SD is around 6 above it (i.e. ~13), and +4.5SD would give scores at 21 or so.
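The normal-model figures above can be reproduced from their two inputs, a mean of 6.8 and 7% scoring 9 or more. This sketch treats the score as continuous (no continuity correction) and uses Python’s `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist()
mean = 6.8
z_9 = std_normal.inv_cdf(1 - 0.07)  # 7% score 9+, so 9 sits about 1.48 SD up
sd = (9 - mean) / z_9               # implied SD, about 1.49

for k in (1.5, 3, 6):
    one_in = 1 / (1 - std_normal.cdf(k))
    print(f"+{k} SD: score {mean + k * sd:.1f}, about 1 in {one_in:,.0f}")
```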
An unfortunate challenge in picking at the tails here is that one can train digit span: memory athletes drill this, and I understand the record lies in the three figures.
Perhaps a natural test would be getting very smart but training-naive people (IMOers?) to try this. If they’re consistently scoring 15+, this is hard to reconcile with normalish assumptions (digit span wouldn’t correlate perfectly with mathematical ability, so lots of 6 sigma+ results look weird), and vice versa.
Quick sanity check:
4.5SD = roughly 1 in 300,000 (according to Wikipedia)
UK population = roughly 50 million
So there’d be roughly 50,000,000 / 300,000 ≈ 170 people in the UK who should be able to get scores at ~21 or more. Which seems quite plausible to me.
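The tail probability and the head count can be checked directly with `statistics.NormalDist`. With the rough 50 million population figure, the count lands at about 170, in the same ballpark as the estimate above.

```python
from statistics import NormalDist

tail = 1 - NormalDist().cdf(4.5)   # share of people above +4.5 SD
uk_population = 50_000_000         # the comment's rough figure
print(f"about 1 in {1 / tail:,.0f}")                        # ~1 in 294,000
print(f"expected count in the UK: {uk_population * tail:.0f}")
```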
Also I know a few IMO people, I bet we could test this.
I would be happy to take a bet that took a random sample of people that we knew (let’s say 10) and saw whether their responses fit more with a log-normal or a normal distribution, though I do guess this would be quite indiscriminate, since we are looking for divergence in the tails.
I would take a bet that, in a hypothetical dataset that extended further, the maximum among 2400 participants would be at least 12.
Oli just gave me the test as described on Wikipedia, and I got all the way up to 11. According to Greg’s world model, I’m in at least the top 0.05% (better than 2,400 random students), but given a normal distribution that expects 0 at 10 with a sample of 2,400, I must be way higher than that. (If anyone can do the maths, it would be appreciated; I’d guess I’m more than 1 in a million, though. According to Greg’s world model.)
Added: Extra info, I started visualising the first 6 digits (in 2 groups of 3) and remembering the rest in my audio memory.
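For the maths requested above, here is a sketch under the two normal fits that appear in this thread: the cited 78-person study (mean 5, SD 2), and the later fit for ages 20-25 with 9 treated as “9 or more” (mean 6.8, SD implied by 7% scoring 9+). The continuity correction (a score of 11 or more is taken as anything above 10.5) is a modelling choice of this sketch, not something from the comments.

```python
from statistics import NormalDist

# Normal fits to digit span discussed in the thread (the second SD is implied
# by 7% scoring 9 or more).
fits = {
    "78-person study": NormalDist(5.0, 2.0),
    "ages 20-25, 9 as 9+": NormalDist(6.8, 2.2 / NormalDist().inv_cdf(0.93)),
}

for name, dist in fits.items():
    tail = 1 - dist.cdf(10.5)  # continuity correction for a score of 11+
    print(f"{name}: 11+ is about 1 in {1 / tail:,.0f}")
```

Under these assumptions both fits put a score of 11 or more at a few per thousand.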
This new paper may be of relevance (H/T Steve Hsu). The abstract:
Huh, I am surprised that this got published. The model proposed seems almost completely equivalent to the O-ring paper, which has a ton of literature on it and reached roughly the same results. And it doesn’t have any empirical backing, so that’s even more confusing. I mean, it’s a decent illustration, but it really doesn’t seem to be saying anything new in this space.
They also weirdly overstate their point. The correlation between luck and talent heavily depends on the number of iterations and initial distribution parameters their model assumes, and they seem to just have arbitrarily fixed them for their abstract, and later in the paper they basically say “if you change these parameters, the correlation of talent with success goes up drastically, and the resulting distribution still fits the data”. I.e. the only interesting thing that they’ve shown is that if you have repeated trials with probabilities drawn from a normal distribution, you get a heavy-tailed distribution, which is a trivial statistical fact addressed in hundreds of papers.
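The closing point, that repeated trials with normally distributed success probabilities produce a heavy-tailed outcome distribution, can be illustrated with a minimal sketch. The multiplicative compounding (each success multiplies the outcome by 1.2, each failure divides it by 1.2) and all parameters are assumptions chosen for illustration, not the paper’s exact model. The spread in per-round probability is small, yet outcomes end up strongly skewed.

```python
import random

def final_outcomes(n_agents=10_000, rounds=40, step=1.2, seed=0):
    """Outcomes of repeated trials whose success probability is normally distributed.

    Hypothetical setup: each agent's per-round success probability is drawn
    from N(0.5, 0.05) (clipped to [0, 1]); a success multiplies the agent's
    outcome by `step`, a failure divides it by `step`.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_agents):
        p = min(1.0, max(0.0, rng.gauss(0.5, 0.05)))
        value = 1.0
        for _ in range(rounds):
            value *= step if rng.random() < p else 1 / step
        outcomes.append(value)
    return sorted(outcomes, reverse=True)

outcomes = final_outcomes()
top_share = sum(outcomes[: len(outcomes) // 100]) / sum(outcomes)
median = outcomes[len(outcomes) // 2]
mean = sum(outcomes) / len(outcomes)
print(f"top 1% share: {top_share:.0%}; mean/median: {mean / median:.1f}")
```

Swapping the multiplicative update for an additive one would give a roughly normal outcome distribution, so the compounding, not the talent distribution, is what produces the tail here.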
I am surprised that you are surprised that this got published. It reinforces and claims to provide proof towards the worldviews currently ascendant in academia, strengthening politically convenient claims and weakening inconvenient ones. Overstatement of the result also seems par for the course. That doesn’t make it useful, or anything, but it all seems very unsurprising.
Yeah, I was just thinking about me saying that while I was standing in the shower. I actually planned to remove the “I am surprised that this got published” line, because I wasn’t actually surprised. I think implicitly I probably just wanted to reduce the status of the associated paper, and question its legitimacy, and it seems that the cached phrase I currently have for that is “I am surprised this got published”, which really doesn’t seem like the ideal phrase for that, but does seem pretty commonly used for precisely that purpose.
Link to the O-ring paper?
Wikipedia: https://www.wikiwand.com/en/O-ring_theory_of_economic_development
Original: https://www.jstor.org/stable/2118400
Really great discussion here, on an important and action-guiding question.
I’m confused about some of the discussion of predicting impact.
If we’re dealing with a power-law, then most of the variance in impact comes from a handful of samples. So if you’re using a metric like “contrarianness+conscientiousness” that corresponds to an exceedingly rare trait, it might look like your predictions are awful, because thousands of CEOs and career executives who are successful by common standards lack that trait. However, as long as you get Musk and a handful of others right, you will have correctly predicted most of the impact, despite missing most of the successful people. What matters is not how many data-points you get right, but which ones.
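The point that what matters is which data points you get right can be illustrated with a Pareto sample. The shape parameter 1.16 (the value behind the classic 80/20 rule) and the sample size of 100,000 are assumptions, not estimates of how impact is actually distributed.

```python
import random

rng = random.Random(0)
# Hypothetical impact scores for 100,000 people, Pareto with shape 1.16.
impact = sorted((rng.paretovariate(1.16) for _ in range(100_000)), reverse=True)
total = sum(impact)

for top in (10, 100, 1000):
    print(f"top {top:>4} people: {sum(impact[:top]) / total:.0%} of total impact")
```

A predictor that correctly flags the top hundred or thousand people captures a large share of the total even if it misclassifies most of everyone else.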
Similarly, were it the case that one or two tail-end individuals (like Warren Buffett) score within 2 standard deviations on IQ, that would make IQ a substantially worse metric for predicting who will have the most impact. I haven’t found any such individual, but I think finding one would suffice to discredit some of the psychometric study conclusions, as long as the studies didn’t include that particular individual (which they likely didn’t).
Well, only if the individual would falsify the study. My claim is that the folks at the end of the power law will have these properties. I think of it as a filtering mechanism: first you filter by the first order factors, then the second order, and so on, each one doing less work than the last (for example, filtering by >2 SD IQ will cut you to <5% of the population, but once you’re just down to the best 0.01% then the third order factors will help you pick out the peak, even though those factors wouldn’t have cut down the world very much to start with).