You are probably not a good alignment researcher, and other blatant lies

When people talk about research ability, a common meme I keep hearing goes something like this:

  • Someone who would become a great alignment researcher will probably not be stopped by Confusions about a Thorny Technical Problem X that’s Only Obvious to Someone Who Did the Right PhD.

  • Someone who would become a great alignment researcher will probably not have Extremely Hairy But Also Extremely Common Productivity Issue Y.

  • Someone who would become...great...would probably not have Insecurity Z that Everyone in the Audience Secretly Has.

What is the point of telling anyone any of these? If I were being particularly uncharitable, I’d guess the most obvious explanation is that it’s some kind of barely-acceptable status play, kind of like the budget version of saying “Are you smarter than Paul Christiano? I didn’t think so.” Or maybe I’m feeling a bit more generous today, so I’ll say it’s Wittgenstein’s Ruler: a convoluted call for help, pointing out the insecurities that the speaker cannot admit to themselves.

But this is LessWrong and it’s not customary to be so suspicious of people’s motivations, so let’s assume that it’s just an honest and pithy way of communicating the boundaries of hard-to-articulate internal models.

First of all, what model?

Most people here believe some form of biodeterminism: that we are not born tabula rasa, that our genes influence the way we are, that the conditions in our mother’s womb can and often do snowball into observable differences as we grow up.

But the thing is, these facts do not constitute a useful causal model of reality. IQ, aka (a proxy for) the most important psychometric construct ever discovered and most often the single biggest predictor of outcomes in a vast number of human endeavours, is not a gears-level model.

Huh? Suppose it were one, and that it were the sole determinant of performance in any mentally taxing field. Take two mathematicians with the exact same IQ. Can you tell me who would go on to become a number theorist vs an algebraic topologist? Can you tell me who would over-rely on forcing when disentangling certain logic problems? Can you tell me why Terence Tao hasn’t solved the Riemann hypothesis yet?

There is so much more that goes into becoming a successful scientist than can be distilled into a single number that it’s not even funny. Even if said number means a 180-IQ scientist is more likely to win a Nobel than a 140-IQ nobody. Even if said number means it’s a safer bet to just skim the top 10 of whatever the hell the modern equivalent of the SMPY is than to take a chance on some rando on the other side of the world who is losing sleep over the problem.

But okay, sure. Maybe approximately no one says that it’s just IQ. Nobody on LessWrong is so naïve as to have a simple model with no caveats, so let’s say it’s not just IQ but some other combo of secret sauces. Maybe there are, like, eight variables that together produce a lognormal distribution of outcomes.

Then the question becomes: how the hell is your human-behaviour-predicting machine so precise that you’re able to say with such confidence what excludes someone from doing important work? Do you actually have a value in mind for each of these sliders, in your internal model of which kinds of people can do useful research? Did you actually go out there and measure which of the people on this page have which combinations of factors?
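(A toy illustration, not anyone’s actual model: suppose research output really were the product of eight independent factors, each of which an evaluator can only measure with some noise. Everything in the sketch below is made up, the factor count, the noise levels, the cutoffs; it is only meant to show how quickly “confident exclusion” starts throwing away people from the true top tail.)

```python
import numpy as np

rng = np.random.default_rng(0)

n_people = 100_000
n_factors = 8  # made-up number, echoing the "maybe eight variables" above

# Toy model: each person's "research output" is the product of eight
# independent multiplicative factors, so log-output is a sum of eight
# terms and the output distribution is roughly lognormal.
factors = rng.lognormal(mean=0.0, sigma=0.5, size=(n_people, n_factors))
true_output = factors.prod(axis=1)

# An evaluator never sees the factors exactly; multiply each measurement
# by some lognormal noise and recompute the evaluator's estimate.
noise = rng.lognormal(mean=0.0, sigma=0.25, size=(n_people, n_factors))
estimated_output = (factors * noise).prod(axis=1)

# Suppose the evaluator "keeps" only the top 10% of estimates. How many
# people in the *true* top 1% get screened out anyway?
true_top = true_output >= np.quantile(true_output, 0.99)
kept = estimated_output >= np.quantile(estimated_output, 0.90)
missed = int(np.sum(true_top & ~kept))

print(f"true top-1% people screened out: {missed} of {int(true_top.sum())}")
```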

I think the biggest harm that comes from making this kind of claim is that, like small penis discourse (CW), there’s a ton of collateral damage done when you say it out loud, damage that I think far outweighs whatever clarity the listeners gain. I mean, what’s the chain of thought gonna be for the other people in the room[1]?

  • “If I don’t do or I haven’t done what you claim a ‘great’ alignment researcher should be doing, then what does that make me?”

  • “If I don’t meet your criteria for being the Chosen One to Solve Alignment Once And For All, where does that leave me?”

True, some wiseass can probably respond at this point, “But wouldn’t a Great Alignment Researcher have enough metacognition to ignore such statements?” But could your archetype of an alignment-Einstein have predicted that someone with persistent fatigue problems spanning decades and a tested IQ of only 143 would go on to write a fanfic that would serve as a gateway drug into rationality for thousands of people, and, by extension, into the entire quagmire we’re in? Hell, if you have such a crystal-clear model of what makes for great alignment researchers, why haven’t you become one?

Thankfully, I don’t think many of the top brass believe the notion that we can tell at a glance who is destined to join their ranks. Because if they did, we could replace those two-hour-long alignment-camp Google Forms with the following three-and-a-half-item questionnaire:

  1. Do you have an IMO gold medal?

  2. Have you published a paper in a high-impact journal while still in uni?

    1. If so, does your publishing schedule follow Fig. 6 in the Shockley paper above?

  3. Failing (1) and (2), have you written a conclusive, exhaustive debunking of a particular scientific consensus?

If we could tell at a glance who the destined ones are, if it were just a matter of exposing them to the problem and then giving them the resources to work on it, then SERI MATS and AGI Safety Fundamentals might as well close up shop. Provided such a super-Einstein exists in this sea of eight billion poor souls, we could replace all such upskilling programmes with a three-minute spot during the next Super Bowl.

But the fact of the matter is, we don’t know what makes for a good alignment researcher. Human interpretability is barely better than that of the machine models we worry about so much, and our ways of communicating such interpretations are even more suspect. It is a testament to how utterly inefficient we are as a species at using our sense-data that the thousands of petabytes we produce each day barely affect the granularity of our communicable scientific models. We can’t even reliably produce Gausses and von Neumanns two thousand years after we learned that humans can achieve such heights of mathematical sophistication.

When we set a social norm that it is okay and expected to exclude people who do not fit certain moulds, we risk ruling out the really great ones[2]. We often chide academia for having been co-opted by bureaucratic slowness and perverse incentives, but at least it has a reproducible way of producing experts! We do not! By shortcutting our way past an actual gears-level understanding of research performance[3], we are sweeping under the rug the fact that we don’t have good supportive infrastructure that reliably produces good researchers.

And, well, another contender explanation might be that this is all just cope and that we really are just fodder for the giants who will loom over us. Hell, maybe the kind of person who would write a rant like this isn’t the sort of person who would go on to solve the Hard Parts of the Problem, win all the Nobels, and siphon off all the grant money.

Fair enough.

But if this rant saves the future Chosen One from doubting themselves because some reckless schmuck who probably won’t go on to do great things told them a bunch of nonsense, then it will all have been worth it.


  1. ↩︎

    Sure, some people probably need to hear it because it’s true and it’s better to save them the heartbreak, but is that really your intention when you utter those words? Are you really doing the EV calculation for the other person?

  2. ↩︎

    Another response to this could be that our resources aren’t infinite, and so we should optimise for those who can contribute meaningfully on the margin. But again, are you really doing the EV calculation when you say such things in your mind’s voice?

  3. ↩︎

    A point I wanted to make that I couldn’t fit anywhere else: it’s plausible we aren’t yet seeing a lognormal distribution in alignment research output because we haven’t optimised away all the inessential complexities of doing said research. Academia is old. It has had several dozen cycles of Goodharting and counter-Goodharting what makes for good science, and, well, that’s not even what we care about. Who knows what a fully optimised notkilleveryone-ist programme would look like, or whether it can even be measured in things like h-indices?