How To Get Into Independent Research On Alignment/Agency
I’m an independent researcher working on AI alignment and the theory of agency. I’m 29 years old, will make about $90k this year, and set my own research agenda. I deal with basically zero academic bullshit—my grant applications each take about one day’s attention to write (and decisions typically come back in ~1 month), and I publish the bulk of my work right here on LessWrong/AF. Best of all, I work on some really cool technical problems which I expect are central to the future of humanity.
If your reaction to that is “Where can I sign up?”, then this post is for you.
Background Models
Independence
First things first: the “independent” part of “independent research” means self-employment, and everything that goes with it. It means the onus is on you to figure out what to do, how to provide value, what to prioritize, and what to aim for. In practice, it also usually means “independent” in a broader sense: you won’t have a standard template or agenda to follow. If you go down this path, assume that you will need to chart your own course—in particular, your own research agenda.
For the sort of person this post is aimed at, that will be a very big upside, not a downside.
Disclaimer: there are ways to get into alignment research which don’t involve quite so much figuring-it-all-out-on-your-own. Some people receive mentorship from existing researchers. Some people go work for alignment research organizations. Either of those paths can involve “independent research” in the sense that you are technically self-employed, but those paths aren’t “independent” in the broader sense of the word, and they’re not the main topic of this post.
Preparadigmicity
As a field, the study of alignment and agency is especially well-suited to independent research, because it centers on problems we don’t understand. It’s not just that we don’t have the answers; we don’t even have the right frames for thinking about the problems. Agency is an area where we are fundamentally confused. AI alignment is largely a problem which hasn’t happened yet, on technology which hasn’t been invented yet, which we nonetheless want to solve in advance. Figuring out the right frames (the right paradigm) is itself a central part of the job.
The field needs people who are going to come up with new frames/approaches/models/paradigms/etc, because we’re pretty sure the current frames/approaches/models/paradigms/etc aren’t enough. Thus the great fit for independent research: as an independent researcher, you’re not beholden to some existing agenda based on existing frames. Coming up with your own idea of what the key problems are, how to frame them, what tools to apply… that sort of thing is exactly what we need, and it requires people who aren’t committed to the strategies of existing senior researchers and organizations. It requires people who have an independent high-level understanding of the field and different angles of looking at it, and who can pick out the key problems and paths from that perspective.
Again, for the sort of person this post is aimed at, that will be a very big upside.
… but it comes with some trade-offs. As a historical example of preparadigmatic research, here’s Kuhn talking about optics before Newton:
Being able to take no common body of belief for granted, each writer on physical optics felt forced to build his field anew from its foundations. In doing so, his choice of supporting experiment and observation was relatively free, for there was no standard set of methods or of phenomena that every optical writer felt forced to employ and explain. Under these circumstances, the dialogue of the resulting books was often directed as much to the members of other schools as it was to nature.
This very much applies to alignment research. Because the field does not already have a set of shared frames—i.e. a paradigm—you will need to spend a lot of effort explaining your frames, tools, agenda, and strategy. For the field, such discussion is a necessary step to spreading ideas and eventually creating a paradigm. For you, it’s a necessary step to get paid, and to get useful engagement with your work from others.
In particular, you will probably need to both think and write a lot about your strategy: the models and intuitions which inform why you’re working on the particular problems you’ve chosen, why the tools you’re using seem promising, what kinds of results you expect, and what your long-term vision looks like. Inevitably, a lot of this will rely on informal arguments or intuitions; you will need to figure out how to trace the sources of those intuitions and explain them to other people, without having to formalize everything. Explain the actual process which led to an idea/decision/approach, without going down the bottomless rabbit hole of deeply researching every single claim.
The current version of LessWrong was built in large part to support exactly that sort of discussion, and I strongly recommend using it.
Getting Paid
Right now, the best grantmaker in this space is the Long-Term Future Fund (LTFF). There are other options, but none are quite as good a fit for the sort of work we’re talking about here.
I’ve received a few LTFF grants myself and know some of the people involved in the grantmaking decisions, so I’ll give some thoughts on the most important things you’ll need in order to get paid. Bear in mind that this is inherently speculative and not endorsed by anyone at LTFF. I’d also recommend looking at LTFF’s past grants to get a more direct idea of what kinds of things they fund.
Don’t Bullshit
A low-bullshit grantmaking process works both ways. The LTFF wants to do object-level useful things, not just Look Prestigious, so they keep the application simple and the turnaround time relatively fast. The flip side is that I expect them to look very unkindly on bullshit—i.e. attempts to make the applicant/application Sound Prestigious without actually doing object-level useful things.
In academia, it’s common practice to make up some bullshit about how your research is going to help the world. During my undergrad, this sort of bullshit was explicitly taught. Of course, it’s not like anyone is ever going to hire an economist or statistician (let alone consult a prediction market) to figure out whether the research is actually likely to impact the world in the manner claimed. The goal is just to make the proposal sound good. If you’re coming from academia, this sort of bullshit may be an ingrained habit which takes effort to break.
If you want to make it in alignment/agency research, you’re going to need an actual object-level strategy.
We’ll talk more in the next sections about how to come up with a strategy, but the first stop is The Bottom Line: once you’ve chosen a strategy, anything you say to justify it will not make it any more correct. All that matters is the process which originally made you choose that strategy, or made you stick to it at times when you might realistically have changed course. So first things first, forget whatever clever idea you already have cached, and let’s start from a blank slate.
Reading
Preparadigmicity means you’ll need to spend a lot of time explaining your choice of vision, strategy, models, tools, etc. The flip side of that coin is reading: you’ll probably need to read quite a bit of material from others in the field. This is often nontechnical or semi-technical background material, explanations of intuitions, vague gesturing at broad ideas, etc—you can see plenty of it here on LessWrong and the Alignment Forum. The more of this you read, the better you’ll understand other researchers’ frames (or at least know which frames you don’t understand), and the better you’ll be able to explain your own material in terms others can readily understand.
Early on, there are two main motivators for reading:
To understand which strategies have already been tried, and failed, to avoid retreading that ground
To understand a bit of the existing jargon (definitely not all of it!), in order to explain your own ideas in terms already familiar to others
To understand (some) existing approaches and jargon, I’d recommend at least skimming these sequences/posts, and diving deeper into whichever most resemble the directions you want to pursue:
To understand barriers (other than what’s discussed in the above links), this talk and the Rocket Alignment Problem are probably the best starting points. Note that lots of people disagree with those last two links (as well as 11 Proposals), but you probably want to be at least familiar enough to have an informed disagreement.
Note that this is all on LessWrong, which means you can leave comments with questions, attempts to summarize, disagreements, etc. Often people will reply. This helps a lot for actually absorbing the ideas. (h/t Adam Shimi for pointing this out.)
I invite others to leave suggested reading in the comments. (This does risk turning into a big debate over whether X or Y is actually a good idea for new people, but at least then we’ll have a realistic demonstration of how much everybody disagrees over all this. I did warn you that the field is preparadigmatic!)
Finally, there’s The Sequences. They are long, but if you haven’t read them, then you definitely risk various failure modes which will be obvious to people who have read them and very confusing to you. I wouldn’t quite say they’re required reading, especially if you’re on the more technical end of the spectrum and already somewhat familiar with alignment discussions, but there are definitely many people who will be somewhat surprised if you do technical alignment/agency research and haven’t read them.
Again, I want to emphasize that everyone disagrees on all this stuff. Roughly speaking, assume that the grantmakers care more about your research having some plausible path to usefulness than about agreeing with any particular position in any of the field’s ongoing arguments.
The Hamming Question
Over on the other side of the dining hall was a chemistry table. I had worked with one of the fellows, Dave McCall; furthermore he was courting our secretary at the time. I went over and said, “Do you mind if I join you?” They can’t say no, so I started eating with them for a while. And I started asking, “What are the important problems of your field?” And after a week or so, “What important problems are you working on?” And after some more time I came in one day and said, “If what you are doing is not important, and if you don’t think it is going to lead to something important, why are you at Bell Labs working on it?” I wasn’t welcomed after that; I had to find somebody else to eat with!
Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks. The standard (and strongly recommended) exercise to alleviate that problem is to start from the Hamming Questions:
What are the most important problems in your field (i.e. alignment/agency)?
How are you going to solve them?
At this point, somebody usually complains that minor contributions are important or some such. I’m not going to argue with that, because I expect the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.
If you have decent answers to the Hamming Questions, and you make those answers clear to other people, that is probably a sufficient condition for your grant application to not end up in the giant pile of applications from people who don’t even have a model of how their proposal will help. It’s not quite a sufficient condition to get paid, but I would guess that a large majority of people who can clearly answer the Hamming Questions do get paid.
I want to emphasize that I think clear answers to the Hamming Questions are an approximately-sufficient condition, not an approximately-necessary condition; there are definitely other paths. Steve’s story in the comments below is a good example; in his words:
If you’re a kinda imposter-syndrome-y person who just constitutionally wouldn’t dream of looking themselves in the mirror and saying “I am aiming for a major contribution!”, well me too, and don’t let John scare you off. :-P
Use Your Pareto Frontier
A great line from Adam Shimi:
Most people who try to go in a direction ‘no one else has tried’ end up going in the most obvious direction which everyone else has tried.
My main advice to avoid this failure mode is to leverage your Pareto frontier. Apply whatever knowledge, or combination of knowledge, you have which others in the field don’t. Personally, I’ve gained a lot of insight into agency by drawing on systems biology, economics, statistical mechanics, and chaos theory. Others draw heavily on abstract math, like category theory or model theory. Evolutionary biology and user interface design are both rich sources.
This is one reason why it helps to have a broad technical background: the more frames and tools you have to draw on, the more likely you’ll find a novel and promising combination to apply to the most important problems in the field. (Or, just as good: the more frames and tools you have to draw on, the more likely you’ll notice that one of the most important problems has been overlooked.)
Flip side of this: if you have a novel-seeming idea which involves the same kinds of frames and tools which most people in alignment have (i.e. programming expertise, some ML experience, reading Astral Codex Ten) then do write it up, but don’t be surprised if it’s already been done.
If you read through some existing alignment work, and the strategy seems obviously wrong to you in a way which would not be obvious to the median LessWrong user, then that’s a very promising sign.
Legibility
Part of getting a grant is not just having a good plan and the skills to execute it, but making your plan and skills legible to the people reviewing the grant.
Here’s (my summary of) a rough model from Oli, who’s one of the fund managers for LTFF. In order to get a grant for alignment research, usually someone needs to do one of these three:
1. Write a grant application which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (This is rare/difficult.)
2. Have a reference from someone the fund managers know and trust (i.e. the existing alignment research community).
3. Have some visible online material which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (LessWrong posts/comments are a central example.)
As a new entrant to the field, I expect that option #3 is probably your main path. Write up not just your research strategy, but the intuitions, models and arguments behind that strategy. Give examples. Explain what you consider the key problems, why those problems seem central, and the frames and generators behind that reasoning. Again, give examples. Explain conjectures or tools you think are relevant, ideally with examples. If you’re on the theory side, sketch potential empirical tests; if on the empirical side, sketch the conceptual theory behind the ideas. And include examples. Explain your vision of success, and expected applications of your research (if it succeeds). At all stages, focus on giving accessible, intuitive explanations and lots of examples; even people who have lots of technical background will often skip over sections with just dense math, and not everyone has the same technical background as you. And put the examples at the beginnings of the posts, before the abstract/general explanations.
Remember: this is preparadigmatic work. Writing up the ideas, and the generators of the ideas, and the frames, and the tools, and making it all clear and accessible to people with totally different frames and tools, is a central part of the job.
All this writing will also make options #1 and #2 easier over time: writing a lot of posts and comments will eventually generate social connections (though this takes quite a bit of time, especially if you’re not in the Bay Area), and discussion/feedback will give some idea of how to explain things in a way which signals the kinds-of-things LTFF looks for.
(On the topic of feedback: a lot of more experienced researchers ignore most posts which they don’t find very promising, partly because it’s a lot of work to explain/argue about problems and partly because there are too many posts to read them all anyway. If you explicitly reach out, e.g. by sending a message on LessWrong, and ask for feedback, people are much more likely to tell you what they think.)
By the time all that is written up and posted, the grant application itself is a drop in the bucket; that’s a big part of why it only takes a day to write up. A quote from Oli regarding the actual application:
I really wish people would just pretend they’re writing me an email explaining what they plan to do, rather than something aimed at the general public.
This is part of why option #1 is rare—people try to write the LTFF application like it’s an academic grant application or something, and it really isn’t. But also, clear communication is just pretty hard in general, even when you do understand the problem and have a non-bullshitted strategy.
When To Start
This post was mostly written for people who already have the technical skills they need. That probably means grad-level education, though a PhD is definitely not a formal requirement. I know at least a few people who think less-than-a-full-undergrad can suffice. Personally, I never went to grad school (though admittedly my undergrad coursework looks an awful lot like a PhD program; I got an unusually large amount of mileage out of it).
In terms of specific skills, I recently wrote a study guide with a bunch of technical topics I’ve found useful, but the more important point is that we don’t currently know what the right combination of background knowledge is. If you already have a broad technical background, then my advice is to take a stab at the problem and see how it goes.
If you are currently in high school or undergrad, the study guide has some recommendations for what to study (and why). The larger your knowledge base, the more tools and frames you’ll have to draw on later. You could also apply for a grant to e.g. pursue some alignment/agency research project over the summer; taking a stab at it will give you some firsthand data on what kinds of tools/frames are useful.
Runway
The grant application takes maybe a day, but there will probably be some groundwork before you’re ready for that. You’ll probably want to read a bunch, figure out a strategy, put up a few posts on it, and maybe update in response to feedback.
Personally, I quit my job as a data scientist in late 2018, and tried out a few different things over the course of the next year before settling into alignment/agency research. I got my first grant in late 2019. If someone with roughly my 2018 level of background knew up front that they wanted to enter the field, I think it would take a lot less time than that; a few months would be my guess. That said, my level of background in 2018 was already well above zero.
I wrote a fair bit on LessWrong, and researched some agency problems, even before quitting my job. I do expect it helps to “ease into it” this way, and if you’re coming in fresh you should probably give yourself extra time to start writing up ideas, following the field, and getting feedback. That said, you should probably plan on going full time at latest by the time you get a grant, and possibly sooner. If you’re in academia, then you’ll probably have more room to aim the bulk of your research at alignment without striking out on your own. (Though you should still totally strike out on your own and enjoy the no-academic-bullshit lifestyle.)
Meta
Historically, EA causes (including alignment) have largely drawn from very young populations (mostly undergrads). I believe this is mostly because (a) those are the people who don’t need to be drawn away from a different path which they’re already on, (b) they’re willing to work for peanuts, and (c) they don’t have to unlearn how to bullshit. Unfortunately, a lot of alignment research benefits from a broad technical background, which takes time to build up. So I think we’ve historically had fewer researchers with that sort of broad knowledge than would be ideal, just because we tend to recruit young people.
But conditions have changed in recent years, and I think there’s now room for a different kind of recruitment, aimed at (somewhat) older people with more knowledge and experience.
First: the Sequences are about ten years old, so right about now there are probably a bunch of postgrads and adjunct professors with lots of technical skills who have already read them, have decent epistemic habits (i.e. know how to not bullshit), and have a rough understanding of what the alignment problem is.
Second: nowadays, we have money. If you’re a postgrad or adjunct professor or whatever, and you can do good technical alignment research, you can probably make more money as an independent researcher in alignment than you do now. Our main grantmaker has an application form which takes maybe a few hours at most, usually comes back with a decision in under a month, and complains that it doesn’t have enough good projects to spend its money on.
So if you’re the sort of person who:
Wants to tackle big open research problems
… in a field where everyone is confused and we don’t have a paradigm yet and you have to basically chart your own course
… and the stakes are literally astronomical
… and you have a bunch of technical skills, maybe read the Sequences ten years ago, and have a basic understanding of what AI alignment is and why it’s hard
… then now is a good time to sit down with a notebook and think about how you’d go about understanding alignment/agency. If you have any promising ideas, write them up, post them here on LessWrong, and apply for a grant to pursue this research full-time.
I can attest that it’s an awesome job.
Reviewing this quickly because it doesn’t have a review.
I’ve linked this post to several people in the last year. I think it’s valuable for people (especially junior researchers or researchers outside of major AIS hubs) to be able to have a “practical sense” of what doing independent alignment research can be like, how the LTFF grant application process works, and some of the tradeoffs of doing this kind of work.
This seems especially important for independent conceptual work, since this is the path that is least well-paved (relative to empirical work, which is generally more straightforward to learn, or working at an organization, where one has colleagues and managers to work with).
I also appreciate John’s emphasis on focusing on core problems & his advice to new researchers:
I expect I’ll continue to send this to people interested in independent alignment work & it’ll continue to help people go from “what the heck does it mean to get a grant to do conceptual AIS work?” to “oh, gotcha… I can kinda see what that might look like, at least in this one case… but seeing even just one case of this makes the idea feel much more real.”
I’d ideally like to see a review from someone who actually got started on Independent Alignment Research via this document, and/or grantmakers or senior researchers who have seen up-and-coming researchers who were influenced by this document.
But, from everything I understand about the field, this seems about right to me, and seems like a valuable resource for people figuring out how to help with Alignment. I like that it both explains the problems the field faces, and it lays out some of the realpolitik of getting grants.
Actually, rereading this, it strikes me as a pretty good “intro to the John Wentworth worldview”, weaving a bunch of disparate posts together into a clear frame.