Do you have any advice for junior alignment researchers? In particular, what do you think are the skills and traits that make someone an excellent alignment researcher? And what do you think someone can do early in a research career to be more likely to become an excellent alignment researcher?
Some things that seem good:
Acquire background in relevant adjacent areas: especially a reasonably deep understanding of ML, plus a broader but shallower background in more distant areas like algorithms, economics, and learning theory, and some familiarity with the kinds of intellectual practices that work well in other fields.
Build some basic research skills, especially (i) applied work in ML (e.g. being able to implement ML algorithms and run experiments, ideally with some mentorship or guidance, though you can also do a lot independently), and (ii) academic research in any vaguely relevant area. I think it’s good to have, say, actually proven a few things, designed algorithms for a few problems, and beaten your head against a few problems before figuring out how to make them work.
Think a bunch about alignment. It feels like there is really just not much relevant material publicly written, so you might as well read basically all of it and try to come up with your own views on the core questions.
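For a sense of scale on the applied-ML point above: the kind of from-scratch exercise it gestures at can start as small as implementing a standard algorithm yourself and checking that it works on toy data. Here is a minimal sketch (the dataset and all names are invented for illustration, not a reference to anything in the original): logistic regression trained by batch gradient descent.

```python
import math
import random

random.seed(0)

# Synthetic 2-D dataset: two linearly separable blobs.
# Class 0 is centered near (-2, -2), class 1 near (+2, +2).
data = [((random.gauss(-2, 1), random.gauss(-2, 1)), 0) for _ in range(50)] + \
       [((random.gauss(+2, 1), random.gauss(+2, 1)), 1) for _ in range(50)]

w = [0.0, 0.0]  # weights
b = 0.0         # bias
lr = 0.1        # learning rate

def predict(x):
    """Sigmoid of the linear score w.x + b."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

# Batch gradient descent on the mean log-loss.
for _ in range(500):
    gw = [0.0, 0.0]
    gb = 0.0
    for x, y in data:
        err = predict(x) - y       # d(log-loss)/d(logit)
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    n = len(data)
    w[0] -= lr * gw[0] / n
    w[1] -= lr * gw[1] / n
    b -= lr * gb / n

# The "experiment": measure training accuracy of the learned classifier.
accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
```

The point of an exercise like this is less the result than the loop: implement, run, measure, and debug until the numbers make sense, which is the same muscle used in larger applied projects.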
I personally feel like I got a lot of benefit out of doing some research in adjacent areas, but I’d guess that mostly it’s better to focus on what you actually want to achieve and just be a bit opportunistic about trying to learn other stuff when it’s relevant (rather than going out of your way to do work in an adjacent area).
I do feel like spending 3–6 months learning ML on my own was useful for getting started in the field, despite being a bit of a digression. I’d probably recommend that kind of thing, but wouldn’t go much further afield.
Over the course of my life I’ve gotten a really surprising amount of value out of final projects for grad classes (mostly TCS classes in undergrad, and then some when branching out into ML in grad school). They’re a great chance to get guidance about which problems are important, some social support for stretching yourself on an open problem, and some mentorship from faculty. This feels less applicable to alignment, since there aren’t many classes on it, but it may be starting to become relevant at schools with sympathetic faculty, and it’s certainly relevant for ML.
I think that actually making a unit of progress on your own, whether applied work (e.g. replicating ML papers and making some small additional contributions, or designing and running an interesting experiment) or theoretical work (e.g. trying to advance some discussion at least one step, or proposing at least one novel idea that makes progress on a core problem), is a good way to start and to get access to more mentorship. This is what I did in undergrad in the context of normal theoretical CS (trying to prove tiny theorems that were slight advances), and it seemed like the right approach. It’s also what I did to some extent for alignment: you can see some of my earliest writing here, with my first major attempted contribution being this post, and after that point I would probably have expected to get some kind of mentorship, though I don’t think there was much to go around at the time. (I guess I also wrote this a year earlier, which was mostly a side-observation from thinking about theoretical CS that I thought might be amusing to the LW crowd, but I’m not proud of it and am not sure it played much of a role in getting in touch with people.)