A few thoughts on my self-study for alignment research

In June, I received a grant from the LTFF for a 6-month period of self-study aimed at mastering the necessary background for AI alignment research. The following is advice I would give to people who are attempting something similar. I have tried to keep it short.

Basic advice

You’ll naturally want to maximize “insights per minute” when choosing what to read. But don’t expect it to be obvious what the most impactful reading material is! It often takes actual focused thought to figure this out.

One shortcut is to just ask yourself what you are really curious about. The idea is that your curiosity should track “value of information” to some degree, so it can’t be that wrong to follow it. Also, working through a textbook takes quite a bit of mental energy, so having natural curiosity to power you through your study is very helpful.

If you don’t already have something you’re curious about, you can try the following technique to figure out what to read:

  • First, list all the things you could potentially read.

    • This step includes looking at recommendation lists from other people. (See below for two possible lists.)

  • For each thing on the list, write down how you feel about reading it.

    • Be honest with yourself.

    • Try to think of concrete reasons that are shaping your judgment.

  • Then, look back over the list.

    • Hopefully, it is now easier to decide what to read.

This was helpful to me, which doesn’t necessarily mean it’s helpful for you, but it’s maybe something to try.

Advice specifically for AI alignment

The above should hold for any topic; the following advice is for AI alignment research study specifically.

  1. I think you basically can’t go wrong with reading all of (or maybe the 80% you find most interesting of) the AI alignment articles on Arbital. I found this to be the most effective way to rapidly acquire a basic understanding of the difficulty.

  2. In terms of fundamental math, I just picked topics that sounded interesting to me from the MIRI research guide and John Wentworth’s study guide.[1]

  3. It’s probably also a good idea to read some concrete results from alignment research, if only to inform you about what kind of math is required. I think Risks from Learned Optimization in Advanced Machine Learning Systems is one good option. I don’t know of a good list of other results.

Concrete reading recommendations

The following are recommendations for concrete books/​topics that I haven’t seen mentioned anywhere else, but that I liked. I won’t repeat anything that’s already in the MIRI research guide or John Wentworth’s study guide.

  1. Homotopy Type Theory (HoTT)

    • The MIRI research guide recommends “Lambda-Calculus and Combinators” for type theory, but that book is mostly focused on lambda calculus (and is a bit difficult to read, in my opinion).

    • HoTT is about dependent type theory and is quite a nice read. Alternatively, you can also read my own Introduction to Dependent Type Theory.

    • In order to learn about pure lambda calculus, though (like Church numerals and Y combinators), HoTT is not the right book. I don’t really know of a good book for that.

  2. Topology: A Categorical Approach

    • Learn category theory and topology at the same time! I had trouble motivating myself to learn topology, so combining it with category theory seemed promising to me.

  3. Late 2021 MIRI conversations

    • I would recommend reading the AI alignment articles on Arbital first. Or maybe read the two in parallel: any time you want to know more about something that came up in one of the conversations, look it up on Arbital.

  4. Fixed point exercises

    • See the link for an explanation of why this is useful.

  5. Kolmogorov axioms

    • It is said that “Bayesians prefer Cox’s theorem for the formalization of probability”, but I think knowing Kolmogorov’s classical probability axioms is also important.

    • This in no way replaces reading Probability Theory by E. T. Jaynes.
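
For reference, here is one common way to state the Kolmogorov axioms from item 5. This is the usual textbook formulation, not something taken from the linked material, and the symbols are the conventional ones: \Omega is the sample space, \mathcal{F} is a \sigma-algebra of events on \Omega, and P is the probability measure.

```latex
% Kolmogorov axioms for a probability space (\Omega, \mathcal{F}, P)
\begin{align*}
  &\text{(1) Non-negativity:} && P(E) \ge 0 \quad \text{for all } E \in \mathcal{F}, \\
  &\text{(2) Unit measure:}   && P(\Omega) = 1, \\
  &\text{(3) Countable additivity:} && P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i)
    \quad \text{for pairwise disjoint } E_1, E_2, \ldots \in \mathcal{F}.
\end{align*}
```

Familiar rules such as P(\emptyset) = 0 and P(E^c) = 1 - P(E) follow from these three axioms.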

Finally, I’ll remind you of the planning fallacy, but also of the fact that it’s good to make plans even if they won’t survive contact with reality, because plans help you keep track of the big picture.

Good luck!

  1. Obviously, you can’t read them all within 6 months. I had read a lot of them already before I started my LTFF grant.