Learning Math in Time for Alignment
Epistemic status: Strong hunches, weakly held. At least some of this could be found false in experiments.
If you want to do technical AI alignment research, you’ll need some amount of non-trivial math knowledge. It may be more theoretical, or with more ML/biology grounding, but it’ll definitely be math.
How do you learn all this math?
“Self-teaching” is almost a misnomer, compared to just “learning”. I don’t need to distill something for others, I only need myself to grok it. I may use distillation or adjacent techniques to help myself grok it, but like any N=1 self-experiment, it only needs to work for me. [1]
So then… what helps me understand things?
- Formal rules that are written precisely
- Wordy concepts that one could use in an essay
Math is technically the former, but real mathematicians (even the great ones!) actually use it more like the latter. That is, they use a lot of “intuition” built up over time.
You can’t survive on intuition alone (unless you have the genetic improbability of Ramanujan’s brain). And you can’t survive on rigor alone (according to all bounded human minds doing math research). Heck, even learning the rigorous, boring way is nontrivial (since e.g. small errors are harder to correct when you’re learning an alien system).
The Mathopedia concept is, in many ways, the “wordy” version. Viliam notes that math’s “hardness” (i.e. objectivity) means you can’t just teach it in the wordy version. After all, there is generally one real canonical definition for a mathematical object.
And yet… both Viliam and Yudkowsky say that math is fun when you know what you’re doing. I kind of agree! I’ve had fun doing (what seemed like) math, at least twice in my life!
OK, so it’s simple! Just make sure to understand everything thoroughly before moving to the next thing, and “play with the ideas” to understand them better.
Except… there’s a problem.
AI timelines.
Giving children quality tutoring and new K-12 curricula won’t work even if we have 20 years before existentially-risky AI is used. 5 years is almost a reasonable amount of time to learn a subfield or two deeply enough to make original contributions.
AI alignment, if it involves enough math to justify this post, requires deeper-than-average understanding, and possibly an ability to create entirely new mathematics.
And timelines might be as short as a year or two. [2]
Tangent (for large grantmakers and orgs only)
Why didn’t MIRI or other groups prepare for this moment earlier? Why didn’t MIRI say “OK, we have $X to fund researchers, and $Y left over, so let’s put $Z towards hedging our short-timelines bets. We can do that using human enhancement and/or in-depth teaching of the relevant hard (math) parts. Let’s do that now!”?
I think it’s something like… MIRI had pre-ML-calibrated short timelines. Now they have post-ML short timelines. In both cases, they wouldn’t think “sharpening the saw”-type strategies worthwhile. And if short timelines are true now, then it’s too late for those strategies to pay off.
Luckily, insofar as AI governance does anything, we can get longer timelines. And insofar as you (a large grantmaker or org with funds/resources to spare on hedging your timeline scenarios) have enough money to hedge your timeline bets, you should fund and/or set up such longer-term programs. If you put 80% credence in 5-year timelines, but you also control $100 million in funding (e.g. you’re OpenPhil), then you should be doing math-learning and intelligence enhancement programs!
The Challenge
So clearly, a person needs to be able to get a deep understanding of lots of math (in backchaining-resistant worlds, that means lots of math). Within a year or two. In time to, and with the depth needed to, come up with new good ideas.
This is the challenge.
This post is the first in (hopefully) a series of posts, as I learn and learn-how-to-learn some alignment-relevant math from a promising reading list.
If you want to join the challenge, I must note two things.
First, the field of pedagogy is filled with bad ideas, and even the best existing ones are rarely up to the challenge posed here [3]. I will probably write a post (or a few) speedrunning these.
Second, like human intelligence enhancement, any sufficiently effective math-learning technique can count as a dangerous capability. So if you find something cool, be careful with it, and only share with those looking to use human enhancement for alignment research.
As I explore the challenge of learning math deep enough and quick enough, I’ll hopefully make progress on it. It’s on my mind a lot.
If you are not, personally, a world-class technical alignment researcher… it should be on your mind, too.
[1] Also, any tools (or notes) I generate in the course of learning can later be filtered for exfo and then shared with others on alignment-specific or math-specific websites.
[2] Or shorter, but at that point you may be bringing in more assumptions, above the existing assumptions needed for 1- or 2-year timelines.
[3] Rule of thumb: If you want it to be true, treat it with deep skepticism.
lots of this is written like assertions. how much of it do you know vs suspect? what parts are hunches to be tested?
Good catch! Most of it is hunches to be tested (and/or theorized on, but really tested) currently. Fixed.
I self-studied a bunch of math in 2017-2019 in order to do AI alignment research (specifically, agent foundations type stuff), and have a lot of thoughts about how to do it. Feel free to message me if you want to discuss.