[Question] Should I Pursue a PhD?
Introduction
I’m currently flirting with the idea of trying for a math PhD 2 − 3 years down the line.
I’m currently on a Theoretical Computer Science Masters program at the University of [Redacted] in the United Kingdom.
(My program is two semesters of teaching (7-8 months) followed by a 9-12 month industrial placement starting in June 2023. If I can't find a suitable placement (i.e. theoretical research that feels like it would be valuable experience for the kind of career I want to pursue), I might forgo the placement and graduate after one year by completing a masters project over the summer instead.)
After graduation I'm considering taking a gap year to fill the gaps in my maths knowledge and prepare for the PhD, and maybe to see if I can contribute to the research agendas I think are interesting/promising.
I might instead pursue a PhD in Theoretical Computer Science rather than mathematics (applications for a TCS PhD might be looked on more favourably given my TCS Masters and recommendations from my current lecturers).
Why A PhD?
I currently plan to learn a lot of (especially abstract) maths to (upper) graduate level for alignment theory; I want to do theoretical alignment work that is basically just applied maths. I think I would benefit from the dedicated mentorship a PhD provides, studying the relevant mathematics under a "guru". I expect the first few years of my career in alignment research to be spent mostly on deconfusion/distillation, and I expect high levels of mathematical sophistication to be very valuable for that.
I find abstract maths and mathematical modelling “fun”, and really enjoy being a student.
My Alignment Theory of Change
I am operating under (and optimising for) long timelines (transformative AI is decades away, i.e. 20+ years), and this influences what kind of research I believe to be most promising.
I expect theoretical research (especially foundational work, and especially in our current pre-paradigmatic stage) to be the most promising, and I am persuaded by agent foundations agendas. The extant research agendas I'm most excited about and could see myself working on someday:
John Wentworth’s Natural Abstractions Hypothesis and Selection Theorems
Vanessa Kosoy’s Learning Theoretic Alignment Agenda
Other agendas that take a desiderata-first approach to alignment
Garrabrant and Demski's Embedded Agency
My basic plan for alignment is something like:
Study subjects that seem relevant to alignment theory
Mathematics: a fuckton
Theoretical Computer Science: likewise, a fuckton
Statistics (and its theory)
Learning Theory (Algorithmic and Statistical)
Information Theory (Algorithmic and Statistical)
Physics: Thermodynamics
Optimisation
Evolutionary Theory
Analytic Philosophy: ontology, epistemics, ethics, etc.
Develop executable/computable philosophy for the above
Decision Theory
Game Theory
Grapple with concepts that bear on agent foundations until I understand them better (“Deconfusion”)
Information and entropy
Computation (especially as an information dynamics phenomenon)
Abstractions, ontology, modelling/map making
Optimisation
Causality/dependencies and counterfactuals (including logical)
Epistemics (including for ideal agents)
Decision Making (including for ideal agents; especially in multi-agent environments)
Emergent behaviour in multi-agent environments (e.g. competition, coordination vs conflict, evolution)
Systems (especially complex) and their emergent behaviour
Embedded Agency more generally
Anthropics?
Distill my learnings and understandings to make them more widely accessible (“Distillation”)
Iterate steps #1-#3
...
Formulate an adequate theory of robust agency
...
Solve alignment
Of course, I don't expect to make it all the way to step 8 ("solve alignment"). I expect that deconfusion and distillation is where most of the value of my "career" will come from.
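As a toy illustration of the kind of concept on the deconfusion list above ("information and entropy"), here is a minimal sketch of Shannon entropy; the function and example numbers are my own, not drawn from any particular agenda:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), measured in bits.

    Terms with p_i = 0 contribute nothing (by convention 0 * log 0 = 0).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain: one full bit per flip.
shannon_entropy([0.5, 0.5])   # → 1.0

# A biased coin is more predictable, so it carries less information.
shannon_entropy([0.9, 0.1])   # → ~0.469

# A certain outcome carries no information at all.
shannon_entropy([1.0])        # → 0.0
```

Deconfusion work on "information" would be less about computing such quantities and more about pinning down what they do and don't capture about agents and their models.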