My best guess after ~1 year of upskilling is to learn interpretability (which is quickly becoming paradigmatic) and read ML papers that are as close as possible to the area you plan to work in. If it’s conceptual work, also learn how to do research first, e.g. do half a PhD or work with someone experienced, although I am pessimistic about most independent conceptual work. Learn all the math that seems like an obvious prerequisite (for me this was linear algebra, basic probability, and algorithms, which I mostly already had). Then learn everything else you feel like you’re missing lazily, as it comes up. Get to the point where you are regularly posing and solving small problems, so that you have some kind of feedback loop.
The other possible approach, something like going through John Wentworth’s study guide, seems like too much background before contact with the problem. One reason is that conceptual alignment research contains difficult steps even if you have the mathematical background. Suppose you want to design a way to reliably get a desirable cognitive property into your agent. How do you decide what cognitive properties you intuitively want the agent to have, before formalizing them? This is philosophy. How do you formalize a property like conservatism, or decide whether to throw it out? More philosophy. Learning Jaynes and dynamical systems and economics from textbooks might give you ideas, but you get neither general research experience nor contact with the problem, and your theorems are likely to be useless. In a paradigmatic field, you learn exactly the areas of math that have proven useful. In a preparadigmatic field, it might be necessary to learn every area that might be useful, but it seems a bit perverse to do this before looking at subproblems other people have isolated, making some progress, and greatly narrowing down the areas you might need to study.