I think that, rather than ML engineering (recreating GPT, learning PyTorch, etc.), it's more effective for an AI safety researcher to learn one or several general theories of ML, deep learning, or transformers specifically, such as:
- Balestriero’s spline theory of deep learning (2018) and the geometry of deep networks (2019) (see the sketch after this list)
- Olah et al.’s theory of circuits (2020)
- Roberts, Yaida, and Hanin’s deep learning theory (2021)
- Vanchurin’s theory of machine learning (2021)
- Anthropic’s mathematical framework for transformers (2021)
- Boyd, Crutchfield, and Gu’s theory of thermodynamic machine learning (2022)
- Marciano’s theory of DNNs as a semi-classical limit of topological quantum NNs (2022)
- Bahri et al.’s review of statistical mechanics of deep learning (2022)
- Alfarra et al.’s tropical geometry perspective on decision boundaries of NNs (2022)
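To give a flavour of the first item: the spline view observes that a ReLU network computes a continuous piecewise-affine function, so on the activation region containing any given input it is exactly one affine map, which can be read off from the activation pattern. Below is a minimal sketch of this idea (my own illustration with hypothetical random weights, not code from Balestriero's papers):

```python
# A minimal sketch of the piecewise-affine ("spline") view of ReLU networks:
# on the activation region containing x, the network equals a single affine
# map A @ x + b, recoverable from the on/off pattern of the ReLUs.
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-hidden-layer ReLU network with hypothetical random weights.
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)
W3, b3 = rng.normal(size=(3, 16)), rng.normal(size=3)

def forward(x):
    h1 = np.maximum(W1 @ x + b1, 0.0)
    h2 = np.maximum(W2 @ h1 + b2, 0.0)
    return W3 @ h2 + b3

def local_affine_map(x):
    """Return (A, b) of the affine piece active at x: forward(x) == A @ x + b."""
    d1 = np.diag((W1 @ x + b1 > 0).astype(float))   # activation pattern, layer 1
    h1 = d1 @ (W1 @ x + b1)                          # == relu(W1 @ x + b1)
    d2 = np.diag((W2 @ h1 + b2 > 0).astype(float))  # activation pattern, layer 2
    A = W3 @ d2 @ W2 @ d1 @ W1
    b = W3 @ (d2 @ (W2 @ (d1 @ b1) + b2)) + b3
    return A, b

x = rng.normal(size=4)
A, b = local_affine_map(x)
assert np.allclose(forward(x), A @ x + b)  # the net is affine on this region
```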
From the list above, I’ve personally learned two (well, at least, I've read the corresponding papers in full, making sure that I understand, or “almost” understand, every part of them): the theory of circuits (Olah et al. 2020) and the mathematical framework for transformers (Elhage et al. 2021). However, this is a very “low-variance” choice: if AI safety researchers know any of these theories, it’s exactly these two, because these papers are referenced in the AGI Safety Fundamentals Alignment curriculum. I think it would be more useful for the community if more people became acquainted with a wider range of theories of ML and DL, so that the community as a whole has a more diversified understanding and set of perspectives. Of course, it would be ideal if some people learned all of these theories and were able to synthesise them, but in practice we can hardly expect such super-scholars to appear, because everyone has so little time and attention.
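As a taste of what the mathematical framework for transformers buys you: a single attention head never uses W_Q, W_K, W_V, W_O independently, only the two low-rank products W_Q^T W_K (the “QK circuit”, which determines where the head attends) and W_O W_V (the “OV circuit”, which determines what the head writes into the residual stream). Below is a minimal sketch of this factorisation (my own illustration with hypothetical random weights, following the spirit of Elhage et al. 2021 rather than their code; positional encodings, causal masking, and multiple heads are omitted):

```python
# A minimal sketch of the QK/OV factorisation of one attention head:
# the head's output depends on its weights only through the low-rank
# products W_Q.T @ W_K and W_O @ W_V.
import numpy as np

rng = np.random.default_rng(1)
d_model, d_head, n_pos = 8, 2, 5

# Hypothetical weights for a single attention head.
W_Q = rng.normal(size=(d_head, d_model))
W_K = rng.normal(size=(d_head, d_model))
W_V = rng.normal(size=(d_head, d_model))
W_O = rng.normal(size=(d_model, d_head))

X = rng.normal(size=(n_pos, d_model))  # residual stream, one row per position

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Standard form: queries, keys, and values computed separately.
scores_std = (X @ W_Q.T) @ (X @ W_K.T).T / np.sqrt(d_head)
out_std = softmax(scores_std) @ (X @ W_V.T) @ W_O.T

# Factored form: only the two d_model x d_model circuits appear.
QK = W_Q.T @ W_K  # QK circuit, rank <= d_head: which positions attend to which
OV = W_O @ W_V    # OV circuit, rank <= d_head: what attended positions write
scores_fac = X @ QK @ X.T / np.sqrt(d_head)
out_fac = softmax(scores_fac) @ X @ OV.T

assert np.allclose(out_std, out_fac)  # the two forms agree exactly
```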
The list above is copied from the post “A multi-disciplinary view on AI safety research”. See also the section “Weaving together theories of cognition and cognitive development, ML, deep learning, and interpretability through the abstraction-grounding stack” in that post, which is relevant to this question.