Thank you for this thoughtful response; I didn't know about most of these projects. I've linked this comment in the DeepMind section and made some modifications, both for clarity and to include a bit more.
I think you can talk about the agendas of specific people on the DeepMind safety teams but there isn’t really one “unified agenda”.
This is useful to know.
Thanks, Thomas, for the helpful overview post! Great to hear that you found the AGI ruin opinions survey useful.
I agree with Rohin’s summary of what we’re working on. I would add “understanding / distilling threat models” to the list, e.g. “refining the sharp left turn” and “will capabilities generalize more”.
Some corrections for your overall description of the DM alignment team:
- I would count ~20-25 FTE on the alignment + scalable alignment teams (this does not include the AGI strategy & governance team).
- I would put DM alignment in the “fairly hard” bucket (p(doom) = 10-50%) for alignment difficulty, and the “mixed” bucket for “conceptual vs applied”.
Sorry for the late response, and thanks for your comment; I've edited the post to reflect these corrections.
No worries! Thanks a lot for updating the post.