Thanks, Thomas, for the helpful overview post! Great to hear that you found the AGI ruin opinions survey useful.
I agree with Rohin’s summary of what we’re working on. I would add “understanding / distilling threat models” to the list, e.g. “refining the sharp left turn” and “will capabilities generalize more”.
Some corrections for your overall description of the DM alignment team:
- I would count ~20-25 FTE on the alignment + scalable alignment teams (this does not include the AGI strategy & governance team).
- I would put DM alignment in the "fairly hard" bucket (p(doom) = 10-50%) for alignment difficulty, and the "mixed" bucket for "conceptual vs applied".
Sorry for the late response, and thanks for your comment; I've edited the post to reflect these corrections.
No worries! Thanks a lot for updating the post.