DeepMind Alignment Team on Threat Models and Plans
Vika, Nov 25, 2022, 2:43 PM

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

DeepMind is hiring for the Scalable Alignment and Alignment Teams
Rohin Shah and Geoffrey Irving, May 13, 2022, 12:17 PM. 150 points, 34 comments, 9 min read.

DeepMind alignment team opinions on AGI ruin arguments
Vika, Aug 12, 2022, 9:06 PM. 395 points, 37 comments, 14 min read, 1 review.

Will Capabilities Generalise More?
Ramana Kumar, Jun 29, 2022, 5:12 PM. 133 points, 39 comments, 4 min read.

Clarifying AI X-risk
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt, Nov 1, 2022, 11:03 AM. 127 points, 24 comments, 4 min read, 1 review.

Threat Model Literature Review
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt, Nov 1, 2022, 11:03 AM. 78 points, 4 comments, 25 min read.

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Vika, Vikrant Varma, Ramana Kumar and Mary Phuong, Aug 12, 2022, 3:17 PM. 86 points, 4 comments, 3 min read, 1 review. (vkrakovna.wordpress.com)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
Vika, Vikrant Varma, Ramana Kumar and Rohin Shah, Nov 25, 2022, 2:36 PM. 39 points, 9 comments, 6 min read. (vkrakovna.wordpress.com)

Categorizing failures as “outer” or “inner” misalignment is often confused
Rohin Shah, Jan 6, 2023, 3:48 PM. 93 points, 21 comments, 8 min read.

Definitions of “objective” should be Probable and Predictive
Rohin Shah, Jan 6, 2023, 3:40 PM. 43 points, 27 comments, 12 min read.

Power-seeking can be probable and predictive for trained agents
Vika and janos, Feb 28, 2023, 9:10 PM. 56 points, 22 comments, 9 min read. (arxiv.org)

Paradigms of AI alignment: components and enablers
Vika, Jun 2, 2022, 6:19 AM. 53 points, 4 comments, 8 min read.

[Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy
Vika and Rohin Shah, Mar 7, 2023, 11:55 AM. 128 points, 13 comments, 5 min read. (drive.google.com)