DeepMind Alignment Team on Threat Models and Plans
Vika, 25 Nov 2022 14:43 UTC

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

DeepMind is hiring for the Scalable Alignment and Alignment Teams
Rohin Shah and Geoffrey Irving, 13 May 2022 12:17 UTC
150 points, 34 comments, 9 min read

DeepMind alignment team opinions on AGI ruin arguments
Vika, 12 Aug 2022 21:06 UTC
395 points, 37 comments, 14 min read, 1 review

Will Capabilities Generalise More?
Ramana Kumar, 29 Jun 2022 17:12 UTC
133 points, 39 comments, 4 min read

Clarifying AI X-risk
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt, 1 Nov 2022 11:03 UTC
127 points, 24 comments, 4 min read, 1 review

Threat Model Literature Review
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt, 1 Nov 2022 11:03 UTC
77 points, 4 comments, 25 min read

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Vika, Vikrant Varma, Ramana Kumar and Mary Phuong, 12 Aug 2022 15:17 UTC
86 points, 4 comments, 3 min read, 1 review (vkrakovna.wordpress.com)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
Vika, Vikrant Varma, Ramana Kumar and Rohin Shah, 25 Nov 2022 14:36 UTC
39 points, 9 comments, 6 min read (vkrakovna.wordpress.com)

Categorizing failures as “outer” or “inner” misalignment is often confused
Rohin Shah, 6 Jan 2023 15:48 UTC
93 points, 21 comments, 8 min read

Definitions of “objective” should be Probable and Predictive
Rohin Shah, 6 Jan 2023 15:40 UTC
43 points, 27 comments, 12 min read

Power-seeking can be probable and predictive for trained agents
Vika and janos, 28 Feb 2023 21:10 UTC
56 points, 22 comments, 9 min read (arxiv.org)

Paradigms of AI alignment: components and enablers
Vika, 2 Jun 2022 6:19 UTC
53 points, 4 comments, 8 min read

[Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy
Vika and Rohin Shah, 7 Mar 2023 11:55 UTC
128 points, 13 comments, 5 min read (drive.google.com)