Geoffrey Irving

Karma: 681

Chief Scientist at the UK AI Safety Institute (AISI). Previously at DeepMind, OpenAI, and Google Brain.

Dodging systematic human errors in scalable oversight

Geoffrey Irving · May 14, 2025, 3:19 PM
32 points
3 comments · 4 min read · LW link

An alignment safety case sketch based on debate

May 8, 2025, 3:02 PM
55 points
16 comments · 25 min read · LW link
(arxiv.org)

UK AISI’s Alignment Team: Research Agenda

May 7, 2025, 4:33 PM
107 points
2 comments · 11 min read · LW link

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

Apr 14, 2025, 4:45 PM
29 points
1 comment · 2 min read · LW link

Prospects for Alignment Automation: Interpretability Case Study

Mar 21, 2025, 2:05 PM
32 points
5 comments · 8 min read · LW link

A sketch of an AI control safety case

Jan 30, 2025, 5:28 PM
57 points
0 comments · 5 min read · LW link

Eliciting bad contexts

Jan 24, 2025, 10:39 AM
31 points
8 comments · 3 min read · LW link

Automation collapse

Oct 21, 2024, 2:50 PM
72 points
9 comments · 7 min read · LW link

Debate, Oracles, and Obfuscated Arguments

Jun 20, 2024, 11:14 PM
44 points
4 comments · 21 min read · LW link

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Jul 20, 2023, 10:50 AM
44 points
3 comments · 2 min read · LW link
(arxiv.org)

DeepMind is hiring for the Scalable Alignment and Alignment Teams

May 13, 2022, 12:17 PM
150 points
34 comments · 9 min read · LW link

Learning the smooth prior

Apr 29, 2022, 9:10 PM
35 points
0 comments · 12 min read · LW link