Geoffrey Irving

Karma: 866

Chief Scientist at the UK AI Safety Institute (AISI). Previously at DeepMind, OpenAI, Google Brain, etc.

Research Areas in Cognitive Science (The Alignment Project by UK AISI)

Geoffrey Irving · Aug 1, 2025, 10:26 AM
8 points
0 comments · 6 min read · LW link
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

Aug 1, 2025, 9:52 AM
20 points
0 comments · 2 min read · LW link
(alignmentproject.aisi.gov.uk)

The need to relativise in debate

Jun 26, 2025, 4:23 PM
25 points
2 comments · 5 min read · LW link

Prover-Estimator Debate: A New Scalable Oversight Protocol

Jun 17, 2025, 1:53 PM
88 points
18 comments · 5 min read · LW link

Unexploitable search: blocking malicious use of free parameters

May 21, 2025, 5:23 PM
34 points
16 comments · 6 min read · LW link

Dodging systematic human errors in scalable oversight

Geoffrey Irving · May 14, 2025, 3:19 PM
33 points
3 comments · 4 min read · LW link

An alignment safety case sketch based on debate

May 8, 2025, 3:02 PM
57 points
21 comments · 25 min read · LW link
(arxiv.org)