AdamGleave

Karma: 877

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

1 Nov 2024 0:10 UTC
17 points
0 comments, 6 min read, LW link
(far.ai)

Pacing Outside the Box: RNNs Learn to Plan in Sokoban

25 Jul 2024 22:00 UTC
59 points
8 comments, 2 min read, LW link
(arxiv.org)

Does robustness improve with scale?

25 Jul 2024 20:55 UTC
14 points
0 comments, 1 min read, LW link
(far.ai)

Beyond the Board: Exploring AI Robustness Through Go

19 Jun 2024 16:40 UTC
41 points
2 comments, 1 min read, LW link
(far.ai)

More people getting into AI safety should do a PhD

14 Mar 2024 22:14 UTC
59 points
24 comments, 12 min read, LW link
(gleave.me)

2023 Alignment Research Updates from FAR AI

4 Dec 2023 22:32 UTC
18 points
0 comments, 8 min read, LW link
(far.ai)

What’s new at FAR AI

4 Dec 2023 21:18 UTC
41 points
0 comments, 5 min read, LW link
(far.ai)

Even Superhuman Go AIs Have Surprising Failure Modes

20 Jul 2023 17:31 UTC
129 points
22 comments, 10 min read, LW link
(far.ai)