
charlie_griffin

Karma: 327

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

Mar 24, 2025, 5:55 PM
30 points
0 comments · 8 min read · LW link

LASR Labs Spring 2025 applications are open!

Oct 4, 2024, 1:44 PM
38 points
0 comments · 4 min read · LW link

Games for AI Control

Jul 11, 2024, 6:40 PM
43 points
0 comments · 5 min read · LW link

Apply to LASR Labs: a London-based technical AI safety research programme

Apr 9, 2024, 5:34 PM
45 points
1 comment · 3 min read · LW link

Scenario Forecasting Workshop: Materials and Learnings

Mar 8, 2024, 2:30 AM
50 points
3 comments · 2 min read · LW link

Five projects from AI Safety Hub Labs 2023

Nov 8, 2023, 7:19 PM
47 points
1 comment · 6 min read · LW link
(www.aisafetyhub.org)

Goodhart’s Law in Reinforcement Learning

Oct 16, 2023, 12:54 AM
126 points
22 comments · 7 min read · LW link