RSS

Buck

Karma: 10,700

CEO at Redwood Research.

AI safety is a highly collaborative field—almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I’m saying this here because it would feel repetitive to say “these ideas were developed in collaboration with various people” in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Sub­ver­sion Strat­egy Eval: Can lan­guage mod­els state­lessly strate­gize to sub­vert con­trol pro­to­cols?

Mar 24, 2025, 5:55 PM
30 points
0 comments8 min readLW link

Some ar­ti­cles in “In­ter­na­tional Se­cu­rity” that I enjoyed

BuckJan 31, 2025, 4:23 PM
129 points
10 comments4 min readLW link

A sketch of an AI con­trol safety case

Jan 30, 2025, 5:28 PM
60 points
0 comments5 min readLW link