
Buck

Karma: 11,616

CEO at Redwood Research.

AI safety is a highly collaborative field—almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I’m saying this here because it would feel repetitive to say “these ideas were developed in collaboration with various people” in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Fields that I reference when thinking about AI takeover prevention

Buck · Aug 13, 2024, 11:08 PM
144 points
16 comments · 10 min read · LW link
(redwoodresearch.substack.com)

Games for AI Control

Jul 11, 2024, 6:40 PM
45 points
0 comments · 5 min read · LW link

Scalable oversight as a quantitative rather than qualitative problem

Buck · Jul 6, 2024, 5:42 PM
86 points
11 comments · 3 min read · LW link

Different senses in which two AIs can be “the same”

Jun 24, 2024, 3:16 AM
69 points
2 comments · 4 min read · LW link

Access to powerful AI might make computer security radically easier

Buck · Jun 8, 2024, 6:00 AM
106 points
14 comments · 6 min read · LW link

AI catastrophes and rogue deployments

Buck · Jun 3, 2024, 5:04 PM
120 points
16 comments · 8 min read · LW link

Notes on control evaluations for safety cases

Feb 28, 2024, 4:15 PM
49 points
0 comments · 32 min read · LW link

Toy models of AI control for concentrated catastrophe prevention

Feb 6, 2024, 1:38 AM
51 points
2 comments · 7 min read · LW link

The case for ensuring that powerful AIs are controlled

Jan 24, 2024, 4:11 PM
276 points
73 comments · 28 min read · LW link

Managing catastrophic misuse without robust AIs

Jan 16, 2024, 5:27 PM
63 points
17 comments · 11 min read · LW link

Catching AIs red-handed

Jan 5, 2024, 5:43 PM
111 points
27 comments · 17 min read · LW link

Measurement tampering detection as a special case of weak-to-strong generalization

Dec 23, 2023, 12:05 AM
57 points
10 comments · 4 min read · LW link

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

Dec 16, 2023, 5:49 AM
76 points
4 comments · 6 min read · LW link · 1 review

AI Control: Improving Safety Despite Intentional Subversion

Dec 13, 2023, 3:51 PM
236 points
24 comments · 10 min read · LW link · 4 reviews

How useful is mechanistic interpretability?

Dec 1, 2023, 2:54 AM
167 points
54 comments · 25 min read · LW link

Untrusted smart models and trusted dumb models

Buck · Nov 4, 2023, 3:06 AM
87 points
17 comments · 6 min read · LW link · 1 review

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

Oct 23, 2023, 4:37 PM
107 points
3 comments · 8 min read · LW link

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

Jul 26, 2023, 5:02 PM
100 points
19 comments · 1 min read · LW link · 1 review

A freshman year during the AI midgame: my approach to the next year

Buck · Apr 14, 2023, 12:38 AM
154 points
15 comments · LW link · 1 review

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck · Feb 17, 2023, 5:26 PM
127 points
11 comments · 7 min read · LW link