Oliver Sourbut

Karma: 920

Autonomous Systems @ UK AI Safety Institute (AISI)
DPhil AI Safety @ Oxford (Hertford College, CS dept, AIMS CDT)
Former senior data scientist and software engineer + SERI MATS

I’m particularly interested in sustainable collaboration and the long-term future of value. I’d love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read; let me know your suggestions! In no particular order, here are some I’ve enjoyed recently:

Ord—The Precipice
Pearl—The Book of Why
Bostrom—Superintelligence
McCall Smith—The No. 1 Ladies’ Detective Agency (and series)
Melville—Moby-Dick
Abelson & Sussman—Structure and Interpretation of Computer Programs
Stross—Accelerando
Simsion—The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:

Hanabi (can’t recommend enough; try it out!)
Pandemic (ironic at time of writing...)
Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
Overcooked (my partner and I enjoy the foodie themes and the frantic real-time coordination)

People who’ve got to know me only recently are sometimes surprised to learn that I’m a pretty handy trumpeter and hornist.

Deceptive Alignment and Homuncularity

Oliver Sourbut and TurnTrout

Jan 16, 2025, 1:55 PM

25 points

12 comments, 22 min read

Cooperation and Alignment in Delegation Games: You Need Both!

Oliver Sourbut, Lewis Hammond and HarrietW

Aug 3, 2024, 10:16 AM

8 points

0 comments, 14 min read

(www.oliversourbut.net)

[Question] Terminology: <something>-ware for ML?

Oliver Sourbut

Jan 3, 2024, 11:42 AM

17 points

27 comments, 1 min read

Alignment, conflict, powerseeking

Oliver Sourbut

Nov 22, 2023, 9:47 AM

6 points

1 comment, 1 min read

Careless talk on US-China AI competition? (and criticism of CAIS coverage)

Oliver Sourbut

Sep 20, 2023, 12:46 PM

16 points

3 comments, 10 min read, 3 reviews

(www.oliversourbut.net)

Invading Australia (Endless Formerlies Most Beautiful, or What I Learned On My Holiday)

Oliver Sourbut

Sep 8, 2023, 3:33 PM

12 points

1 comment, 8 min read

(www.oliversourbut.net)

Hertford, Sourbut (rationality lessons from University Challenge)

Oliver Sourbut

Sep 4, 2023, 6:44 PM

28 points

7 comments, 14 min read

(www.oliversourbut.net)

Un-unpluggability—can’t we just unplug it?

Oliver Sourbut

May 15, 2023, 1:23 PM

26 points

10 comments, 12 min read

(www.oliversourbut.net)

Oliver Sourbut’s Shortform

Oliver Sourbut

Jul 14, 2022, 3:39 PM

4 points

1 comment

Deliberation Everywhere: Simple Examples

Oliver Sourbut

Jun 27, 2022, 5:26 PM

27 points

3 comments, 15 min read

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut

Jun 27, 2022, 5:25 PM

12 points

0 comments, 11 min read

Feature request: voting buttons at the bottom?

Oliver Sourbut

Jun 24, 2022, 2:41 PM

70 points

12 comments, 1 min read

Breaking Down Goal-Directed Behaviour

Oliver Sourbut

Jun 16, 2022, 6:45 PM

11 points

1 comment, 2 min read

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut

Jun 9, 2022, 9:38 PM

24 points

4 comments, 2 min read

Gato’s Generalisation: Predictions and Experiments I’d Like to See

Oliver Sourbut

May 18, 2022, 7:15 AM

43 points

3 comments, 10 min read

Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection

Oliver Sourbut

May 9, 2022, 9:38 PM

70 points

19 comments, 8 min read, 1 review

(www.oliversourbut.net)

Motivations, Natural Selection, and Curriculum Engineering

Oliver Sourbut

Dec 16, 2021, 1:07 AM

16 points

0 comments, 42 min read

Some real examples of gradient hacking

Oliver Sourbut

Nov 22, 2021, 12:11 AM

15 points

8 comments, 2 min read