Oliver Sourbut

Karma: 920

I’m particularly interested in sustainable collaboration and the long-term future of value. I’d love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read—let me know your suggestions! In no particular order, here are some I’ve enjoyed recently

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites

People who’ve got to know me only recently are sometimes surprised to learn that I’m a pretty handy trumpeter and hornist.

Deceptive Alignment and Homuncularity

Jan 16, 2025, 1:55 PM
25 points
12 comments
22 min read
LW link

Cooperation and Alignment in Delegation Games: You Need Both!

Aug 3, 2024, 10:16 AM
8 points
0 comments
14 min read
LW link
(www.oliversourbut.net)

[Question] Terminology: <something>-ware for ML?

Oliver Sourbut
Jan 3, 2024, 11:42 AM
17 points
27 comments
1 min read
LW link

Alignment, conflict, powerseeking

Oliver Sourbut
Nov 22, 2023, 9:47 AM
6 points
1 comment
1 min read
LW link

Careless talk on US-China AI competition? (and criticism of CAIS coverage)

Oliver Sourbut
Sep 20, 2023, 12:46 PM
16 points
3 comments
10 min read
LW link
3 reviews
(www.oliversourbut.net)

Invading Australia (Endless Formerlies Most Beautiful, or What I Learned On My Holiday)

Oliver Sourbut
Sep 8, 2023, 3:33 PM
12 points
1 comment
8 min read
LW link
(www.oliversourbut.net)

Hertford, Sourbut (rationality lessons from University Challenge)

Oliver Sourbut
Sep 4, 2023, 6:44 PM
28 points
7 comments
14 min read
LW link
(www.oliversourbut.net)

Un-unpluggability—can’t we just unplug it?

Oliver Sourbut
May 15, 2023, 1:23 PM
26 points
10 comments
12 min read
LW link
(www.oliversourbut.net)

Oliver Sourbut’s Shortform

Oliver Sourbut
Jul 14, 2022, 3:39 PM
4 points
1 comment
LW link

Deliberation Everywhere: Simple Examples

Oliver Sourbut
Jun 27, 2022, 5:26 PM
27 points
3 comments
15 min read
LW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut
Jun 27, 2022, 5:25 PM
12 points
0 comments
11 min read
LW link

Feature request: voting buttons at the bottom?

Oliver Sourbut
Jun 24, 2022, 2:41 PM
70 points
12 comments
1 min read
LW link

Breaking Down Goal-Directed Behaviour

Oliver Sourbut
Jun 16, 2022, 6:45 PM
11 points
1 comment
2 min read
LW link

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut
Jun 9, 2022, 9:38 PM
24 points
4 comments
2 min read
LW link

Gato’s Generalisation: Predictions and Experiments I’d Like to See

Oliver Sourbut
May 18, 2022, 7:15 AM
43 points
3 comments
10 min read
LW link

Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection

Oliver Sourbut
May 9, 2022, 9:38 PM
70 points
19 comments
8 min read
LW link
1 review
(www.oliversourbut.net)

Motivations, Natural Selection, and Curriculum Engineering

Oliver Sourbut
Dec 16, 2021, 1:07 AM
16 points
0 comments
42 min read
LW link

Some real examples of gradient hacking

Oliver Sourbut
Nov 22, 2021, 12:11 AM
15 points
8 comments
2 min read
LW link