The nihilism of NeurIPS

charlieoneill · 20 Dec 2024 23:58 UTC
100 points · 7 comments · 4 min read

Forecast 2025 With Vox’s Future Perfect Team — $2,500 Prize Pool

ChristianWilliams · 20 Dec 2024 23:00 UTC
19 points · 0 comments · 1 min read
(www.metaculus.com)

[Question] How do we quantify non-philanthropic contributions from Buffet and Soros?

Philosophistry · 20 Dec 2024 22:50 UTC
3 points · 0 comments · 1 min read

Anthropic leadership conversation

Zach Stein-Perlman · 20 Dec 2024 22:00 UTC
62 points · 16 comments · 6 min read
(www.youtube.com)

As We May Align

Gilbert C · 20 Dec 2024 19:02 UTC
−1 points · 0 comments · 6 min read

o3 is not being released to the public. First they are only giving access to external safety testers. You can apply to get early access to do safety testing

KatWoods · 20 Dec 2024 18:30 UTC
16 points · 0 comments · 1 min read
(openai.com)

o3

Zach Stein-Perlman · 20 Dec 2024 18:30 UTC
151 points · 148 comments · 1 min read

What Goes Without Saying

sarahconstantin · 20 Dec 2024 18:00 UTC
257 points · 15 comments · 5 min read
(sarahconstantin.substack.com)

Retrospective: PIBBSS Fellowship 2024

20 Dec 2024 15:55 UTC
64 points · 1 comment · 4 min read

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces

20 Dec 2024 15:16 UTC
28 points · 0 comments · 37 min read

🇫🇷 Announcing CeSIA: The French Center for AI Safety

Charbel-Raphaël · 20 Dec 2024 14:17 UTC
74 points · 0 comments · 8 min read

Moderately Skeptical of “Risks of Mirror Biology”

Davidmanheim · 20 Dec 2024 12:57 UTC
32 points · 3 comments · 9 min read
(substack.com)

Doing Sport Reliably via Dancing

Johannes C. Mayer · 20 Dec 2024 12:06 UTC
16 points · 0 comments · 2 min read

You can validly be seen and validated by a chatbot

Kaj_Sotala · 20 Dec 2024 12:00 UTC
28 points · 3 comments · 8 min read
(kajsotala.fi)

What I expected from this site: A LessWrong review

Nathan Young · 20 Dec 2024 11:27 UTC
30 points · 5 comments · 3 min read
(nathanpmyoung.substack.com)

Algophobes and Algoverses: The New Enemies of Progress

Wenitte Apiou · 20 Dec 2024 10:01 UTC
−24 points · 0 comments · 2 min read

“Alignment Faking” frame is somewhat fake

Jan_Kulveit · 20 Dec 2024 9:51 UTC
142 points · 13 comments · 6 min read

No Internally-Crispy Mac and Cheese

jefftk · 20 Dec 2024 3:20 UTC
12 points · 5 comments · 1 min read
(www.jefftk.com)

Apply to be a TA for TARA

yanni kyriacos · 20 Dec 2024 2:25 UTC
10 points · 0 comments · 1 min read

Announcing the Q1 2025 Long-Term Future Fund grant round

20 Dec 2024 2:20 UTC
32 points · 0 comments · 2 min read
(forum.effectivealtruism.org)

Reminder: AI Safety is Also a Behavioral Economics Problem

zoop · 20 Dec 2024 1:40 UTC
2 points · 0 comments · 1 min read

Replaceable Axioms give more credence than irreplaceable axioms

Yoav Ravid · 20 Dec 2024 0:51 UTC
6 points · 2 comments · 2 min read

Mid-Generation Self-Correction: A Simple Tool for Safer AI

MrThink · 19 Dec 2024 23:41 UTC
13 points · 0 comments · 1 min read

Apply now to SPAR!

agucova · 19 Dec 2024 22:29 UTC
11 points · 0 comments · 1 min read

How to replicate and extend our alignment faking demo

Fabien Roger · 19 Dec 2024 21:44 UTC
100 points · 2 comments · 2 min read
(alignment.anthropic.com)

The Genesis Project

aproteinengine · 19 Dec 2024 21:26 UTC
15 points · 0 comments · 1 min read
(genesis-embodied-ai.github.io)

Measuring whether AIs can statelessly strategize to subvert security measures

19 Dec 2024 21:25 UTC
60 points · 0 comments · 11 min read

Claude’s Constitutional Consequentialism?

1a3orn · 19 Dec 2024 19:53 UTC
43 points · 6 comments · 6 min read

A short critique of Omohundro’s “Basic AI Drives”

Soumyadeep Bose · 19 Dec 2024 19:19 UTC
6 points · 0 comments · 4 min read

When Is Insurance Worth It?

kqr · 19 Dec 2024 19:07 UTC
157 points · 64 comments · 4 min read
(entropicthoughts.com)

Launching Third Opinion: Anonymous Expert Consultation for AI Professionals

karl · 19 Dec 2024 19:06 UTC
2 points · 0 comments · 5 min read

Using LLM Search to Augment (Mathematics) Research

kaleb · 19 Dec 2024 18:59 UTC
5 points · 0 comments · 6 min read

A progress policy agenda

jasoncrawford · 19 Dec 2024 18:42 UTC
30 points · 1 comment · 5 min read
(newsletter.rootsofprogress.org)

building character isn’t about willpower or sacrifice

dhruvmethi · 19 Dec 2024 18:17 UTC
1 point · 0 comments · 4 min read

AISN #45: Center for AI Safety 2024 Year in Review

19 Dec 2024 18:15 UTC
13 points · 0 comments · 4 min read
(newsletter.safe.ai)

Learning Multi-Level Features with Matryoshka SAEs

19 Dec 2024 15:59 UTC
25 points · 4 comments · 11 min read

Simple Steganographic Computation Eval—gpt-4o and gemini-exp-1206 can’t solve it yet

Filip Sondej · 19 Dec 2024 15:47 UTC
12 points · 2 comments · 3 min read

AI #95: o1 Joins the API

Zvi · 19 Dec 2024 15:10 UTC
58 points · 1 comment · 41 min read
(thezvi.wordpress.com)

Executive Director for AIS Brussels—Expression of interest

19 Dec 2024 9:19 UTC
1 point · 0 comments · 4 min read

Executive Director for AIS France—Expression of interest

19 Dec 2024 8:14 UTC
9 points · 0 comments · 3 min read

Inescapably Value-Laden Experience—a Catchy Term I Made Up to Make Morality Rationalisable

James Stephen Brown · 19 Dec 2024 4:45 UTC
5 points · 0 comments · 2 min read
(nonzerosum.games)

I’m Writing a Book About Liberalism

Yoav Ravid · 19 Dec 2024 0:13 UTC
15 points · 6 comments · 2 min read

A Solution for AGI/ASI Safety

Weibing Wang · 18 Dec 2024 19:44 UTC
50 points · 29 comments · 1 min read

Takes on “Alignment Faking in Large Language Models”

Joe Carlsmith · 18 Dec 2024 18:22 UTC
102 points · 8 comments · 62 min read

A Matter of Taste

Zvi · 18 Dec 2024 17:50 UTC
36 points · 4 comments · 11 min read
(thezvi.wordpress.com)

Are we a different person each time? A simple argument for the impermanence of our identity

l4mp · 18 Dec 2024 17:21 UTC
−4 points · 5 comments · 1 min read

Alignment Faking in Large Language Models

18 Dec 2024 17:19 UTC
436 points · 53 comments · 10 min read

Can o1-preview find major mistakes amongst 59 NeurIPS ’24 MLSB papers?

Abhishaike Mahajan · 18 Dec 2024 14:21 UTC
18 points · 0 comments · 6 min read
(www.owlposting.com)

Walking Sue

Matthew McRedmond · 18 Dec 2024 13:19 UTC
2 points · 5 comments · 8 min read

What conclusions can be drawn from a single observation about wealth in tennis?

Trevor Cappallo · 18 Dec 2024 9:55 UTC
8 points · 3 comments · 2 min read