Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c, 25 Oct 2023 23:46 UTC
122 points
35 comments · 22 min read · LW link · 1 review
(www.navigatingrisks.ai)

[Question] What are the long-term outcomes for Bitcoin and other cryptocurrencies?

Auspicious, 25 Oct 2023 21:12 UTC
−4 points
1 comment · 1 min read · LW link

AI as a science, and three obstacles to alignment strategies

So8res, 25 Oct 2023 21:00 UTC
185 points
80 comments · 11 min read · LW link

My hopes for alignment: Singular learning theory and whole brain emulation

Garrett Baker, 25 Oct 2023 18:31 UTC
61 points
5 comments · 12 min read · LW link

[Question] Lying to chess players for alignment

Zane, 25 Oct 2023 17:47 UTC
96 points
54 comments · 1 min read · LW link

Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund

Zach Stein-Perlman, 25 Oct 2023 15:20 UTC
31 points
8 comments · 4 min read · LW link
(www.frontiermodelforum.org)

“The Economics of Time Travel”—call for reviewers (Seeds of Science)

rogersbacon, 25 Oct 2023 15:13 UTC
4 points
2 comments · 1 min read · LW link

Compositional preference models for aligning LMs

Tomek Korbak, 25 Oct 2023 12:17 UTC
18 points
2 comments · 5 min read · LW link

[Question] Should the US House of Representatives adopt rank choice voting for leadership positions?

jmh, 25 Oct 2023 11:16 UTC
16 points
6 comments · 1 min read · LW link

Researchers believe they have found a way for artists to fight back against AI style capture

vernamcipher, 25 Oct 2023 10:54 UTC
3 points
1 comment · 1 min read · LW link
(finance.yahoo.com)

Why We Disagree

zulupineapple, 25 Oct 2023 10:50 UTC
7 points
2 comments · 2 min read · LW link

Beyond the Data: Why aid to poor doesn’t work

Lyrongolem, 25 Oct 2023 5:03 UTC
2 points
31 comments · 12 min read · LW link

Announcing Epoch’s newly expanded Parameters, Compute and Data Trends in Machine Learning database

25 Oct 2023 2:55 UTC
18 points
0 comments · 1 min read · LW link
(epochai.org)

What is a Sequencing Read?

jefftk, 25 Oct 2023 2:10 UTC
17 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Verifiable private execution of machine learning models with Risc0?

mako yass, 25 Oct 2023 0:44 UTC
30 points
2 comments · 2 min read · LW link

[Question] How to Resolve Forecasts With No Central Authority?

Nathan Young, 25 Oct 2023 0:28 UTC
17 points
6 comments · 1 min read · LW link

Thoughts on responsible scaling policies and regulation

paulfchristiano, 24 Oct 2023 22:21 UTC
220 points
33 comments · 6 min read · LW link

The Screenplay Method

Yeshua God, 24 Oct 2023 17:41 UTC
−15 points
0 comments · 25 min read · LW link

Blunt Razor

fryolysis, 24 Oct 2023 17:27 UTC
3 points
0 comments · 2 min read · LW link

Halloween Problem

Saint Blasphemer, 24 Oct 2023 16:46 UTC
−10 points
1 comment · 1 min read · LW link

Who is Harry Potter? Some predictions.

Donald Hobson, 24 Oct 2023 16:14 UTC
23 points
7 comments · 2 min read · LW link

Book Review: Going Infinite

Zvi, 24 Oct 2023 15:00 UTC
242 points
113 comments · 97 min read · LW link · 1 review
(thezvi.wordpress.com)

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm, 24 Oct 2023 13:53 UTC
11 points
0 comments · 1 min read · LW link

Lying is Cowardice, not Strategy

24 Oct 2023 13:24 UTC
31 points
73 comments · 5 min read · LW link
(cognition.cafe)

[Question] Anyone Else Using Brilliant?

Sable, 24 Oct 2023 12:12 UTC
19 points
0 comments · 1 min read · LW link

Announcing #AISummitTalks featuring Professor Stuart Russell and many others

otto.barten, 24 Oct 2023 10:11 UTC
17 points
1 comment · 1 min read · LW link

Linkpost: A Post Mortem on the Gino Case

Linch, 24 Oct 2023 6:50 UTC
89 points
7 comments · 2 min read · LW link
(www.theorgplumber.com)

South Bay SSC Meetup, San Jose, November 5th.

David Friedman, 24 Oct 2023 4:50 UTC
2 points
1 comment · 1 min read · LW link

AI Pause Will Likely Backfire (Guest Post)

jsteinhardt, 24 Oct 2023 4:30 UTC
47 points
6 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Human wanting

TsviBT, 24 Oct 2023 1:05 UTC
53 points
1 comment · 10 min read · LW link

Towards Understanding Sycophancy in Language Models

24 Oct 2023 0:30 UTC
66 points
0 comments · 2 min read · LW link
(arxiv.org)

Manifold Halloween Hackathon

Austin Chen, 23 Oct 2023 22:47 UTC
8 points
0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda, 23 Oct 2023 22:38 UTC
93 points
12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT, 23 Oct 2023 21:00 UTC
79 points
22 comments · 1 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner, 23 Oct 2023 20:32 UTC
22 points
0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow, 23 Oct 2023 17:43 UTC
6 points
2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher, 23 Oct 2023 17:24 UTC
32 points
8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

23 Oct 2023 16:37 UTC
107 points
3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk, 23 Oct 2023 13:40 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy, 23 Oct 2023 13:21 UTC
22 points
13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane, 23 Oct 2023 2:26 UTC
35 points
22 comments · 1 min read · LW link

AI Safety is Dropping the Ball on Clown Attacks

trevor, 22 Oct 2023 20:09 UTC
64 points
78 comments · 34 min read · LW link

The Drowning Child

Tomás B., 22 Oct 2023 16:39 UTC
25 points
8 comments · 1 min read · LW link

Announcing Timaeus

22 Oct 2023 11:59 UTC
187 points
15 comments · 4 min read · LW link

Into AI Safety—Episode 0

jacobhaimes, 22 Oct 2023 3:30 UTC
5 points
1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien, 21 Oct 2023 22:40 UTC
69 points
4 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner, 21 Oct 2023 22:05 UTC
14 points
9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong, 21 Oct 2023 21:13 UTC
42 points
0 comments · 2 min read · LW link