Thoughts on responsible scaling policies and regulation

paulfchristiano · 24 Oct 2023 22:21 UTC
220 points
33 comments · 6 min read · LW link

The Screenplay Method

Yeshua God · 24 Oct 2023 17:41 UTC
−15 points
0 comments · 25 min read · LW link

Blunt Razor

fryolysis · 24 Oct 2023 17:27 UTC
3 points
0 comments · 2 min read · LW link

Halloween Problem

Saint Blasphemer · 24 Oct 2023 16:46 UTC
−10 points
1 comment · 1 min read · LW link

Who is Harry Potter? Some predictions.

Donald Hobson · 24 Oct 2023 16:14 UTC
23 points
7 comments · 2 min read · LW link

Book Review: Going Infinite

Zvi · 24 Oct 2023 15:00 UTC
242 points
113 comments · 97 min read · LW link · 1 review
(thezvi.wordpress.com)

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm · 24 Oct 2023 13:53 UTC
11 points
0 comments · 1 min read · LW link

Lying is Cowardice, not Strategy

24 Oct 2023 13:24 UTC
31 points
73 comments · 5 min read · LW link
(cognition.cafe)

[Question] Anyone Else Using Brilliant?

Sable · 24 Oct 2023 12:12 UTC
19 points
0 comments · 1 min read · LW link

Announcing #AISummitTalks featuring Professor Stuart Russell and many others

otto.barten · 24 Oct 2023 10:11 UTC
17 points
1 comment · 1 min read · LW link

Linkpost: A Post Mortem on the Gino Case

Linch · 24 Oct 2023 6:50 UTC
89 points
7 comments · 2 min read · LW link
(www.theorgplumber.com)

South Bay SSC Meetup, San Jose, November 5th.

David Friedman · 24 Oct 2023 4:50 UTC
2 points
1 comment · 1 min read · LW link

AI Pause Will Likely Backfire (Guest Post)

jsteinhardt · 24 Oct 2023 4:30 UTC
47 points
6 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Human wanting

TsviBT · 24 Oct 2023 1:05 UTC
53 points
1 comment · 10 min read · LW link

Towards Understanding Sycophancy in Language Models

24 Oct 2023 0:30 UTC
66 points
0 comments · 2 min read · LW link
(arxiv.org)

Manifold Halloween Hackathon

Austin Chen · 23 Oct 2023 22:47 UTC
8 points
0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda · 23 Oct 2023 22:38 UTC
93 points
12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT · 23 Oct 2023 21:00 UTC
79 points
22 comments · 1 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner · 23 Oct 2023 20:32 UTC
22 points
0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow · 23 Oct 2023 17:43 UTC
6 points
2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher · 23 Oct 2023 17:24 UTC
32 points
8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

23 Oct 2023 16:37 UTC
107 points
3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk · 23 Oct 2023 13:40 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy · 23 Oct 2023 13:21 UTC
22 points
13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane · 23 Oct 2023 2:26 UTC
35 points
22 comments · 1 min read · LW link

AI Safety is Dropping the Ball on Clown Attacks

trevor · 22 Oct 2023 20:09 UTC
65 points
78 comments · 34 min read · LW link

The Drowning Child

Tomás B. · 22 Oct 2023 16:39 UTC
25 points
8 comments · 1 min read · LW link

Announcing Timaeus

22 Oct 2023 11:59 UTC
187 points
15 comments · 4 min read · LW link

Into AI Safety—Episode 0

jacobhaimes · 22 Oct 2023 3:30 UTC
5 points
1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien · 21 Oct 2023 22:40 UTC
69 points
4 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner · 21 Oct 2023 22:05 UTC
14 points
9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong · 21 Oct 2023 21:13 UTC
42 points
0 comments · 2 min read · LW link

Soups as Spreads

jefftk · 21 Oct 2023 20:30 UTC
22 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Which COVID booster to get?

Sameerishere · 21 Oct 2023 19:43 UTC
8 points
0 comments · 2 min read · LW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis · 21 Oct 2023 15:22 UTC
247 points
51 comments · 13 min read · LW link · 1 review

How to find a good moving service

Ziyue Wang · 21 Oct 2023 4:59 UTC
8 points
0 comments · 3 min read · LW link

Apply for MATS Winter 2023-24!

21 Oct 2023 2:27 UTC
104 points
6 comments · 5 min read · LW link

[Question] Can we isolate neurons that recognize features vs. those which have some other role?

Joshua Clancy · 21 Oct 2023 0:30 UTC
4 points
2 comments · 3 min read · LW link

Muddling Along Is More Likely Than Dystopia

Jeffrey Heninger · 20 Oct 2023 21:25 UTC
83 points
10 comments · 8 min read · LW link

What’s Hard About The Shutdown Problem

johnswentworth · 20 Oct 2023 21:13 UTC
98 points
33 comments · 4 min read · LW link

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy

20 Oct 2023 21:04 UTC
162 points
30 comments · 27 min read · LW link

TOMORROW: the largest AI Safety protest ever!

Holly_Elmore · 20 Oct 2023 18:15 UTC
105 points
26 comments · 2 min read · LW link

The Overkill Conspiracy Hypothesis

ymeskhout · 20 Oct 2023 16:51 UTC
26 points
8 comments · 7 min read · LW link

I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines

307th · 20 Oct 2023 16:37 UTC
119 points
33 comments · 9 min read · LW link

Internal Target Information for AI Oversight

Paul Colognese · 20 Oct 2023 14:53 UTC
15 points
0 comments · 5 min read · LW link

On the proper date for solstice celebrations

jchan · 20 Oct 2023 13:55 UTC
16 points
0 comments · 4 min read · LW link

Are (at least some) Large Language Models Holographic Memory Stores?

Bill Benzon · 20 Oct 2023 13:07 UTC
11 points
4 comments · 6 min read · LW link

Mechanistic interpretability of LLM analogy-making

Sergii · 20 Oct 2023 12:53 UTC
2 points
0 comments · 4 min read · LW link
(grgv.xyz)