Manifold Halloween Hackathon

Austin Chen · 23 Oct 2023 22:47 UTC
8 points
0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda · 23 Oct 2023 22:38 UTC
93 points
12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT · 23 Oct 2023 21:00 UTC
79 points
22 comments · 1 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner · 23 Oct 2023 20:32 UTC
22 points
0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow · 23 Oct 2023 17:43 UTC
6 points
2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher · 23 Oct 2023 17:24 UTC
32 points
8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

23 Oct 2023 16:37 UTC
107 points
3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk · 23 Oct 2023 13:40 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy · 23 Oct 2023 13:21 UTC
22 points
13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane · 23 Oct 2023 2:26 UTC
35 points
22 comments · 1 min read · LW link

AI Safety is Dropping the Ball on Clown Attacks

trevor · 22 Oct 2023 20:09 UTC
64 points
78 comments · 34 min read · LW link

The Drowning Child

Tomás B. · 22 Oct 2023 16:39 UTC
25 points
8 comments · 1 min read · LW link

Announcing Timaeus

22 Oct 2023 11:59 UTC
187 points
15 comments · 4 min read · LW link

Into AI Safety—Episode 0

jacobhaimes · 22 Oct 2023 3:30 UTC
5 points
1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien · 21 Oct 2023 22:40 UTC
69 points
4 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner · 21 Oct 2023 22:05 UTC
14 points
9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong · 21 Oct 2023 21:13 UTC
42 points
0 comments · 2 min read · LW link

Soups as Spreads

jefftk · 21 Oct 2023 20:30 UTC
22 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Which COVID booster to get?

Sameerishere · 21 Oct 2023 19:43 UTC
8 points
0 comments · 2 min read · LW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis · 21 Oct 2023 15:22 UTC
247 points
51 comments · 13 min read · LW link · 1 review

How to find a good moving service

Ziyue Wang · 21 Oct 2023 4:59 UTC
8 points
0 comments · 3 min read · LW link

Apply for MATS Winter 2023-24!

21 Oct 2023 2:27 UTC
104 points
6 comments · 5 min read · LW link

[Question] Can we isolate neurons that recognize features vs. those which have some other role?

Joshua Clancy · 21 Oct 2023 0:30 UTC
4 points
2 comments · 3 min read · LW link

Muddling Along Is More Likely Than Dystopia

Jeffrey Heninger · 20 Oct 2023 21:25 UTC
83 points
10 comments · 8 min read · LW link

What’s Hard About The Shutdown Problem

johnswentworth · 20 Oct 2023 21:13 UTC
98 points
33 comments · 4 min read · LW link

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy

20 Oct 2023 21:04 UTC
162 points
30 comments · 27 min read · LW link

TOMORROW: the largest AI Safety protest ever!

Holly_Elmore · 20 Oct 2023 18:15 UTC
105 points
26 comments · 2 min read · LW link

The Overkill Conspiracy Hypothesis

ymeskhout · 20 Oct 2023 16:51 UTC
26 points
8 comments · 7 min read · LW link

I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines

307th · 20 Oct 2023 16:37 UTC
119 points
33 comments · 9 min read · LW link

Internal Target Information for AI Oversight

Paul Colognese · 20 Oct 2023 14:53 UTC
15 points
0 comments · 5 min read · LW link

On the proper date for solstice celebrations

jchan · 20 Oct 2023 13:55 UTC
16 points
0 comments · 4 min read · LW link

Are (at least some) Large Language Models Holographic Memory Stores?

Bill Benzon · 20 Oct 2023 13:07 UTC
11 points
4 comments · 6 min read · LW link

Mechanistic interpretability of LLM analogy-making

Sergii · 20 Oct 2023 12:53 UTC
2 points
0 comments · 4 min read · LW link
(grgv.xyz)

How To Socialize With Psycho(logist)s

Sable · 20 Oct 2023 11:33 UTC
37 points
11 comments · 3 min read · LW link
(affablyevil.substack.com)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp · 20 Oct 2023 7:32 UTC
119 points
15 comments · 22 min read · LW link

Features and Adversaries in MemoryDT

20 Oct 2023 7:32 UTC
31 points
6 comments · 25 min read · LW link

AI Safety Hub Serbia Soft Launch

DusanDNesic · 20 Oct 2023 7:11 UTC
65 points
1 comment · 3 min read · LW link
(forum.effectivealtruism.org)

Announcing new round of “Key Phenomena in AI Risk” Reading Group

20 Oct 2023 7:11 UTC
15 points
2 comments · 1 min read · LW link

Unpacking the dynamics of AGI conflict that suggest the necessity of a preemptive pivotal act

Eli Tyre · 20 Oct 2023 6:48 UTC
61 points
2 comments · 8 min read · LW link

Genocide isn’t Decolonization

robotelvis · 20 Oct 2023 4:14 UTC
33 points
19 comments · 5 min read · LW link
(messyprogress.substack.com)

Trying to understand John Wentworth’s research agenda

20 Oct 2023 0:05 UTC
92 points
13 comments · 12 min read · LW link

Boost your productivity, happiness and health with this one weird trick

ajc586 · 19 Oct 2023 23:30 UTC
9 points
9 comments · 1 min read · LW link

A Good Explanation of Differential Gears

Johannes C. Mayer · 19 Oct 2023 23:07 UTC
47 points
4 comments · 1 min read · LW link
(youtu.be)

Evening Wiki(pedia) Workout

mcint · 19 Oct 2023 21:29 UTC
1 point
1 comment · 1 min read · LW link

New roles on my team: come build Open Phil’s technical AI safety program with me!

Ajeya Cotra · 19 Oct 2023 16:47 UTC
83 points
6 comments · 4 min read · LW link

[Question] Infinite tower of meta-probability

fryolysis · 19 Oct 2023 16:44 UTC
6 points
5 comments · 3 min read · LW link

A NotKillEveryoneIsm Argument for Accelerating Deep Learning Research

Logan Zoellner · 19 Oct 2023 16:28 UTC
−7 points
6 comments · 5 min read · LW link
(midwitalignment.substack.com)

Knowledge Base 5: Business model

iwis · 19 Oct 2023 16:06 UTC
−4 points
2 comments · 1 min read · LW link