LessWrong’s (first) album: I Have Been A Good Bing

Apr 1, 2024, 7:33 AM
569 points
181 comments
11 min read
LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai
Apr 16, 2024, 9:16 PM
418 points
100 comments
12 min read
LW link

Thoughts on seed oil

dynomight
Apr 20, 2024, 12:29 PM
352 points
129 comments
17 min read
LW link
(dynomight.net)

[April Fools’ Day] Introducing Open Asteroid Impact

Linch
Apr 1, 2024, 8:14 AM
336 points
29 comments
LW link
(openasteroidimpact.org)

Express interest in an “FHI of the West”

habryka
Apr 18, 2024, 3:32 AM
268 points
41 comments
3 min read
LW link

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget
Apr 16, 2024, 4:22 PM
256 points
58 comments
1 min read
LW link
(www.commerce.gov)

Refusal in LLMs is mediated by a single direction

Apr 27, 2024, 11:13 AM
246 points
95 comments
10 min read
LW link

Funny Anecdote of Eliezer From His Sister

Noah Birnbaum
Apr 22, 2024, 10:05 PM
206 points
6 comments
2 min read
LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth
Apr 23, 2024, 10:19 PM
194 points
102 comments
1 min read
LW link

OMMC Announces RIP

Apr 1, 2024, 11:20 PM
189 points
5 comments
2 min read
LW link

On Not Pulling The Ladder Up Behind You

Screwtape
Apr 26, 2024, 9:58 PM
188 points
21 comments
9 min read
LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

Apr 18, 2024, 12:27 AM
185 points
21 comments
7 min read
LW link

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern
Apr 17, 2024, 1:54 PM
176 points
22 comments
1 min read
LW link
(www.futureofhumanityinstitute.org)

Reconsider the anti-cavity bacteria if you are Asian

Lao Mein
Apr 15, 2024, 7:02 AM
170 points
43 comments
4 min read
LW link

Ironing Out the Squiggles

Zack_M_Davis
Apr 29, 2024, 4:13 PM
157 points
36 comments
11 min read
LW link

Priors and Prejudice

MathiasKB
Apr 22, 2024, 3:00 PM
151 points
31 comments
7 min read
LW link

Daniel Dennett has died (1942-2024)

kave
Apr 19, 2024, 4:17 PM
150 points
5 comments
1 min read
LW link
(dailynous.com)

LLMs for Alignment Research: a safety priority?

abramdemski
Apr 4, 2024, 8:03 PM
145 points
24 comments
11 min read
LW link

When is a mind me?

Rob Bensinger
Apr 17, 2024, 5:56 AM
144 points
130 comments
15 min read
LW link

My experience using financial commitments to overcome akrasia

William Howard
Apr 15, 2024, 10:57 PM
137 points
33 comments
18 min read
LW link

Simple probes can catch sleeper agents

Apr 23, 2024, 9:10 PM
133 points
21 comments
1 min read
LW link
(www.anthropic.com)

A Dozen Ways to Get More Dakka

Davidmanheim
Apr 8, 2024, 4:45 AM
132 points
11 comments
3 min read
LW link

RTFB: On the New Proposed CAIP AI Bill

Zvi
Apr 10, 2024, 6:30 PM
119 points
14 comments
34 min read
LW link
(thezvi.wordpress.com)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks
Apr 18, 2024, 4:17 PM
113 points
10 comments
12 min read
LW link

A Selection of Randomly Selected SAE Features

Apr 1, 2024, 9:09 AM
109 points
2 comments
4 min read
LW link

[Question] What convincing warning shot could help prevent extinction from AI?

Apr 13, 2024, 6:09 PM
106 points
22 comments
2 min read
LW link

The first future and the best future

KatjaGrace
Apr 25, 2024, 6:40 AM
106 points
12 comments
1 min read
LW link
(worldspiritsockpuppet.com)

Carl Sagan, nuking the moon, and not nuking the moon

eukaryote
Apr 13, 2024, 4:08 AM
104 points
8 comments
6 min read
LW link
(eukaryotewritesblog.com)

Sparsify: A mechanistic interpretability research agenda

Lee Sharkey
Apr 3, 2024, 12:34 PM
96 points
23 comments
22 min read
LW link

MIRI’s April 2024 Newsletter

Harlan
Apr 12, 2024, 11:38 PM
95 points
0 comments
3 min read
LW link
(intelligence.org)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers

hugofry
Apr 29, 2024, 8:57 PM
93 points
8 comments
11 min read
LW link

Partial value takeover without world takeover

KatjaGrace
Apr 5, 2024, 6:20 AM
89 points
23 comments
3 min read
LW link
(worldspiritsockpuppet.com)

Rejecting Television

Declan Molony
Apr 23, 2024, 4:59 AM
89 points
10 comments
6 min read
LW link

Constructability: Plainly-coded AGIs may be feasible in the near future

Apr 27, 2024, 4:04 PM
85 points
13 comments
13 min read
LW link

Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes

Apr 16, 2024, 10:10 AM
82 points
12 comments
8 min read
LW link
(blog.aiimpacts.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
79 points
10 comments
8 min read
LW link

A couple productivity tips for overthinkers

Steven Byrnes
Apr 20, 2024, 4:05 PM
78 points
13 comments
4 min read
LW link

Best in Class Life Improvement

sapphire
Apr 4, 2024, 1:51 AM
78 points
20 comments
1 min read
LW link

Creating unrestricted AI Agents with Command R+

Simon Lermen
Apr 16, 2024, 2:52 PM
77 points
13 comments
5 min read
LW link

Coherence of Caches and Agents

johnswentworth
Apr 1, 2024, 11:04 PM
77 points
9 comments
11 min read
LW link

Mid-conditional love

KatjaGrace
Apr 17, 2024, 4:00 AM
76 points
21 comments
2 min read
LW link
(worldspiritsockpuppet.com)

AISC9 has ended and there will be an AISC10

Linda Linsefors
Apr 29, 2024, 10:53 AM
75 points
4 comments
2 min read
LW link

A Gentle Introduction to Risk Frameworks Beyond Forecasting

pendingsurvival
Apr 11, 2024, 6:03 PM
73 points
10 comments
27 min read
LW link

Announcing Suffering For Good

Garrett Baker
Apr 1, 2024, 5:08 PM
73 points
5 comments
1 min read
LW link

A gentle introduction to mechanistic anomaly detection

Erik Jenner
Apr 3, 2024, 11:06 PM
73 points
2 comments
11 min read
LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
72 points
0 comments
3 min read
LW link

Prompts for Big-Picture Planning

Raemon
Apr 13, 2024, 3:04 AM
72 points
1 comment
3 min read
LW link

LW Frontpage Experiments! (aka “Take the wheel, Shoggoth!”)

Apr 23, 2024, 3:58 AM
71 points
27 comments
5 min read
LW link

Generalized Stat Mech: The Boltzmann Approach

Apr 12, 2024, 5:47 PM
71 points
7 comments
20 min read
LW link

How We Picture Bayesian Agents

Apr 8, 2024, 6:12 PM
70 points
14 comments
7 min read
LW link