There is way too much serendipity

Malmesbury
Jan 19, 2024, 7:37 PM
376 points
56 comments, 7 min read

Gentleness and the artificial Other

Joe Carlsmith
Jan 2, 2024, 6:21 PM
313 points
33 comments, 11 min read

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 12, 2024, 7:51 PM
305 points
95 comments, 3 min read
(arxiv.org)

The case for ensuring that powerful AIs are controlled

Jan 24, 2024, 4:11 PM
275 points
73 comments, 28 min read

MIRI 2024 Mission and Strategy Update

Malo
Jan 5, 2024, 12:20 AM
223 points
44 comments, 8 min read

Toward A Mathematical Framework for Computation in Superposition

Jan 18, 2024, 9:06 PM
204 points
18 comments, 63 min read

The impossible problem of due process

mingyuan
Jan 16, 2024, 5:18 AM
197 points
64 comments, 14 min read

This might be the last AI Safety Camp

Jan 24, 2024, 9:33 AM
196 points
34 comments, 1 min read

Introducing Alignment Stress-Testing at Anthropic

evhub
Jan 12, 2024, 11:51 PM
182 points
23 comments, 2 min read

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

Jan 26, 2024, 7:22 AM
161 points
60 comments, 57 min read

Making every researcher seek grants is a broken model

jasoncrawford
Jan 26, 2024, 4:06 PM
159 points
41 comments, 4 min read
(rootsofprogress.org)

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks
Jan 3, 2024, 7:44 PM
158 points
63 comments, 16 min read

Apologizing is a Core Rationalist Skill

johnswentworth
Jan 2, 2024, 5:47 PM
156 points
42 comments, 5 min read

Deep atheism and AI risk

Joe Carlsmith
Jan 4, 2024, 6:58 PM
153 points
22 comments, 27 min read

What good is G-factor if you’re dumped in the woods? A field report from a camp counselor.

Hastings
Jan 12, 2024, 1:17 PM
145 points
22 comments, 1 min read

Notice When People Are Directionally Correct

Chris_Leong
Jan 14, 2024, 2:12 PM
136 points
8 comments, 2 min read

Processor clock speeds are not how fast AIs think

Ege Erdil
Jan 29, 2024, 2:39 PM
135 points
55 comments, 2 min read

The case for training frontier AIs on Sumerian-only corpus

Jan 15, 2024, 4:40 PM
130 points
16 comments, 3 min read

Steering Llama-2 with contrastive activation additions

Jan 2, 2024, 12:47 AM
125 points
29 comments, 8 min read
(arxiv.org)

An even deeper atheism

Joe Carlsmith
Jan 11, 2024, 5:28 PM
125 points
47 comments, 15 min read

A Shutdown Problem Proposal

Jan 21, 2024, 6:12 PM
125 points
61 comments, 6 min read

Why I take short timelines seriously

NicholasKees
Jan 28, 2024, 10:27 PM
122 points
29 comments, 4 min read

The case for more ambitious language model evals

Jozdien
Jan 30, 2024, 12:01 AM
117 points
30 comments, 5 min read

Gender Exploration

sapphire
Jan 14, 2024, 6:57 PM
117 points
26 comments, 5 min read
(open.substack.com)

Four visions of Transformative AI success

Steven Byrnes
Jan 17, 2024, 8:45 PM
112 points
22 comments, 15 min read

Practically A Book Review: Appendix to “Nonlinear’s Evidence: Debunking False and Misleading Claims” (ThingOfThings)

tailcalled
Jan 3, 2024, 5:07 PM
111 points
25 comments, 2 min read
(thingofthings.substack.com)

Catching AIs red-handed

Jan 5, 2024, 5:43 PM
110 points
27 comments, 17 min read

‘ petertodd’’s last stand: The final days of open GPT-3 research

mwatkins
Jan 22, 2024, 6:47 PM
109 points
16 comments, 45 min read

Being nicer than Clippy

Joe Carlsmith
Jan 16, 2024, 7:44 PM
109 points
32 comments, 27 min read

2023 in AI predictions

jessicata
Jan 1, 2024, 5:23 AM
107 points
35 comments, 5 min read

Deceptive AI ≠ Deceptively-aligned AI

Steven Byrnes
Jan 7, 2024, 4:55 PM
96 points
19 comments, 6 min read

Almost everyone I’ve met would be well-served thinking more about what to focus on

Henrik Karlsson
Jan 5, 2024, 9:01 PM
96 points
8 comments, 11 min read
(www.henrikkarlsson.xyz)

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

StellaAthena
Jan 25, 2024, 7:17 PM
94 points
14 comments, 1 min read
(www.rand.org)

On the abolition of man

Joe Carlsmith
Jan 18, 2024, 6:17 PM
90 points
18 comments, 41 min read

The Aspiring Rationalist Congregation

maia
Jan 10, 2024, 10:52 PM
86 points
23 comments, 10 min read

Sparse Autoencoders Work on Attention Layer Outputs

Jan 16, 2024, 12:26 AM
83 points
9 comments, 18 min read

Some Vacation Photos

johnswentworth
Jan 4, 2024, 5:15 PM
83 points
0 comments, 1 min read

An Introduction To The Mandelbrot Set That Doesn’t Mention Complex Numbers

Yitz
Jan 17, 2024, 9:48 AM
82 points
11 comments, 9 min read

Palworld development blog post

bhauth
Jan 28, 2024, 5:56 AM
82 points
12 comments, 1 min read
(note.com)

Survey of 2,778 AI authors: six parts in pictures

KatjaGrace
Jan 6, 2024, 4:43 AM
80 points
1 comment, 2 min read

[Repost] The Copenhagen Interpretation of Ethics

mesaoptimizer
Jan 25, 2024, 3:20 PM
77 points
4 comments, 5 min read
(web.archive.org)

Universal Love Integration Test: Hitler

Raemon
Jan 10, 2024, 11:55 PM
76 points
65 comments, 9 min read

When “yang” goes wrong

Joe Carlsmith
Jan 8, 2024, 4:35 PM
73 points
6 comments, 13 min read

Epistemic Hell

rogersbacon
Jan 27, 2024, 5:13 PM
71 points
20 comments, 14 min read

We need a Science of Evals

Jan 22, 2024, 8:30 PM
71 points
13 comments, 9 min read

The True Story of How GPT-2 Became Maximally Lewd

Jan 18, 2024, 9:03 PM
70 points
7 comments, 6 min read
(youtu.be)

InterLab – a toolkit for experiments with multi-agent interactions

Jan 22, 2024, 6:23 PM
69 points
0 comments, 8 min read
(acsresearch.org)

[Question] Will quantum randomness affect the 2028 election?

Jan 24, 2024, 10:54 PM
66 points
52 comments, 1 min read

OpenAI’s Preparedness Framework: Praise & Recommendations

Orpheus16
Jan 2, 2024, 4:20 PM
66 points
1 comment, 7 min read

The Perceptron Controversy

Yuxi_Liu
Jan 10, 2024, 11:07 PM
65 points
18 comments, 1 min read
(yuxi-liu-wired.github.io)