Pathological Narcissism: The Pendulum Swing between Echoism and Sovereignism

Dawn Drescher · 22 Jun 2026 21:56 UTC
5 points
0 comments · 8 min read · LW link
(impartial-priorities.org)

Speedup from AI Ghostwriting

Dawn Drescher · 22 Jun 2026 21:47 UTC
10 points
0 comments · 5 min read · LW link
(impartial-priorities.org)

Functional Emotions and The Pope’s Encyclical on AI — Digital Minds Newsletter #3

22 Jun 2026 19:44 UTC
14 points
0 comments · 39 min read · LW link
(www.digitalminds.news)

Planning for Preservation in the Age of AI

Raelifin · 22 Jun 2026 17:51 UTC
19 points
0 comments · 13 min read · LW link

Advocates Can Influence LLM Values By Editing Wikipedia

22 Jun 2026 15:47 UTC
5 points
4 comments · 8 min read · LW link

Learning to Understand Evil

Yulia · 22 Jun 2026 14:43 UTC
5 points
0 comments · 7 min read · LW link

Defeatism as Disempowerment

Ramya · 22 Jun 2026 14:24 UTC
4 points
0 comments · 3 min read · LW link

A Mechanistic Explanation of Prompt Injection (and why you should study roles)

22 Jun 2026 14:09 UTC
57 points
6 comments · 16 min read · LW link

The AI Industrial Explosion — Part 4: Cheap power

djbinder · 22 Jun 2026 13:30 UTC
32 points
0 comments · 16 min read · LW link
(defensesindepth.bio)

Not all features are created equal

enricobottazzi · 22 Jun 2026 13:20 UTC
17 points
0 comments · 9 min read · LW link

Brittle model organisms obstructs deception elicitation work

22 Jun 2026 10:48 UTC
19 points
3 comments · 7 min read · LW link

We made a map of the doom debate. Here’s how the breakdown works.

22 Jun 2026 10:48 UTC
13 points
0 comments · 7 min read · LW link

Naturally learned behaviors in deep MLPs resist detection by both human and learned algorithms

emanuelr · 22 Jun 2026 9:10 UTC
10 points
0 comments · 12 min read · LW link

A US-China intervention because we can’t expect another Arkhipov/Petrov

Troy Tian · 22 Jun 2026 7:31 UTC
10 points
0 comments · 1 min read · LW link

On revolutionary love in AI safety

Troy Tian · 22 Jun 2026 3:48 UTC
8 points
0 comments · 4 min read · LW link

Do AI Biorisk Thresholds Need Intermediate Warning Levels?

Lukas Frei · 22 Jun 2026 1:09 UTC
9 points
0 comments · 3 min read · LW link

NLA explanations can be shortened without harming reconstruction

loops · 22 Jun 2026 0:57 UTC
48 points
4 comments · 3 min read · LW link

Introducing MonitoringBench

monika_j · 21 Jun 2026 18:43 UTC
44 points
0 comments · 6 min read · LW link

How persona training could fail

Simon Lermen · 21 Jun 2026 16:38 UTC
13 points
0 comments · 4 min read · LW link

A high-level model of AI bargaining

Anthony DiGiovanni · 21 Jun 2026 15:37 UTC
18 points
1 comment · 5 min read · LW link