Claude’s malicious compliance and normalization of deviance

Steff
5 Jul 2026 17:18 UTC
17 points
0 comments
8 min read
LW link

Probing the loss-band sparsity assumption in Scientist AI

Alejandro Tlaie
5 Jul 2026 16:25 UTC
8 points
0 comments
7 min read
LW link

Book Review: The God Test

PeterMcCluskey
5 Jul 2026 16:11 UTC
14 points
0 comments
4 min read
LW link

We need 3rd party Training-Run Assessments

Alex Meinke
5 Jul 2026 15:55 UTC
26 points
0 comments
10 min read
LW link

Harry Potter and the Rules of Quidditch

Tomás B.
5 Jul 2026 14:32 UTC
57 points
3 comments
3 min read
LW link

A Normal Argument for AI Risk

Silent Swift
5 Jul 2026 9:32 UTC
15 points
0 comments
8 min read
LW link
(silentswift.substack.com)

The case for the fleshman

dr_s
5 Jul 2026 9:28 UTC
14 points
11 comments
4 min read
LW link

Reevaluating AI-2027: timelines, takeoff, alignment and China

StanislavKrym
5 Jul 2026 4:00 UTC
14 points
2 comments
5 min read
LW link

Success Per Tokens

michaelwaves
5 Jul 2026 2:25 UTC
8 points
0 comments
3 min read
LW link

A case for LLMs as Self-predictors

Ashe Vazquez Nuñez
5 Jul 2026 0:25 UTC
30 points
0 comments
10 min read
LW link

Defining interpretation, and establishing a framework for it

Yaroven
4 Jul 2026 16:31 UTC
6 points
0 comments
5 min read
LW link

The Lace (short story)

Michael Soareverix
4 Jul 2026 4:43 UTC
22 points
0 comments
4 min read
LW link

Approximate Natural Latents Have Exact Prices

Haru
4 Jul 2026 1:57 UTC
18 points
0 comments
6 min read
LW link

I think alignment work is more promising than control work

Alec Harris
3 Jul 2026 23:40 UTC
85 points
9 comments
8 min read
LW link

On “gendertropes” in dath ilan

Eliezer Yudkowsky
3 Jul 2026 22:20 UTC
59 points
1 comment
3 min read
LW link

American AI if the boom is a bubble: the Karp-Zitron scenario

Mitchell_Porter
3 Jul 2026 21:46 UTC
6 points
1 comment
2 min read
LW link

(Don’t fear) the strangelet

djbinder
3 Jul 2026 17:39 UTC
101 points
2 comments
22 min read
LW link
(defensesindepth.bio)

The Reverse AI Box

James_Miller
3 Jul 2026 16:08 UTC
9 points
1 comment
6 min read
LW link

Pragmatic FDT, and predictors as game theory

Stuart_Armstrong
3 Jul 2026 13:22 UTC
32 points
10 comments
11 min read
LW link

Scheming Evals Mislead in Both Directions

3 Jul 2026 11:49 UTC
21 points
0 comments
10 min read
LW link