AIs will greatly change engineering in AI companies well before AGI

ryan_greenblatt · 9 Sep 2025 16:58 UTC
38 points
3 comments · 11 min read · LW link

Large Language Models and the Critical Brain Hypothesis

David Africa · 9 Sep 2025 15:45 UTC
28 points
0 comments · 6 min read · LW link

Decision Theory Guarding is Sufficient for Scheming

james.lucassen · 9 Sep 2025 14:49 UTC
31 points
3 comments · 2 min read · LW link

Safety cases for Pessimism

michaelcohen · 8 Sep 2025 13:26 UTC
16 points
1 comment · 4 min read · LW link

How Can You Tell if You’ve Instilled a False Belief in Your LLM?

james.lucassen · 6 Sep 2025 16:45 UTC
14 points
1 comment · 10 min read · LW link
(jlucassen.com)

Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

5 Sep 2025 12:11 UTC
28 points
1 comment · 7 min read · LW link

Natural Latents: Latent Variables Stable Across Ontologies

4 Sep 2025 0:33 UTC
110 points
13 comments · 20 min read · LW link

Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro

ryan_greenblatt · 3 Sep 2025 13:21 UTC
150 points
25 comments · 8 min read · LW link

How To Become A Mechanistic Interpretability Researcher

Neel Nanda · 2 Sep 2025 23:38 UTC
99 points
12 comments · 55 min read · LW link

Sleeping Experts in the (reflective) Solomonoff Prior

31 Aug 2025 4:55 UTC
16 points
0 comments · 3 min read · LW link

Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)

ryan_greenblatt · 27 Aug 2025 17:04 UTC
98 points
2 comments · 3 min read · LW link

AI companies have started saying safeguards are load-bearing

Zach Stein-Perlman · 27 Aug 2025 13:00 UTC
51 points
2 comments · 5 min read · LW link

AI Induced Psychosis: A shallow investigation

Tim Hua · 26 Aug 2025 20:03 UTC
320 points
38 comments · 26 min read · LW link

Harmless reward hacks can generalize to misalignment in LLMs

26 Aug 2025 17:32 UTC
46 points
6 comments · 7 min read · LW link

Do-Divergence: A Bound for Maxwell’s Demon

26 Aug 2025 17:07 UTC
66 points
4 comments · 3 min read · LW link

New Paper on Reflective Oracles & Grain of Truth Problem

Cole Wyeth · 26 Aug 2025 0:18 UTC
53 points
0 comments · 1 min read · LW link

Hidden Reasoning in LLMs: A Taxonomy

25 Aug 2025 22:43 UTC
60 points
8 comments · 12 min read · LW link

Notes on cooperating with unaligned AIs

Lukas Finnveden · 24 Aug 2025 4:19 UTC
45 points
8 comments · 21 min read · LW link
(blog.redwoodresearch.org)

(∃ Stochastic Natural Latent) Implies (∃ Deterministic Natural Latent)

22 Aug 2025 21:46 UTC
120 points
8 comments · 9 min read · LW link

One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

Michele Campolo · 22 Aug 2025 15:53 UTC
−3 points
0 comments · 3 min read · LW link