AI companies should publish security assessments

ryan_greenblatt · 27 Apr 2026 14:39 UTC
41 points
1 comment · 3 min read · LW link

In defense of parents

Yair Halberstadt · 27 Apr 2026 14:18 UTC
34 points
1 comment · 6 min read · LW link

Curious cases of financial engineering in biotech

Abhishaike Mahajan · 27 Apr 2026 14:09 UTC
18 points
0 comments · 22 min read · LW link
(www.owlposting.com)

Blackmail at 8 Billion Parameters: Agentic Misalignment in Sub-Frontier Models

Chijioke Ugwuanyi · 27 Apr 2026 8:59 UTC
12 points
0 comments · 7 min read · LW link

The other paper that killed deep learning theory

LawrenceC · 27 Apr 2026 6:57 UTC
34 points
1 comment · 8 min read · LW link

AI might surprise itself by going rogue

David Scott Krueger · 27 Apr 2026 6:30 UTC
6 points
0 comments · 2 min read · LW link
(therealartificialintelligence.substack.com)

How does Reinforcement Learning Affect Models

humanityfirst · 27 Apr 2026 5:22 UTC
2 points
1 comment · 2 min read · LW link

Retrospective on my unsupervised elicitation challenge

DanielFilan · 27 Apr 2026 0:30 UTC
52 points
0 comments · 8 min read · LW link
(danielfilan.com)

Alignment Faking Replication and Chain-of-Thought Monitoring Extensions

Angela Tang · 26 Apr 2026 23:55 UTC
7 points
0 comments · 8 min read · LW link

Training a Transformer to Compose One Step Per Layer (and Proving It)

Brendan Long · 26 Apr 2026 23:45 UTC
16 points
0 comments · 7 min read · LW link

AI for life strategy advice: a personal experiment

Jonah Wilberg · 26 Apr 2026 22:18 UTC
10 points
2 comments · 6 min read · LW link

Spontaneous introspection in output tampering

Ziqian Zhong · 26 Apr 2026 20:05 UTC
12 points
0 comments · 12 min read · LW link

How do intentional secret loyalties differ from other schemer motivations?

Cleo Nardo · 26 Apr 2026 20:03 UTC
24 points
1 comment · 12 min read · LW link

Control protocols don’t always need to know which models are scheming

Fabien Roger · 26 Apr 2026 19:16 UTC
38 points
1 comment · 6 min read · LW link

“Bad faith” means intentionally misrepresenting your beliefs

TFD · 26 Apr 2026 19:07 UTC
35 points
15 comments · 6 min read · LW link

Me, decay

Dentosal · 26 Apr 2026 17:14 UTC
8 points
1 comment · 3 min read · LW link

Universes can specialize: Each universe should produce the goods it’s most comparatively advantaged at, relative to the multiversal market

Zach Stein-Perlman · 26 Apr 2026 16:30 UTC
12 points
12 comments · 2 min read · LW link

Why did people miss the point on Mythos?

draganover · 26 Apr 2026 12:15 UTC
44 points
13 comments · 5 min read · LW link

Roko’s Basilisk may work on humans

Horosphere · 26 Apr 2026 9:40 UTC
1 point
4 comments · 18 min read · LW link

Substrate: Formalism

26 Apr 2026 8:06 UTC
2 points
0 comments · 10 min read · LW link