Finding X-Risks and S-Risks by Gradient Descent

dspeyer · 25 Mar 2026 17:58 UTC
7 points
0 comments · 2 min read · LW link

The Scary Bridge

moridinamael · 25 Mar 2026 17:11 UTC
9 points
0 comments · 2 min read · LW link

How to do illusionist meditation

jackmastermind · 25 Mar 2026 17:10 UTC
4 points
0 comments · 11 min read · LW link

Can Agents Fool Each Other? Findings from the AI Village

Shoshannah Tekofsky · 25 Mar 2026 17:05 UTC
12 points
1 comment · 3 min read · LW link
(theaidigest.org)

I lost my faith in introspection—and you can too!

jackmastermind · 25 Mar 2026 17:00 UTC
4 points
0 comments · 4 min read · LW link

Don’t Write Off Human Labor, Yet

burnssa · 25 Mar 2026 16:37 UTC
−2 points
0 comments · 8 min read · LW link

Galaxy-brained model-chat: ASI constitutions & the cosmic host

ukc10014 · 25 Mar 2026 16:05 UTC
11 points
0 comments · 16 min read · LW link

How to do cost-effectiveness analysis for elections

Zach Stein-Perlman · 25 Mar 2026 15:00 UTC
5 points
1 comment · 3 min read · LW link

My Cognitive Architecture: A Self-Observational Map

Naj Ami-Nave · 25 Mar 2026 2:23 UTC
2 points
0 comments · 8 min read · LW link

Is Gemini 3 Scheming in the Wild?

25 Mar 2026 1:12 UTC
57 points
2 comments · 17 min read · LW link

Book Review: Open Socrates (Part 1)

Zvi · 24 Mar 2026 22:21 UTC
22 points
3 comments · 116 min read · LW link
(thezvi.wordpress.com)

Book Review: Open Socrates (Part 2)

Zvi · 24 Mar 2026 22:20 UTC
13 points
1 comment · 81 min read · LW link
(thezvi.wordpress.com)

Agents Can Get Stuck in Self-distrusting Equilibria

Ashe Vazquez Nuñez · 24 Mar 2026 22:05 UTC
29 points
1 comment · 12 min read · LW link

Latent Introspection (and other open-source introspection papers)

vgel · 24 Mar 2026 21:23 UTC
81 points
1 comment · 9 min read · LW link
(arxiv.org)

An Informal Definition of Goals for Embedded Agents

Ashe Vazquez Nuñez · 24 Mar 2026 18:36 UTC
10 points
0 comments · 1 min read · LW link

My cost-effectiveness unit

Zach Stein-Perlman · 24 Mar 2026 15:30 UTC
42 points
4 comments · 4 min read · LW link

The Fourth World

Linch · 24 Mar 2026 13:43 UTC
19 points
14 comments · 6 min read · LW link

Safe Recursive Self-Improvement with Verified Compilers

Adam Chlipala · 24 Mar 2026 13:35 UTC
10 points
0 comments · 11 min read · LW link

Comparing Across Possible Worlds

unruly abstractions · 24 Mar 2026 10:09 UTC
7 points
3 comments · 5 min read · LW link

The AIXI perspective on AI Safety

Cole Wyeth · 24 Mar 2026 3:24 UTC
76 points
0 comments · 6 min read · LW link