My Cognitive Architecture: A Self-Observational Map

Naj Ami-Nave · 25 Mar 2026 2:23 UTC
1 point
0 comments · 8 min read · LW link

Is Gemini 3 Scheming in the Wild?

25 Mar 2026 1:12 UTC
41 points
1 comment · 17 min read · LW link

Book Review: Open Socrates (Part 1)

Zvi · 24 Mar 2026 22:21 UTC
22 points
2 comments · 116 min read · LW link
(thezvi.wordpress.com)

Book Review: Open Socrates (Part 2)

Zvi · 24 Mar 2026 22:20 UTC
13 points
1 comment · 81 min read · LW link
(thezvi.wordpress.com)

Agents Can Get Stuck in Self-distrusting Equilibria

Ashe Vazquez Nuñez · 24 Mar 2026 22:05 UTC
26 points
1 comment · 12 min read · LW link

Latent Introspection (and other open-source introspection papers)

vgel · 24 Mar 2026 21:23 UTC
79 points
1 comment · 9 min read · LW link
(arxiv.org)

An Informal Definition of Goals for Embedded Agents

Ashe Vazquez Nuñez · 24 Mar 2026 18:36 UTC
10 points
0 comments · 1 min read · LW link

My cost-effectiveness unit

Zach Stein-Perlman · 24 Mar 2026 15:30 UTC
42 points
3 comments · 4 min read · LW link

The Fourth World

Linch · 24 Mar 2026 13:43 UTC
17 points
9 comments · 6 min read · LW link

Safe Recursive Self-Improvement with Verified Compilers

Adam Chlipala · 24 Mar 2026 13:35 UTC
9 points
0 comments · 11 min read · LW link

Comparing Across Possible Worlds

unruly abstractions · 24 Mar 2026 10:09 UTC
7 points
3 comments · 5 min read · LW link

The AIXI perspective on AI Safety

Cole Wyeth · 24 Mar 2026 3:24 UTC
69 points
0 comments · 6 min read · LW link

Information Overdose

0xmadlad · 24 Mar 2026 2:26 UTC
−2 points
0 comments · 3 min read · LW link

We cannot safely automate value alignment evaluation and research without thinking about delegation and discretion

Maria Federica Martino Lena · 24 Mar 2026 2:25 UTC
2 points
0 comments · 10 min read · LW link

Every Major LLM is a 1-Box Smoking Thirder

Olivia Scharfman · 24 Mar 2026 2:18 UTC
10 points
4 comments · 10 min read · LW link

A ToM-Inspired Agenda for AI Safety Research

Andrés Cotton · 24 Mar 2026 2:13 UTC
7 points
0 comments · 5 min read · LW link

Poking and Editing the Circuits

unruly abstractions · 24 Mar 2026 1:15 UTC
2 points
0 comments · 8 min read · LW link

Adopt a debugger’s mindset to solve your recurring life problems

Declan Molony · 24 Mar 2026 0:22 UTC
21 points
3 comments · 8 min read · LW link

Experimental Evidence for Simulator Theory— Part 2: The Scalers Strike Back

RogerDearnaley · 23 Mar 2026 22:37 UTC
13 points
0 comments · 34 min read · LW link

Experimental Evidence for Simulator Theory— Part 1: Emergent Misalignment and Weird Generalizations

RogerDearnaley · 23 Mar 2026 22:37 UTC
19 points
0 comments · 53 min read · LW link