Agents as Webs of Beliefs

Richard_Ngo, 27 Jun 2026 21:45 UTC
32 points
1 comment · 9 min read
(www.mindthefuture.info)

The Prisoner’s Dilemma (Fiction)

testingthewaters, 27 Jun 2026 20:51 UTC
1 point
0 comments · 9 min read

Neuralese is Actually Probably Good for Alignment

DaemonicSigil, 27 Jun 2026 19:40 UTC
14 points
7 comments · 6 min read

Should we just quit?

Jacob Abraham, 27 Jun 2026 13:44 UTC
1 point
0 comments · 3 min read

Flipping the eval on its head

Quinn, 27 Jun 2026 13:34 UTC
12 points
0 comments · 3 min read

Deployment Awareness Matters More Than Evaluation Awareness

26 Jun 2026 22:54 UTC
35 points
3 comments · 7 min read
(limits-of-evaluation.org)

Just a Wrapper? How Much Do Scaffolds Matter?

26 Jun 2026 22:21 UTC
9 points
0 comments · 11 min read
(mitfuturetech.substack.com)

What did “scheming” and “mech interp” mean pre-2023?

Cleo Nardo, 26 Jun 2026 22:09 UTC
69 points
1 comment · 2 min read

Why are adversaries assumed to be incapable of responding to AI risk?

KatjaGrace, 26 Jun 2026 21:51 UTC
91 points
7 comments · 1 min read
(worldspiritsockpuppet.substack.com)

Screencasts could be scalable data + evals for single-user emulation (Guardian Angels)

sophia_xu, 26 Jun 2026 21:12 UTC
4 points
0 comments · 1 min read

Randomness Testing with Bayesian Stats

DaemonicSigil, 26 Jun 2026 20:28 UTC
8 points
0 comments · 5 min read

Should we combine protocols for AI Control Research?

26 Jun 2026 19:02 UTC
9 points
0 comments · 9 min read

Not making a strong argument is a relief

Kaj_Sotala, 26 Jun 2026 18:41 UTC
43 points
5 comments · 10 min read
(kajsotala.substack.com)

The Case for Model Forensics

26 Jun 2026 15:09 UTC
44 points
0 comments · 10 min read

A full body MRI earns you a year of smoking

kqr, 26 Jun 2026 14:31 UTC
11 points
7 comments · 3 min read
(entropicthoughts.com)

Don’t ignore the car crashes, and remember your freshman CS

jcksanderson, 26 Jun 2026 7:06 UTC
36 points
0 comments · 2 min read
(jcksanderson.com)

LLMenard

Ilyass Mofaddel, 26 Jun 2026 4:26 UTC
1 point
0 comments · 5 min read

Research note on negated reward hacking

ChristopherT, 26 Jun 2026 2:22 UTC
8 points
0 comments · 6 min read

Surprising facts about the slave trade

Joseph Miller, 26 Jun 2026 1:34 UTC
255 points
12 comments · 8 min read

Intelligence (Artificial) in a Fallen World

Joseph Babbo, 25 Jun 2026 23:58 UTC
3 points
0 comments · 11 min read