Clarifying the role of the behavioral selection model

Alex Mallen
10 May 2026 19:41 UTC
13 points
0 comments
4 min read
LW link

Alignment as Equilibrium Design

Elad Hazan
10 May 2026 18:56 UTC
3 points
0 comments
5 min read
LW link

Claude Does Not Actually Taste Bananas: Potassium-Based Synthetic Phenomenology In Language Models

Noah Weinberger
10 May 2026 17:13 UTC
4 points
0 comments
10 min read
LW link
(huggingface.co)

The Darwinian Honeymoon—Why I am not as impressed by human progress as I used to be

Elias Schmied
10 May 2026 15:55 UTC
48 points
5 comments
4 min read
LW link

Reinforcement learning scaling might incentivise hidden reasoning architectures for AI

Oliver Sourbut
10 May 2026 15:30 UTC
18 points
0 comments
6 min read
LW link
(www.oliversourbut.net)

Asymmetry Between Defensive and Acquisitive Instrumental Deception

keith_wynroe
10 May 2026 12:33 UTC
13 points
1 comment
5 min read
LW link

Context Modification as a Negative Alignment Tax

Florian_Dietz
10 May 2026 11:32 UTC
7 points
0 comments
4 min read
LW link

The AI Industrial Explosion — Part 2: Transition Dynamics

djbinder
10 May 2026 1:02 UTC
22 points
0 comments
12 min read
LW link
(defensesindepth.bio)

International Law Cannot Prevent Extinction Either

Sausage Vector Machine
9 May 2026 22:34 UTC
62 points
8 comments
5 min read
LW link

Do capabilities generalize across propensities?

Emil Ryd
9 May 2026 21:39 UTC
12 points
0 comments
8 min read
LW link

Neural Networks learn Bloom Filters

Alex Gibson
9 May 2026 20:32 UTC
50 points
1 comment
12 min read
LW link

Explaining Volition Without Resorting to Free Will

joseph_c
9 May 2026 18:57 UTC
12 points
10 comments
1 min read
LW link

Second order thoughts on current AI agents

Michael Flood
9 May 2026 18:40 UTC
12 points
0 comments
2 min read
LW link

If digital computers are conscious, they are conscious at the hardware level

cube_flipper
9 May 2026 15:08 UTC
41 points
34 comments
19 min read
LW link
(smoothbrains.net)

Does Opus 4.7 Generate Deceptive Denials About Its Own Guardrails?

usize
9 May 2026 4:12 UTC
10 points
0 comments
3 min read
LW link
(usize.github.io)

Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis

Linch
9 May 2026 1:30 UTC
131 points
31 comments
3 min read
LW link

We Should Have Mandatory Media/Communications Training For All Communicators

Darren McKee
8 May 2026 20:29 UTC
4 points
6 comments
3 min read
LW link

Chess as a prediction model of the artificial intelligence impact on culture

849
8 May 2026 20:19 UTC
−10 points
1 comment
5 min read
LW link
(lojkine.art)

The Saturation View: some responses

wdmacaskill
8 May 2026 17:32 UTC
25 points
4 comments
8 min read
LW link

Is ProgramBench Impossible?

frmsaul
8 May 2026 17:04 UTC
78 points
6 comments
2 min read
LW link