
Research agenda: Interpretive debate

Shi · 18 Jun 2026 23:46 UTC
18 points
0 comments · 7 min read

Does it feel any different to be reverse-chiral life?

jessicata · 18 Jun 2026 22:56 UTC
9 points
0 comments · 10 min read

Reinforcement learning towards broadly and persistently beneficial models

papetoast · 18 Jun 2026 22:11 UTC
11 points
0 comments · 1 min read
(alignment.openai.com)

The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t

18 Jun 2026 21:21 UTC
46 points
2 comments · 5 min read
(blog.redwoodresearch.org)

CoT-forcing promptware

Bruce Middleton · 18 Jun 2026 19:33 UTC
2 points
0 comments · 2 min read

AI that represents you can’t be neutral.

agulaya24 · 18 Jun 2026 18:50 UTC
−5 points
2 comments · 3 min read

On “Model Organisms”

J Bostock · 18 Jun 2026 18:42 UTC
18 points
1 comment · 6 min read

Introduction: Gaussian Natural Latents

Haru · 18 Jun 2026 18:41 UTC
17 points
1 comment · 3 min read

Contra Pace on When to Apologize

Zack_M_Davis · 18 Jun 2026 16:49 UTC
32 points
10 comments · 6 min read
(zackmdavis.net)

Your Model Organisms Might Be Fried

18 Jun 2026 16:18 UTC
66 points
3 comments · 7 min read

Shard narcissism as delusion of unembededness

Fernand0 · 18 Jun 2026 14:29 UTC
9 points
1 comment · 4 min read

War of Dots: CRUSHING my opponents with FACTS and LOGIC

momom2 · 18 Jun 2026 12:07 UTC
17 points
2 comments · 7 min read

How far do open weights trail the frontier?

RobinHa · 18 Jun 2026 11:01 UTC
23 points
3 comments · 1 min read
(robinhaselhorst.com)

GLM 5.2 playing text adventures

kqr · 18 Jun 2026 7:23 UTC
13 points
1 comment · 1 min read
(entropicthoughts.com)

Leveraged on being right

Ben Pace, the Vacationing Vagabond · 18 Jun 2026 6:51 UTC
40 points
6 comments · 3 min read

Vulnerabilities and exploits: where are we headed?

tchauvin · 18 Jun 2026 5:49 UTC
9 points
0 comments · 5 min read
(tchauvin.com)

Agents are under-elicited: A case study in optimization tasks

18 Jun 2026 2:39 UTC
17 points
1 comment · 7 min read
(fulcrum.inc)

A preliminary experiment regarding consistency as a measure of conceptual abilities in language models

Chi Nguyen · 17 Jun 2026 22:56 UTC
16 points
3 comments · 7 min read
(casparoesterheld.com)

Gears for political races

Tom Smith · 17 Jun 2026 20:19 UTC
133 points
14 comments · 14 min read

“Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

17 Jun 2026 18:43 UTC
30 points
0 comments · 6 min read
(arxiv.org)