
Sorry, I still think kidney donation makes no sense for an EA

nicholashalden, 22 Nov 2025 18:10 UTC
−1 points
1 comment, 1 min read
(substack.com)

Automatic alt text generation

TurnTrout, 22 Nov 2025 17:57 UTC
18 points
1 comment, 1 min read
(turntrout.com)

Introspection in LLMs: A Proposal For How To Think About It, And Test For It

Christopher Ackerman, 22 Nov 2025 14:52 UTC
5 points
0 comments, 7 min read

Book Review: Wizard’s Hall

Screwtape, 22 Nov 2025 7:38 UTC
52 points
0 comments, 5 min read

Be Naughty

habryka, 22 Nov 2025 6:35 UTC
57 points
3 comments, 4 min read

Market Logic I

abramdemski, 22 Nov 2025 6:01 UTC
31 points
2 comments, 5 min read

Animal welfare concerns are dominated by post-ASI futures

RobertM, 22 Nov 2025 4:08 UTC
23 points
0 comments, 4 min read

Habitual mental motions might explain why people are content to get old and die

Ruby, 22 Nov 2025 2:52 UTC
16 points
1 comment, 7 min read

Diplomacy during AI takeoff

Nikola Jurkovic, 22 Nov 2025 2:12 UTC
15 points
3 comments, 2 min read
(nikolajurkovic.substack.com)

Abstract advice to researchers tackling the difficult core problems of AGI alignment

TsviBT, 22 Nov 2025 0:53 UTC
90 points
8 comments, 8 min read

Why Not Just Train For Interpretability?

johnswentworth, 21 Nov 2025 22:08 UTC
41 points
5 comments, 4 min read

Models not making it clear when they’re roleplaying seems like a fairly big issue

williawa, 21 Nov 2025 20:23 UTC
16 points
0 comments, 6 min read

Natural Emergent Misalignment from Reward Hacking

Algon, 21 Nov 2025 20:20 UTC
12 points
0 comments, 3 min read
(www.anthropic.com)

Natural emergent misalignment from reward hacking in production RL

21 Nov 2025 20:00 UTC
175 points
17 comments, 9 min read

Eight Heuristics of Anti-Epistemology

Ben Pace, 21 Nov 2025 19:54 UTC
21 points
2 comments, 6 min read

We won’t solve non-alignment problems by doing research

MichaelDickens, 21 Nov 2025 18:03 UTC
12 points
2 comments, 4 min read

Can Artificial Intelligence Be Conscious?

Bentham's Bulldog, 21 Nov 2025 16:43 UTC
12 points
3 comments, 7 min read

Why Does Empathy Have an Off-Switch?

J Bostock, 21 Nov 2025 14:56 UTC
9 points
1 comment, 7 min read

What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village

Shoshannah Tekofsky, 21 Nov 2025 14:19 UTC
42 points
0 comments, 8 min read

Should I Apply to a 3.5% Acceptance-Rate Fellowship? A Simple EV Calculator

Tobias H, 21 Nov 2025 10:59 UTC
15 points
0 comments, 5 min read