johnswentworth

Karma: 54,632

Orienting Toward Wizard Power

johnswentworth · May 8, 2025, 5:23 AM
425 points
96 comments · 5 min read · LW link

$500 + $500 Bounty Problem: Does An (Approximately) Deterministic Maximal Redund Always Exist?

May 6, 2025, 11:05 PM
72 points
11 comments · 3 min read · LW link

Misrepresentation as a Barrier for Interp (Part I)

Apr 29, 2025, 5:07 PM
96 points
11 comments · 7 min read · LW link

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

Apr 21, 2025, 8:19 PM
89 points
16 comments · 3 min read · LW link

So You Want To Make Marginal Progress...

johnswentworth · Feb 7, 2025, 11:22 PM
286 points
42 comments · 4 min read · LW link

Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

Jan 24, 2025, 8:20 PM
180 points
61 comments · 5 min read · LW link

The Case Against AI Control Research

johnswentworth · Jan 21, 2025, 4:03 PM
353 points
80 comments · 6 min read · LW link

What Is The Alignment Problem?

johnswentworth · Jan 16, 2025, 1:20 AM
180 points
50 comments · 25 min read · LW link

The Plan - 2024 Update

johnswentworth · Dec 31, 2024, 1:29 PM
117 points
28 comments · 4 min read · LW link

The Field of AI Alignment: A Postmortem, and What To Do About It

johnswentworth · Dec 26, 2024, 6:48 PM
297 points
160 comments · 8 min read · LW link

[Question] What Have Been Your Most Valuable Casual Conversations At Conferences?

johnswentworth · Dec 25, 2024, 5:49 AM
54 points
21 comments · 1 min read · LW link

The Median Researcher Problem

johnswentworth · Nov 2, 2024, 8:16 PM
157 points
70 comments · 1 min read · LW link

Three Notions of “Power”

johnswentworth · Oct 30, 2024, 6:10 AM
92 points
44 comments · 4 min read · LW link

Information vs Assurance

johnswentworth · Oct 20, 2024, 11:16 PM
187 points
17 comments · 2 min read · LW link

Minimal Motivation of Natural Latents

Oct 14, 2024, 10:51 PM
46 points
14 comments · 3 min read · LW link

Values Are Real Like Harry Potter

Oct 9, 2024, 11:42 PM
86 points
21 comments · 5 min read · LW link

We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap

Sep 19, 2024, 10:22 PM
48 points
48 comments · 5 min read · LW link

Why Large Bureaucratic Organizations?

johnswentworth · Aug 27, 2024, 6:30 PM
68 points
52 comments · 12 min read · LW link

… Wait, our models of semantics should inform fluid mechanics?!?

Aug 26, 2024, 4:38 PM
59 points
18 comments · 4 min read · LW link

Interoperable High Level Structures: Early Thoughts on Adjectives

Aug 22, 2024, 9:12 PM
49 points
1 comment · 7 min read · LW link