lukemarks

Karma: 455

lukemarks’s Shortform

lukemarks · Jul 2, 2024, 6:56 AM
4 points
19 comments · 1 min read

Beta Tester Request: Rallypoint Bounties

lukemarks · May 25, 2024, 9:11 AM
34 points
4 comments · 1 min read

[Question] Shouldn’t we ‘Just’ Superimitate Low-Res Uploads?

lukemarks · Nov 3, 2023, 7:42 AM
15 points
2 comments · 2 min read

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

Oct 3, 2023, 7:45 AM
17 points
0 comments · 5 min read

The Löbian Obstacle, And Why You Should Care

lukemarks · Sep 7, 2023, 11:59 PM
18 points
6 comments · 2 min read

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?

lukemarks · Jul 8, 2023, 11:42 AM
84 points
28 comments · 2 min read

Direct Preference Optimization in One Minute

lukemarks · Jun 26, 2023, 11:52 AM
22 points
3 comments · 2 min read

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks · Jun 17, 2023, 1:55 PM
16 points
0 comments · 10 min read

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks · Jun 11, 2023, 12:13 AM
22 points
0 comments · 5 min read

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research

lukemarks · Apr 22, 2023, 2:36 PM
39 points
7 comments · 5 min read

Select Agent Specifications as Natural Abstractions

lukemarks · Apr 7, 2023, 11:16 PM
19 points
3 comments · 5 min read