David Udell

Karma: 2,550

Why Can’t We Hypothesize After the Fact?

David Udell, Feb 26, 2025, 10:41 PM
40 points
3 comments, 2 min read, LW link

Causal Graphs of GPT-2-Small’s Residual Stream

David Udell, Jul 9, 2024, 10:06 PM
53 points
7 comments, 7 min read, LW link

Sparse Coding, for Mechanistic Interpretability and Activation Engineering

David Udell, Sep 23, 2023, 7:16 PM
42 points
7 comments, 34 min read, LW link

ActAdd: Steering Language Models without Optimization

Sep 6, 2023, 5:21 PM
105 points
3 comments, 2 min read, LW link
(arxiv.org)

Steering GPT-2-XL by adding an activation vector

May 13, 2023, 6:42 PM
437 points
98 comments, 50 min read, LW link, 1 review

Understanding and controlling a maze-solving policy network

Mar 11, 2023, 6:59 PM
334 points
28 comments, 23 min read, LW link

Beneath My Epistemic Dignity

David Udell, Feb 28, 2023, 4:02 AM
6 points
3 comments, 2 min read, LW link

Probability Theory: The Logic of Science, Jaynes

David Udell, Feb 16, 2023, 9:57 PM
29 points
0 comments, 18 min read, LW link

Rounding Someone Off

David Udell, Jan 24, 2023, 12:03 AM
25 points
0 comments, 5 min read, LW link

Consequentialists: One-Way Pattern Traps

David Udell, Jan 16, 2023, 8:48 PM
59 points
3 comments, 14 min read, LW link

Linear Algebra Done Right, Axler

David Udell, Jan 2, 2023, 10:54 PM
57 points
6 comments, 9 min read, LW link

Naive Set Theory, Halmos

David Udell, Dec 22, 2022, 2:34 AM
11 points
1 comment, 8 min read, LW link

Moorean Statements

David Udell, Oct 22, 2022, 12:50 AM
11 points
11 comments, 1 min read, LW link

Dath Ilan’s Views on Stopgap Corrigibility

David Udell, Sep 22, 2022, 4:16 PM
78 points
19 comments, 13 min read, LW link
(www.glowfic.com)

Guidelines for Mad Entrepreneurs

David Udell, Sep 16, 2022, 6:33 AM
31 points
0 comments, 11 min read, LW link

Framing AI Childhoods

David Udell, Sep 6, 2022, 11:40 PM
37 points
8 comments, 4 min read, LW link

The Shard Theory Alignment Scheme

David Udell, 25 Aug 2022 4:52 UTC
47 points
32 comments, 2 min read, LW link

“What Mistakes Are You Making Right Now?”

David Udell, 15 Aug 2022 21:19 UTC
13 points
2 comments, 1 min read, LW link

Shard Theory: An Overview

David Udell, 11 Aug 2022 5:44 UTC
166 points
34 comments, 10 min read, LW link

Team Shard Status Report

David Udell, 9 Aug 2022 5:33 UTC
38 points
8 comments, 3 min read, LW link