David Udell
Karma: 2,549
Why Can’t We Hypothesize After the Fact?
David Udell · Feb 26, 2025, 10:41 PM · 40 points · 3 comments · 2 min read

Causal Graphs of GPT-2-Small’s Residual Stream
David Udell · Jul 9, 2024, 10:06 PM · 53 points · 7 comments · 7 min read

Sparse Coding, for Mechanistic Interpretability and Activation Engineering
David Udell · Sep 23, 2023, 7:16 PM · 42 points · 7 comments · 34 min read

ActAdd: Steering Language Models without Optimization
technicalities, TurnTrout, lisathiergart, David Udell, Ulisse Mini and Monte M · Sep 6, 2023, 5:21 PM · 105 points · 3 comments · 2 min read · (arxiv.org)

Steering GPT-2-XL by adding an activation vector
TurnTrout, Monte M, David Udell, lisathiergart and Ulisse Mini · May 13, 2023, 6:42 PM · 437 points · 98 comments · 50 min read · 1 review

Understanding and controlling a maze-solving policy network
TurnTrout, peligrietzer, Ulisse Mini, Monte M and David Udell · Mar 11, 2023, 6:59 PM · 333 points · 28 comments · 23 min read

Beneath My Epistemic Dignity
David Udell · Feb 28, 2023, 4:02 AM · 6 points · 3 comments · 2 min read

Probability Theory: The Logic of Science, Jaynes
David Udell · Feb 16, 2023, 9:57 PM · 29 points · 0 comments · 18 min read

Rounding Someone Off
David Udell · Jan 24, 2023, 12:03 AM · 25 points · 0 comments · 5 min read

Consequentialists: One-Way Pattern Traps
David Udell · Jan 16, 2023, 8:48 PM · 59 points · 3 comments · 14 min read

Linear Algebra Done Right, Axler
David Udell · Jan 2, 2023, 10:54 PM · 57 points · 6 comments · 9 min read

Naive Set Theory, Halmos
David Udell · Dec 22, 2022, 2:34 AM · 11 points · 1 comment · 8 min read

Moorean Statements
David Udell · Oct 22, 2022, 12:50 AM · 11 points · 11 comments · 1 min read

Dath Ilan’s Views on Stopgap Corrigibility
David Udell · Sep 22, 2022, 4:16 PM · 78 points · 19 comments · 13 min read · (www.glowfic.com)

Guidelines for Mad Entrepreneurs
David Udell · Sep 16, 2022, 6:33 AM · 31 points · 0 comments · 11 min read

Framing AI Childhoods
David Udell · Sep 6, 2022, 11:40 PM · 37 points · 8 comments · 4 min read

The Shard Theory Alignment Scheme
David Udell · Aug 25, 2022, 4:52 AM · 47 points · 32 comments · 2 min read

“What Mistakes Are You Making Right Now?”
David Udell · Aug 15, 2022, 9:19 PM · 13 points · 2 comments · 1 min read

Shard Theory: An Overview
David Udell · Aug 11, 2022, 5:44 AM · 166 points · 34 comments · 10 min read

Team Shard Status Report
David Udell · Aug 9, 2022, 5:33 AM · 38 points · 8 comments · 3 min read