leogao
Karma:
5,433
My takes on SB-1047
leogao
Sep 9, 2024, 6:38 PM
151
points
8
comments
4
min read
LW
link
Scaling and evaluating sparse autoencoders
leogao
Jun 6, 2024, 10:50 PM
106
points
6
comments
1
min read
LW
link
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao
Dec 16, 2023, 5:39 AM
55
points
5
comments
1
min read
LW
link
Shapley Value Attribution in Chain of Thought
leogao
Apr 14, 2023, 5:56 AM
106
points
7
comments
4
min read
LW
link
[ASoT] Some thoughts on human abstractions
leogao
Mar 16, 2023, 5:42 AM
42
points
4
comments
5
min read
LW
link
Clarifying wireheading terminology
leogao
Nov 24, 2022, 4:53 AM
66
points
6
comments
1
min read
LW
link
Scaling Laws for Reward Model Overoptimization
leogao
,
John Schulman
and
Jacob_Hilton
Oct 20, 2022, 12:20 AM
103
points
13
comments
1
min read
LW
link
(arxiv.org)
[Question]
How many GPUs does NVIDIA make?
leogao
Oct 8, 2022, 5:54 PM
27
points
2
comments
1
min read
LW
link
Towards deconfusing wireheading and reward maximization
leogao
Sep 21, 2022, 12:36 AM
81
points
7
comments
4
min read
LW
link
Humans Reflecting on HRH
leogao
Jul 29, 2022, 9:56 PM
26
points
4
comments
2
min read
LW
link
leogao’s Shortform
leogao
May 24, 2022, 8:08 PM
6
points
313
comments
LW
link
[ASoT] Consequentialist models as a superset of mesaoptimizers
leogao
Apr 23, 2022, 5:57 PM
38
points
2
comments
4
min read
LW
link
[ASoT] Some thoughts about imperfect world modeling
leogao
Apr 7, 2022, 3:42 PM
7
points
0
comments
4
min read
LW
link
[ASoT] Some thoughts about LM monologue limitations and ELK
leogao
Mar 30, 2022, 2:26 PM
10
points
0
comments
2
min read
LW
link
[ASoT] Some thoughts about deceptive mesaoptimization
leogao
Mar 28, 2022, 9:14 PM
24
points
5
comments
7
min read
LW
link
[ASoT] Searching for consequentialist structure
leogao
Mar 27, 2022, 7:09 PM
26
points
2
comments
4
min read
LW
link
[ASoT] Some ways ELK could still be solvable in practice
leogao
Mar 27, 2022, 1:15 AM
26
points
1
comment
2
min read
LW
link
[ASoT] Observations about ELK
leogao
Mar 26, 2022, 12:42 AM
34
points
0
comments
3
min read
LW
link
What do paradigm shifts look like?
leogao
Mar 16, 2022, 7:17 PM
18
points
2
comments
1
min read
LW
link
EleutherAI’s GPT-NeoX-20B release
leogao
Feb 10, 2022, 6:56 AM
30
points
3
comments
1
min read
LW
link
(eaidata.bmk.sh)