leogao (Karma: 2,884)
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 16 Dec 2023 5:39 UTC · 53 points · 5 comments · 1 min read

Shapley Value Attribution in Chain of Thought
leogao · 14 Apr 2023 5:56 UTC · 101 points · 5 comments · 4 min read

[ASoT] Some thoughts on human abstractions
leogao · 16 Mar 2023 5:42 UTC · 42 points · 4 comments · 5 min read

Clarifying wireheading terminology
leogao · 24 Nov 2022 4:53 UTC · 65 points · 6 comments · 1 min read

Scaling Laws for Reward Model Overoptimization
leogao, John Schulman and Jacob_Hilton · 20 Oct 2022 0:20 UTC · 102 points · 13 comments · 1 min read · (arxiv.org)

[Question] How many GPUs does NVIDIA make?
leogao · 8 Oct 2022 17:54 UTC · 27 points · 2 comments · 1 min read

Towards deconfusing wireheading and reward maximization
leogao · 21 Sep 2022 0:36 UTC · 81 points · 7 comments · 4 min read

Humans Reflecting on HRH
leogao · 29 Jul 2022 21:56 UTC · 26 points · 4 comments · 2 min read

leogao’s Shortform
leogao · 24 May 2022 20:08 UTC · 5 points · 123 comments · 1 min read

[ASoT] Consequentialist models as a superset of mesaoptimizers
leogao · 23 Apr 2022 17:57 UTC · 37 points · 2 comments · 4 min read

[ASoT] Some thoughts about imperfect world modeling
leogao · 7 Apr 2022 15:42 UTC · 7 points · 0 comments · 4 min read

[ASoT] Some thoughts about LM monologue limitations and ELK
leogao · 30 Mar 2022 14:26 UTC · 10 points · 0 comments · 2 min read

[ASoT] Some thoughts about deceptive mesaoptimization
leogao · 28 Mar 2022 21:14 UTC · 24 points · 5 comments · 7 min read

[ASoT] Searching for consequentialist structure
leogao · 27 Mar 2022 19:09 UTC · 26 points · 2 comments · 4 min read

[ASoT] Some ways ELK could still be solvable in practice
leogao · 27 Mar 2022 1:15 UTC · 26 points · 1 comment · 2 min read

[ASoT] Observations about ELK
leogao · 26 Mar 2022 0:42 UTC · 31 points · 0 comments · 3 min read

What do paradigm shifts look like?
leogao · 16 Mar 2022 19:17 UTC · 18 points · 2 comments · 1 min read

EleutherAI’s GPT-NeoX-20B release
leogao · 10 Feb 2022 6:56 UTC · 30 points · 3 comments · 1 min read · (eaidata.bmk.sh)

NFTs, Coin Collecting, and Expensive Paintings
leogao · 24 Jan 2022 1:01 UTC · 29 points · 35 comments · 5 min read

Retail Investor Advantages
leogao · 7 Dec 2021 2:08 UTC · 13 points · 13 comments · 1 min read