leogao

Karma: 5,433

My takes on SB-1047

leogao · Sep 9, 2024, 6:38 PM
151 points
8 comments · 4 min read · LW link

Scaling and evaluating sparse autoencoders

leogao · Jun 6, 2024, 10:50 PM
106 points
6 comments · 1 min read · LW link

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

leogao · Dec 16, 2023, 5:39 AM
55 points
5 comments · 1 min read · LW link

Shapley Value Attribution in Chain of Thought

leogao · Apr 14, 2023, 5:56 AM
106 points
7 comments · 4 min read · LW link

[ASoT] Some thoughts on human abstractions

leogao · Mar 16, 2023, 5:42 AM
42 points
4 comments · 5 min read · LW link

Clarifying wireheading terminology

leogao · Nov 24, 2022, 4:53 AM
66 points
6 comments · 1 min read · LW link

Scaling Laws for Reward Model Overoptimization

Oct 20, 2022, 12:20 AM
103 points
13 comments · 1 min read · LW link
(arxiv.org)

[Question] How many GPUs does NVIDIA make?

leogao · Oct 8, 2022, 5:54 PM
27 points
2 comments · 1 min read · LW link

Towards deconfusing wireheading and reward maximization

leogao · Sep 21, 2022, 12:36 AM
81 points
7 comments · 4 min read · LW link

Humans Reflecting on HRH

leogao · Jul 29, 2022, 9:56 PM
26 points
4 comments · 2 min read · LW link

leogao’s Shortform

leogao · May 24, 2022, 8:08 PM
6 points
313 comments · LW link

[ASoT] Consequentialist models as a superset of mesaoptimizers

leogao · Apr 23, 2022, 5:57 PM
38 points
2 comments · 4 min read · LW link

[ASoT] Some thoughts about imperfect world modeling

leogao · Apr 7, 2022, 3:42 PM
7 points
0 comments · 4 min read · LW link

[ASoT] Some thoughts about LM monologue limitations and ELK

leogao · Mar 30, 2022, 2:26 PM
10 points
0 comments · 2 min read · LW link

[ASoT] Some thoughts about deceptive mesaoptimization

leogao · Mar 28, 2022, 9:14 PM
24 points
5 comments · 7 min read · LW link

[ASoT] Searching for consequentialist structure

leogao · Mar 27, 2022, 7:09 PM
26 points
2 comments · 4 min read · LW link

[ASoT] Some ways ELK could still be solvable in practice

leogao · Mar 27, 2022, 1:15 AM
26 points
1 comment · 2 min read · LW link

[ASoT] Observations about ELK

leogao · Mar 26, 2022, 12:42 AM
34 points
0 comments · 3 min read · LW link

What do paradigm shifts look like?

leogao · Mar 16, 2022, 7:17 PM
18 points
2 comments · 1 min read · LW link

EleutherAI’s GPT-NeoX-20B release

leogao · Feb 10, 2022, 6:56 AM
30 points
3 comments · 1 min read · LW link
(eaidata.bmk.sh)