Daniel Tan

Karma: 1,269

AI alignment researcher. Interested in understanding reasoning in language models.

https://dtch1997.github.io/

Open Challenges in Representation Engineering

Apr 3, 2025, 7:21 PM
12 points
0 comments
5 min read

Show, not tell: GPT-4o is more opinionated in images than in text

Apr 2, 2025, 8:51 AM
76 points
19 comments
3 min read

Open problems in emergent misalignment

Mar 1, 2025, 9:47 AM
77 points
13 comments
7 min read

A Collection of Empirical Frames about Language Models

Daniel Tan
Jan 2, 2025, 2:49 AM
27 points
0 comments
3 min read

Why I’m Moving from Mechanistic to Prosaic Interpretability

Daniel Tan
Dec 30, 2024, 6:35 AM
113 points
34 comments
5 min read

A Sober Look at Steering Vectors for LLMs

Nov 23, 2024, 5:30 PM
38 points
0 comments
5 min read

Evolutionary prompt optimization for SAE feature visualization

Nov 14, 2024, 1:06 PM
21 points
0 comments
9 min read

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan
Jul 29, 2024, 2:50 PM
9 points
3 comments
1 min read

Daniel Tan’s Shortform

Daniel Tan
Jul 17, 2024, 6:38 AM
2 points
257 comments
1 min read

Mech Interp Lacks Good Paradigms

Daniel Tan
Jul 16, 2024, 3:47 PM
39 points
0 comments
14 min read

Activation Pattern SVD: A proposal for SAE Interpretability

Daniel Tan
Jun 28, 2024, 10:12 PM
15 points
2 comments
2 min read