
bilalchughtai

Karma: 268

bilalchughtai’s Shortform

bilalchughtai, 29 Jul 2024 18:57 UTC
3 points
5 comments · 1 min read · LW link

Understanding Positional Features in Layer 0 SAEs

29 Jul 2024 9:36 UTC
43 points
0 comments · 5 min read · LW link

Unlearning via RMU is mostly shallow

23 Jul 2024 16:07 UTC
50 points
3 comments · 6 min read · LW link

Transformer Circuit Faithfulness Metrics Are Not Robust

12 Jul 2024 3:47 UTC
104 points
5 comments · 7 min read · LW link
(arxiv.org)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
103 points
28 comments · 5 min read · LW link