Yeu-Tong Lau

Karma: 52

Understanding Positional Features in Layer 0 SAEs

29 Jul 2024 9:36 UTC
43 points
0 comments · 5 min read

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

30 Aug 2023 17:36 UTC
17 points
0 comments · 8 min read
(arxiv.org)