Tom Lieberum

Karma: 808

Research Engineer at DeepMind, focused on mechanistic interpretability and large language models. Opinions are my own.

Improving Dictionary Learning with Gated Sparse Autoencoders

25 Apr 2024 18:43 UTC
62 points
35 comments · 1 min read · LW link
(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
71 points
8 comments · 8 min read · LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
68 points
0 comments · 3 min read · LW link

AtP*: An efficient and scalable method for localizing LLM behaviour to components

18 Mar 2024 17:28 UTC
19 points
0 comments · 1 min read · LW link
(arxiv.org)

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

20 Jul 2023 10:50 UTC
44 points
3 comments · 2 min read · LW link
(arxiv.org)

A Mechanistic Interpretability Analysis of Grokking

15 Aug 2022 2:41 UTC
370 points
47 comments · 36 min read · LW link · 1 review
(colab.research.google.com)

Investigating causal understanding in LLMs

14 Jun 2022 13:57 UTC
28 points
6 comments · 13 min read · LW link

Thoughts on Formalizing Composition

Tom Lieberum · 7 Jun 2022 7:51 UTC
13 points
0 comments · 7 min read · LW link

Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum · 24 Dec 2021 18:05 UTC
16 points
2 comments · 3 min read · LW link

[Question] How should my timelines influence my career choice?

Tom Lieberum · 3 Aug 2021 10:14 UTC
13 points
10 comments · 1 min read · LW link