Tom Lieberum

Karma: 808

Research Engineer at DeepMind, focused on mechanistic interpretability and large language models. Opinions are my own.

Improving Dictionary Learning with Gated Sparse Autoencoders

25 Apr 2024 18:43 UTC
62 points
35 comments · 1 min read · LW link
(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
71 points
8 comments · 8 min read · LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
68 points
0 comments · 3 min read · LW link

AtP*: An efficient and scalable method for localizing LLM behaviour to components

18 Mar 2024 17:28 UTC
19 points
0 comments · 1 min read · LW link
(arxiv.org)

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

20 Jul 2023 10:50 UTC
44 points
3 comments · 2 min read · LW link
(arxiv.org)

A Mechanistic Interpretability Analysis of Grokking

15 Aug 2022 2:41 UTC
370 points
47 comments · 36 min read · LW link · 1 review
(colab.research.google.com)

Investigating causal understanding in LLMs

14 Jun 2022 13:57 UTC
28 points
6 comments · 13 min read · LW link

Thoughts on Formalizing Composition

Tom Lieberum · 7 Jun 2022 7:51 UTC
13 points
0 comments · 7 min read · LW link

Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum · 24 Dec 2021 18:05 UTC
16 points
2 comments · 3 min read · LW link

[Question] How should my timelines influence my career choice?

Tom Lieberum · 3 Aug 2021 10:14 UTC
13 points
10 comments · 1 min read · LW link