
Vikrant Varma

Karma: 790

Research Engineer at DeepMind.

Publications

JumpReLU SAEs + Early Access to Gemma 2 SAEs

19 Jul 2024 16:10 UTC
48 points
10 comments · 1 min read · LW link
(storage.googleapis.com)

Improving Dictionary Learning with Gated Sparse Autoencoders

25 Apr 2024 18:43 UTC
63 points
38 comments · 1 min read · LW link
(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
77 points
10 comments · 8 min read · LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
72 points
0 comments · 3 min read · LW link

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

18 Dec 2023 11:58 UTC
147 points
21 comments · 10 min read · LW link

Explaining grokking through circuit efficiency

8 Sep 2023 14:39 UTC
101 points
11 comments · 3 min read · LW link
(arxiv.org)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

25 Nov 2022 14:36 UTC
39 points
9 comments · 6 min read · LW link
(vkrakovna.wordpress.com)

Threat Model Literature Review

1 Nov 2022 11:03 UTC
77 points
4 comments · 25 min read · LW link

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments · 4 min read · LW link · 1 review

More examples of goal misgeneralization

7 Oct 2022 14:38 UTC
56 points
8 comments · 2 min read · LW link
(deepmindsafetyresearch.medium.com)

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

12 Aug 2022 15:17 UTC
86 points
4 comments · 3 min read · LW link · 1 review
(vkrakovna.wordpress.com)

ELK contest submission: route understanding through the human ontology

14 Mar 2022 21:42 UTC
21 points
2 comments · 2 min read · LW link