Vikrant Varma

Karma: 796

Research Engineer at DeepMind.

Publications

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Jul 19, 2024, 4:10 PM
48 points
10 comments
1 min read
LW link
(storage.googleapis.com)

Improving Dictionary Learning with Gated Sparse Autoencoders

Apr 25, 2024, 6:43 PM
63 points
38 comments
1 min read
LW link
(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
79 points
10 comments
8 min read
LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
72 points
0 comments
3 min read
LW link

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

Dec 18, 2023, 11:58 AM
147 points
21 comments
10 min read
LW link

Explaining grokking through circuit efficiency

Sep 8, 2023, 2:39 PM
101 points
11 comments
3 min read
LW link
(arxiv.org)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Nov 25, 2022, 2:36 PM
39 points
9 comments
6 min read
LW link
(vkrakovna.wordpress.com)

Threat Model Literature Review

Nov 1, 2022, 11:03 AM
78 points
4 comments
25 min read
LW link

Clarifying AI X-risk

Nov 1, 2022, 11:03 AM
127 points
24 comments
4 min read
LW link
1 review

More examples of goal misgeneralization

Oct 7, 2022, 2:38 PM
56 points
8 comments
2 min read
LW link
(deepmindsafetyresearch.medium.com)

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

Aug 12, 2022, 3:17 PM
86 points
4 comments
3 min read
LW link
1 review
(vkrakovna.wordpress.com)

ELK contest submission: route understanding through the human ontology

Mar 14, 2022, 9:42 PM
21 points
2 comments
2 min read
LW link