Vikrant Varma
Karma: 835
Research Engineer at DeepMind.
Publications
MONA: Three Month Later—Updates and Steganography Without Optimization Pressure
David Lindner and Vikrant Varma
12 Apr 2025 23:15 UTC · 31 points · 0 comments · 5 min read
JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár and Neel Nanda
19 Jul 2024 16:10 UTC · 55 points · 10 comments · 1 min read · (storage.googleapis.com)
Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, lewis smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah and Neel Nanda
25 Apr 2024 18:43 UTC · 63 points · 38 comments · 1 min read · (arxiv.org)
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma
19 Apr 2024 19:06 UTC · 80 points · 10 comments · 8 min read
[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma
19 Apr 2024 19:06 UTC · 73 points · 0 comments · 3 min read
Discussion: Challenges with Unsupervised LLM Knowledge Discovery
Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik and Rohin Shah
18 Dec 2023 11:58 UTC · 149 points · 21 comments · 10 min read
Explaining grokking through circuit efficiency
Vikrant Varma and Rohin Shah
8 Sep 2023 14:39 UTC · 101 points · 11 comments · 3 min read · (arxiv.org)
Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
Vika, Vikrant Varma, Ramana Kumar and Rohin Shah
25 Nov 2022 14:36 UTC · 39 points · 9 comments · 6 min read · (vkrakovna.wordpress.com)
Threat Model Literature Review
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC · 79 points · 4 comments · 25 min read
Clarifying AI X-risk
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC · 127 points · 24 comments · 4 min read · 1 review
More examples of goal misgeneralization
Rohin Shah and Vikrant Varma
7 Oct 2022 14:38 UTC · 56 points · 8 comments · 2 min read · (deepmindsafetyresearch.medium.com)
Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Vika, Vikrant Varma, Ramana Kumar and Mary Phuong
12 Aug 2022 15:17 UTC · 86 points · 4 comments · 3 min read · 1 review · (vkrakovna.wordpress.com)
ELK contest submission: route understanding through the human ontology
Vika, Ramana Kumar and Vikrant Varma
14 Mar 2022 21:42 UTC · 21 points · 2 comments · 2 min read