
Vlad Mikulik

Karma: 744

Reasoning models don't always say what they think

Apr 9, 2025, 7:48 PM
25 points
4 comments · 1 min read · LW link
(www.anthropic.com)

Automated Researchers Can Subtly Sandbag

Mar 26, 2025, 7:13 PM
41 points
0 comments · 4 min read · LW link
(alignment.anthropic.com)

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

Dec 18, 2023, 11:58 AM
147 points
21 comments · 10 min read · LW link

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Jul 20, 2023, 10:50 AM
44 points
3 comments · 2 min read · LW link
(arxiv.org)

Specification gaming: the flip side of AI ingenuity

May 6, 2020, 11:51 PM
66 points
9 comments · 6 min read · LW link

Utility ≠ Reward

Vlad Mikulik · Sep 5, 2019, 5:28 PM
131 points
24 comments · 1 min read · LW link · 2 reviews

2-D Robustness

Vlad Mikulik · Aug 30, 2019, 8:27 PM
85 points
8 comments · 2 min read · LW link

Risks from Learned Optimization: Conclusion and Related Work

Jun 7, 2019, 7:53 PM
82 points
5 comments · 6 min read · LW link

Deceptive Alignment

Jun 5, 2019, 8:16 PM
118 points
20 comments · 17 min read · LW link

The Inner Alignment Problem

Jun 4, 2019, 1:20 AM
104 points
17 comments · 13 min read · LW link

Conditions for Mesa-Optimization

Jun 1, 2019, 8:52 PM
84 points
48 comments · 12 min read · LW link

Risks from Learned Optimization: Introduction

May 31, 2019, 11:44 PM
187 points
42 comments · 12 min read · LW link · 3 reviews

Clarifying Consequentialists in the Solomonoff Prior

Vlad Mikulik · Jul 11, 2018, 2:35 AM
20 points
16 comments · 6 min read · LW link