Marius Hobbhahn

Karma: 3,863

I’m the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
I mostly work on evals, but I am also interested in interpretability. My goal is to improve our understanding of scheming and build tools and methods to detect it.

I previously did a Ph.D. in ML at the International Max Planck Research School in Tübingen, worked part-time with Epoch, and did independent AI safety research.

For more, see https://www.mariushobbhahn.com/aboutme/

I subscribe to Crocker’s Rules.

Announcing Apollo Research

Marius Hobbhahn, beren, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni and Jérémy Scheurer

May 30, 2023, 4:17 PM

217 points

11 comments, 8 min read, LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

StefanHex and Marius Hobbhahn

May 25, 2023, 3:37 PM

71 points

1 comment, 13 min read, LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

StefanHex and Marius Hobbhahn

May 9, 2023, 7:41 PM

119 points

1 comment, 10 min read, LW link

Should we publish mechanistic interpretability research?

Marius Hobbhahn and LawrenceC

Apr 21, 2023, 4:19 PM

105 points

40 comments, 13 min read, LW link

Clarifying mesa-optimization

Marius Hobbhahn and Pierre Peigné

Mar 21, 2023, 3:53 PM

38 points

6 comments, 10 min read, LW link

Reflection Mechanisms as an Alignment Target—Attitudes on “near-term” AI

elandgre, Beth Barnes and Marius Hobbhahn

Mar 2, 2023, 4:29 AM

21 points

0 comments, 8 min read, LW link

More findings on maximal data dimension

Marius Hobbhahn

Feb 2, 2023, 6:33 PM

27 points

1 comment, 11 min read, LW link

More findings on Memorization and double descent

Marius Hobbhahn

Feb 1, 2023, 6:26 PM

53 points

2 comments, 19 min read, LW link

The role of Bayesian ML in AI safety—an overview

Marius Hobbhahn

Jan 27, 2023, 7:40 PM

31 points

6 comments, 10 min read, LW link

The next decades might be wild

Marius Hobbhahn

Dec 15, 2022, 4:10 PM

175 points

42 comments, 41 min read, LW link, 1 review

Predicting GPU performance

Marius Hobbhahn and Tamay

Dec 14, 2022, 4:27 PM

60 points

26 comments, 1 min read, LW link

(epochai.org)

Theories of impact for Science of Deep Learning

Marius Hobbhahn

Dec 1, 2022, 2:39 PM

24 points

0 comments, 11 min read, LW link

Announcing AI safety Mentors and Mentees

Marius Hobbhahn

Nov 23, 2022, 3:21 PM

62 points

7 comments, 10 min read, LW link

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn

Nov 16, 2022, 2:40 PM

75 points

17 comments, 7 min read, LW link, 1 review

Some advice on independent research

Marius Hobbhahn

Nov 8, 2022, 2:46 PM

55 points

5 comments, 10 min read, LW link

Science of Deep Learning—a technical agenda

Marius Hobbhahn

Oct 18, 2022, 2:54 PM

36 points

7 comments, 4 min read, LW link

Building a transformer from scratch—AI safety up-skilling challenge

Marius Hobbhahn

Oct 12, 2022, 3:40 PM

42 points

1 comment, 5 min read, LW link

Lessons learned from talking to >100 academics about AI safety

Marius Hobbhahn

Oct 10, 2022, 1:16 PM

216 points

18 comments, 12 min read, LW link, 1 review

Reflection Mechanisms as an Alignment target: A follow-up survey

Marius Hobbhahn, elandgre and Beth Barnes

Oct 5, 2022, 2:03 PM

15 points

2 comments, 7 min read, LW link

Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius Hobbhahn

Oct 4, 2022, 7:22 AM

46 points

11 comments, 1 min read, LW link

(arxiv.org)