Marius Hobbhahn

Karma: 3,863

I’m the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
I mostly work on evals, but I am also interested in interpretability. My goal is to improve our understanding of scheming and build tools and methods to detect it.

I previously did a Ph.D. in ML at the International Max Planck Research School in Tübingen, worked part-time with Epoch, and did independent AI safety research.

For more, see https://www.mariushobbhahn.com/aboutme/

I subscribe to Crocker’s Rules

Announcing Apollo Research

May 30, 2023, 4:17 PM
217 points
11 comments
8 min read
LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

May 25, 2023, 3:37 PM
71 points
1 comment
13 min read
LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

May 9, 2023, 7:41 PM
119 points
1 comment
10 min read
LW link

Should we publish mechanistic interpretability research?

Apr 21, 2023, 4:19 PM
105 points
40 comments
13 min read
LW link

Clarifying mesa-optimization

Mar 21, 2023, 3:53 PM
38 points
6 comments
10 min read
LW link

Reflection Mechanisms as an Alignment Target - Attitudes on “near-term” AI

Mar 2, 2023, 4:29 AM
21 points
0 comments
8 min read
LW link

More findings on maximal data dimension

Marius Hobbhahn
Feb 2, 2023, 6:33 PM
27 points
1 comment
11 min read
LW link

More findings on Memorization and double descent

Marius Hobbhahn
Feb 1, 2023, 6:26 PM
53 points
2 comments
19 min read
LW link

The role of Bayesian ML in AI safety - an overview

Marius Hobbhahn
Jan 27, 2023, 7:40 PM
31 points
6 comments
10 min read
LW link

The next decades might be wild

Marius Hobbhahn
Dec 15, 2022, 4:10 PM
175 points
42 comments
41 min read
LW link
1 review

Predicting GPU performance

Dec 14, 2022, 4:27 PM
60 points
26 comments
1 min read
LW link
(epochai.org)

Theories of impact for Science of Deep Learning

Marius Hobbhahn
Dec 1, 2022, 2:39 PM
24 points
0 comments
11 min read
LW link

Announcing AI safety Mentors and Mentees

Marius Hobbhahn
Nov 23, 2022, 3:21 PM
62 points
7 comments
10 min read
LW link

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn
Nov 16, 2022, 2:40 PM
75 points
17 comments
7 min read
LW link
1 review

Some advice on independent research

Marius Hobbhahn
Nov 8, 2022, 2:46 PM
55 points
5 comments
10 min read
LW link

Science of Deep Learning - a technical agenda

Marius Hobbhahn
Oct 18, 2022, 2:54 PM
36 points
7 comments
4 min read
LW link

Building a transformer from scratch - AI safety up-skilling challenge

Marius Hobbhahn
Oct 12, 2022, 3:40 PM
42 points
1 comment
5 min read
LW link

Lessons learned from talking to >100 academics about AI safety

Marius Hobbhahn
Oct 10, 2022, 1:16 PM
216 points
18 comments
12 min read
LW link
1 review

Reflection Mechanisms as an Alignment target: A follow-up survey

Oct 5, 2022, 2:03 PM
15 points
2 comments
7 min read
LW link

Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius Hobbhahn
Oct 4, 2022, 7:22 AM
46 points
11 comments
1 min read
LW link
(arxiv.org)