
Marius Hobbhahn

Karma: 4,976

I’m the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
My goal is to improve our understanding of scheming and build tools and methods to detect and mitigate it.

I previously did a Ph.D. in ML at the International Max Planck Research School in Tübingen, worked part-time with Epoch, and did independent AI safety research.

For more see https://www.mariushobbhahn.com/aboutme/

I subscribe to Crocker’s Rules

100+ concrete projects and open problems in evals

Marius Hobbhahn · Mar 22, 2025, 3:21 PM
71 points
1 comment · 1 min read

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

Mar 17, 2025, 7:11 PM
175 points
7 comments · 6 min read

We should start looking for scheming “in the wild”

Marius Hobbhahn · Mar 6, 2025, 1:49 PM
89 points
4 comments · 5 min read

For scheming, we should first focus on detection and then on prevention

Marius Hobbhahn · Mar 4, 2025, 3:22 PM
47 points
7 comments · 5 min read

Forecasting Frontier Language Model Agent Capabilities

Feb 24, 2025, 4:51 PM
35 points
0 comments · 5 min read
(www.apolloresearch.ai)

Do models know when they are being evaluated?

Feb 17, 2025, 11:13 PM
54 points
3 comments · 12 min read

Detecting Strategic Deception Using Linear Probes

Feb 6, 2025, 3:46 PM
102 points
9 comments · 2 min read
(arxiv.org)

Catastrophe through Chaos

Marius Hobbhahn · Jan 31, 2025, 2:19 PM
182 points
17 comments · 12 min read

What’s the short timeline plan?

Marius Hobbhahn · Jan 2, 2025, 2:59 PM
347 points
49 comments · 23 min read

Ablations for “Frontier Models are Capable of In-context Scheming”

Dec 17, 2024, 11:58 PM
115 points
1 comment · 2 min read