
Marius Hobbhahn

Karma: 3,845

I’m the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
I mostly work on evals, but I am also interested in interpretability. My goal is to improve our understanding of scheming and build tools and methods to detect it.

I previously did a Ph.D. in ML at the International Max Planck Research School in Tübingen, worked part-time with Epoch, and did independent AI safety research.

For more see https://www.mariushobbhahn.com/aboutme/

I subscribe to Crocker’s Rules

Ablations for “Frontier Models are Capable of In-context Scheming”

17 Dec 2024 23:58 UTC
89 points
1 comment · 2 min read · LW link

Frontier Models are Capable of In-context Scheming

5 Dec 2024 22:11 UTC
201 points
24 comments · 7 min read · LW link

Training AI agents to solve hard problems could lead to Scheming

19 Nov 2024 0:10 UTC
61 points
12 comments · 28 min read · LW link

Which evals resources would be good?

Marius Hobbhahn · 16 Nov 2024 14:24 UTC
47 points
4 comments · 5 min read · LW link

The Evals Gap

Marius Hobbhahn · 11 Nov 2024 16:42 UTC
55 points
7 comments · 7 min read · LW link
(www.apolloresearch.ai)

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link

Improving Model-Written Evals for AI Safety Benchmarking

15 Oct 2024 18:25 UTC
27 points
0 comments · 18 min read · LW link

An Opinionated Evals Reading List

15 Oct 2024 14:38 UTC
65 points
0 comments · 13 min read · LW link
(www.apolloresearch.ai)

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

22 Jul 2024 16:17 UTC
69 points
0 comments · 16 min read · LW link