I’m the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
My goal is to improve our understanding of scheming and build tools and methods to detect and mitigate it.
I previously did a Ph.D. in ML at the International Max Planck Research School in Tübingen, worked part-time with Epoch, and did independent AI safety research.
For more see https://www.mariushobbhahn.com/aboutme/
I subscribe to Crocker’s Rules.
Good point!
Yes, I use the term scheming in a much broader way, similar to how we use it in the in-context scheming paper. I would assume that our notion of scheming is even broader than Joe’s alignment-faking because it also includes taking direct covert actions, like disabling oversight, which arguably are not alignment-faking.