I’m the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
My goal is to improve our understanding of scheming and build tools and methods to detect and mitigate it.
I previously did a Ph.D. in ML at the International Max Planck Research School in Tübingen, worked part-time with Epoch, and did independent AI safety research.
For more see https://www.mariushobbhahn.com/aboutme/
I subscribe to Crocker’s Rules.
(thx to Bronson for privately pointing this out)
Directionally, I think removing parts of the training data would probably make a difference, but potentially less than we might naively assume; see, e.g., Evan’s argument on the AXRP podcast.
Also, I think you’re right, and my statement “I think for most practical considerations, it makes almost zero difference” was too strong.