FWIW, I woudn’t describe ARC’s work as interpretability. My best understanding is that they aren’t directly targeting better human understanding of how AIs work (though this may happen indirectly). (I’m pretty confident in this, but maybe someone from ARC will correct me : ) )
Edit: it seems like you define interpretability very broadly, to the point where I’m a bit confused about what is or isn’t interpretability work. This comment should be interpreted to refer to interpretability as ‘someone (humans or AIs) getting a better understanding how an AI works (often with a mechanistic connotation)’
FWIW, I woudn’t describe ARC’s work as interpretability. My best understanding is that they aren’t directly targeting better human understanding of how AIs work (though this may happen indirectly). (I’m pretty confident in this, but maybe someone from ARC will correct me : ) )
Edit: it seems like you define interpretability very broadly, to the point where I’m a bit confused about what is or isn’t interpretability work. This comment should be interpreted to refer to interpretability as ‘someone (humans or AIs) getting a better understanding how an AI works (often with a mechanistic connotation)’
Thanks! I discuss in the second post of the sequence why I lump ARC’s work in with human-centered interpretability.