Jc_Mourrat comments on Charbel-Raphaël and Lucius discuss interpretability

Jc_Mourrat 20 Nov 2023 16:35 UTC
I would say that most of current published interpretability is very, very bad and sucks at its job.
I do have an overall belief that making interpretability that does not suck, is actually quite feasible, and that there’s no particular reason to believe that it’s going to be particularly difficult or take particularly long.
Many people have spent a lot of effort trying to make progress on interpretability. I wish you could find a way to express your opinion that is a little more respectful of their work.