Their interpretability results seem to be in section 4.3, and there are quite a number of them from a variety of different networks. I'm not very familiar with interpretability for image nets, but the model spontaneously learning to do high-quality image segmentation was fairly striking, as was it spontaneously doing subsegmentation of object parts, such as one neuron that responded to animals' heads and another that responded to their legs. That is certainly the sort of thing I would hope to see if told that an image network was highly interpretable. The largest LLMs they trained were only around the size of BERT and GPT-2, so their capabilities were fairly limited.