FWIW, I was one of Neel’s MATS 4.1 scholars, and I would classify 3⁄4 of Neel’s scholars’ outputs as reverse engineering some component of LLMs (for completeness, this is the other one, which doesn’t nicely fit as ‘reverse engineering’ imo). I would also say that this is still an active direction of research (lots of ground to cover with MLP neurons, polysemantic heads, and more).
You’re clearly right, thanks