So it seems that a lot of people who applied to Understanding Search in Transformers project to do mechanistic interpretability research and probably a lot of them won’t get in. I think there’s a lot of similar projects and potential low-hanging fruit people could work on and we probably could organize to make more teams working on similar things. I’m willing to organize at least one such project myself(specifically working on trying to figure out how algorithm distillation https://arxiv.org/pdf/2210.14215.pdf works) and will talk with Linda about it in 2 weeks and write a longer post with more details but I thought it would be better to write a comment here to see if how many people are interested in that kind of thing beforehand.
So it seems that a lot of people who applied to Understanding Search in Transformers project to do mechanistic interpretability research and probably a lot of them won’t get in.
I think there’s a lot of similar projects and potential low-hanging fruit people could work on and we probably could organize to make more teams working on similar things.
I’m willing to organize at least one such project myself(specifically working on trying to figure out how algorithm distillation https://arxiv.org/pdf/2210.14215.pdf works) and will talk with Linda about it in 2 weeks and write a longer post with more details but I thought it would be better to write a comment here to see if how many people are interested in that kind of thing beforehand.