Joseph Bloom comments on On Developing a Mathematical Theory of Interpretability

Joseph Bloom 30 Apr 2023 8:14 UTC
Thanks Spencer! I’d love to respond in detail but alas, I lack the time at the moment.

Some quick points:
1. I’m also really excited about SLT work. I’m curious how much value there is in taking toy models (such as Neel’s grokking work) and exploring them via SLT, and to what extent reasoning in SLT might be reinvigorated by integrating experimental ideas and methodology from MI (such as progress measures). It feels plausible to me that not enough people have been working at any of a number of these intersections, and this is a good example. Not sure if you’re planning on going to this: https://www.lesswrong.com/posts/HtxLbGvD7htCybLmZ/singularities-against-the-singularity-announcing-workshop-on but it’s probably not in the cards for me. I’m wondering if promoting it to people with MI experience could be good.
2. I totally get what you’re saying: being a toy model in sense A or B doesn’t necessarily mean the toy model is a version of the hard part of the problem. This explanation helped a lot, thank you!
3. I hear what you are saying about next steps being challenging, both because of logistical and coordination issues and because the problem is just really hard! I guess the recourse we have is something like: look for opportunities that might justify giving something like this more attention or coordination. I’m also wondering if there might be ways of dramatically lowering the bar for doing work in related areas (e.g., the same way Neel writing TransformerLens got a lot more people into MI).

Looking forward to more discussions on this in the future, all the best!