I agree that if you put more limitations on what heuristics are and how they compose, you end up with a stronger hypothesis. I think it’s probably better to leave that out and try to do some more empirical work before making a claim there, though (I suppose you could say that the hypothesis isn’t actually making a lot of concrete predictions yet at this stage).
I don’t think (2) necessarily follows, but I do sympathize with your point that the post is perhaps a more specific version of the hypothesis that “we can understand neural network computation by doing mech interp.”