At this point, I think that I personally have enough evidence to be reasonably sure that I understand abstraction well enough that it’s not a conceptual bottleneck. There are still many angles to pursue: I still don’t have efficient abstraction learning algorithms, there are probably good ways to generalize it, and of course there’s empirical work. I also do not think that other people have enough evidence that they should believe me at this point, when I claim to understand it well enough. (In general, if someone makes a claim and backs it up by citing X, then I should assign the claim lower credence than if I stumbled on X organically, because the claimant may have found X via motivated search. This leads to an asymmetry: sometimes I believe a thing, but I do not think that my claim of the thing should be sufficient to convince others, because others do not have visibility into my search process. Also, I just haven’t clearly written up every little piece of evidence.)
Anyway, when I consider what barriers are left, assuming my current model of abstraction and how it plays with the world is (close enough to) correct, the problems in the OP are the biggest. One of the main qualitative takeaways from the abstraction project is that clean cross-model correspondences probably do exist surprisingly often (a prediction which neural network interpretability work has confirmed to some degree). But that’s an answer to a question I don’t know how to properly set up yet, and the details of the question itself seem important. What criteria do we want these correspondences to satisfy? What criteria does the abstraction picture predict they satisfy in practice? What criteria do they actually satisfy in practice? I don’t know yet.