I’m curious if these observations are related at all to the work by Mendel, Hanni and Vaintrob on SAE features, more discussion here.
Is the first post the one you meant to link, or did you mean the follow-up post from Jake? The first post is on toy models of AND and XOR, which I don’t see as being super relevant. But Jake’s argument that there’s clear structure that naive hypotheses neglect seems clearly legit.