I agree. The point is that if you train on addition examples without any modular wraparound (whether you think of that as regular addition or as modular addition with a large prime doesn't really matter), then there is at least some evidence that you get a different representation from the one Nanda et al. found.
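To make "no modular wraparound" concrete, here is a rough sketch of the two kinds of training set. The helper names (`addition_pairs`, `has_wraparound`) and the sizes (operands up to 50, modulus 113) are my own illustrative choices, not anything taken from a specific experiment. The only thing it shows is that when every sum stays below the modulus, modular addition and regular addition give identical labels, so the model never sees a wrapped example.

```python
import itertools

def addition_pairs(max_operand, modulus):
    """Return (a, b, (a + b) % modulus) for all 0 <= a, b <= max_operand."""
    return [(a, b, (a + b) % modulus)
            for a, b in itertools.product(range(max_operand + 1), repeat=2)]

def has_wraparound(examples):
    """True if any label differs from the plain integer sum a + b."""
    return any(c != a + b for a, b, c in examples)

# Hypothetical sizes, chosen only for illustration.
no_wrap = addition_pairs(max_operand=50, modulus=113)   # largest sum is 100 < 113
wrap = addition_pairs(max_operand=112, modulus=113)     # sums can reach 224 >= 113

print(has_wraparound(no_wrap))  # False: identical to regular addition
print(has_wraparound(wrap))     # True: some labels wrap around the modulus
```

On the first dataset the modulus is invisible to the model, which is why it doesn't matter whether you call it regular addition or modular addition with a large prime.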