Again: most of what separates a vision transformer from a language model is the data they’re trained on.
Yeah, but it seems like it can’t be what separates me from human geniuses, because there’s not that much difference in our early sense data.
Yeah, but it seems like it can’t be what separates me from human geniuses, because there’s not that much difference in our early sense data.