More generally, it’s worth thinking about the conscious reasoning gap—this gap happens to be smaller in vision for various reasons.
This gap will also, of course, exist in language models trying to interpret themselves, but fine-tuning might be very helpful for at least partially removing it.
Isn't this more about generation vs. classification than about language vs. vision?