GPT-4o… like Sora not that long ago, is a major triumph of the Sutskeverian vision of just scaling up sequence prediction for everything, which OA has been researching for years
Could you elaborate as to why you see GPT-4o as continuous with the scaling strategy? My understanding is this is a significantly smaller model than 4, designed to reduce latency and cost, which is then “compensated for” with multimodality and presumably many other improvements in architecture/data/etc.
Isn’t GPT-4o a clear break (temporary, I assume) with the singular focus on scaling of the past few years?