I’m confused by what you mean when you say GPT-4o is bad. In my experience it has been stronger than plain GPT-4, especially at more complex stuff. I do physics research and it’s the first model that can actually improve the computational efficiency of parts of my code that implement physical models. It has also become more useful for discussing my research, in the sense that it dives deeper into specialized topics, whereas the previous GPT-4 would just respond in a very handwavy way.
Man, I wish that were my experience. I feel like I’m constantly asking GPT-4o a question, getting a weird or bad response, and then switching to 4 to finish the job.
Benchmarks are consistent with GPT-4o having different strengths than GPT-4 Turbo, though at a similar overall level: EQ-Bench is lower, MAGI-Hard is higher, it is the best model tested on the Creative Writing benchmark (as judged by Claude Opus), but it is notably worse at judging writing itself (though still good for its price point).
In my experience, different strengths also mean different prompt strategies are necessary; for example, a small, highly instruction-focused model might benefit from few-shot repetition and emphasis that would just distract a more powerful OpenAI model. That can make universal custom instructions more annoying to maintain.
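As a rough illustration of what I mean, here is a minimal sketch using the OpenAI Python client. The task, the specific prompts, and which style actually works better for a given model are all assumptions you’d want to test yourself, not measurements:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TASK = ("Classify the sentiment of this review as positive or negative: "
        "'The battery died after two days.'")

# Style 1: heavy few-shot repetition and emphasis. This scaffolding often
# helps a small, instruction-focused model stay on format.
fewshot_messages = [
    {"role": "system", "content": "You are a classifier. Answer ONLY 'positive' or 'negative'. Do not explain."},
    {"role": "user", "content": "Review: 'Great sound quality.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: 'Stopped working in a week.'"},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": TASK + " Answer ONLY 'positive' or 'negative'."},
]

# Style 2: terse, single-turn prompt. For a stronger model, the extra
# repetition above may just add noise.
terse_messages = [
    {"role": "user", "content": TASK},
]

for name, messages in [("few-shot", fewshot_messages), ("terse", terse_messages)]:
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(name, "->", resp.choices[0].message.content)
```

The point is just that a single set of custom instructions bakes in one of these styles for every model you talk to, which is exactly the mismatch I find annoying.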