Ann comments on keltan’s Shortform

Ann 20 May 2024 16:24 UTC
1 point
0
Benchmarks are consistent with GPT-4o having different strengths than GPT-4 Turbo, though at a similar overall level: its EQ-Bench score is lower, its MAGI-Hard score is higher, and it is the best tested model for Creative Writing according to Claude Opus, but it is notably worse at judging writing (though still good for its price point).

In my experience, different strengths also mean different prompt strategies are necessary; for example, a small, highly instruction-focused model might benefit from few-shot repetition and emphasis that would just distract a more powerful OpenAI model. That might make universal custom instructions more annoying.
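
A minimal sketch of that idea, assuming two made-up model profiles: the profile names, the `PromptProfile` fields, and the "IMPORTANT"/"Remember" wording are all hypothetical illustrations, not settings recommended for any real model. It only shows why one fixed set of custom instructions is awkward: the same instruction gets wrapped differently per model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptProfile:
    """Hypothetical per-model prompting preferences (illustrative only)."""
    max_examples: int   # how many few-shot examples to include
    emphasize: bool     # whether to restate the instruction with emphasis


# Assumed profiles; real models would need their own tuning.
PROFILES = {
    "small-instruct": PromptProfile(max_examples=3, emphasize=True),
    "large-general": PromptProfile(max_examples=0, emphasize=False),
}


def build_prompt(instruction: str, examples: list[str], profile_name: str) -> str:
    """Wrap one instruction according to the chosen model profile."""
    profile = PROFILES[profile_name]
    parts = []
    # Smaller model: lead with an emphasized instruction.
    parts.append(f"IMPORTANT: {instruction}" if profile.emphasize else instruction)
    # Include at most `max_examples` few-shot examples.
    parts.extend(f"Example: {ex}" for ex in examples[: profile.max_examples])
    # Smaller model: repeat the instruction at the end as a reminder.
    if profile.emphasize:
        parts.append(f"Remember: {instruction}")
    return "\n".join(parts)


if __name__ == "__main__":
    shots = ['{"ok": true}', '{"ok": false}']
    print(build_prompt("Reply in JSON.", shots, "small-instruct"))
    print("---")
    print(build_prompt("Reply in JSON.", shots, "large-general"))
```

With these assumed profiles, the larger model receives only the bare instruction, while the smaller one receives it twice plus the examples.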