This matches my personal observations: I’ve been testing various language models on broad philosophical and moral reasoning, as well as on math and coding problems.
My rankings are quite different for each.
Math and Coding:
o1-preview > Sonnet 3.5 > Opus 3 > GPT-4o > GPT-4
Philosophy and moral reasoning:
Opus 3 > Sonnet 3.5 > GPT-4o > GPT-4 > o1-preview
I’ve tested other models, but not thoroughly enough to be sure of their ranking. None of them would compete for the top two places in either ranking.