Nathan Helm-Burger comments on To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Nathan Helm-Burger 20 Sep 2024 4:49 UTC
4 points
This matches my personal observations. I’ve been testing various language models on broad philosophical and moral reasoning, as well as on math and coding problems.
My rankings are quite different for each.
Math and Coding:
o1-preview > Sonnet 3.5 > Opus 3 > GPT-4o > GPT-4
Philosophy and moral reasoning:
Opus 3 > Sonnet 3.5 > GPT-4o > GPT-4 > o1-preview
I’ve tested other models, but not thoroughly enough to be sure of their ranking. None of them would compete for the top two places in either ranking.