Rafael Harth comments on o3

Rafael Harth 21 Dec 2024 19:30 UTC
4 points
You could call them logic puzzles. I do think most smart people on LW would get ¹⁰⁄₁₀ without too many problems, if they had enough time, although I’ve never tested this.
- Noosphere89 21 Dec 2024 19:36 UTC
  2 points
  Assuming the puzzles are verifiable, or there is an easy way to check whether a solution works, I expect o3 to get at least ²⁄₁₀, if not ³⁄₁₀, correct under high-compute settings.