the gears to ascension comments on Jacob Pfau’s Shortform

the gears to ascension 10 Mar 2024 4:03 UTC
4 points
I recently asked both Claude and GPT-4 to estimate their own scores on various benchmarks. If I were trying harder to get a good test, I'd probably run it about 10 times and see what the variation is.
- Jacob Pfau 10 Mar 2024 18:19 UTC
  3 points
  I asked Claude Opus whether it could clearly parse different tic-tac-toe notations, and it just said "yes I can" to all of them, despite performing pretty poorly on most.
  - the gears to ascension 10 Mar 2024 21:48 UTC
    4 points
    Yeah, its introspection is definitely less than perfect. I'll DM you the prompt I've been using so you can see its scores.