hazel comments on GPT-4

hazel 15 Mar 2023 10:37 UTC
0 points
−2
Green bars are GPT-4. Blue bars are not. I suspect they just didn’t retest everything.
- peterbarnett 15 Mar 2023 16:12 UTC
  16 points
  4
  They did run the tests for all models; see Table 1 (the columns are GPT-4, GPT-4 (no vision), GPT-3.5).
- Writer 15 Mar 2023 10:49 UTC
  4 points
  0
  It would be weird to include them if they didn’t run those tests. My read was that the green bars are the same height as the blue bars, so they are hidden behind them.
  - hazel 15 Mar 2023 11:01 UTC
    0 points
    1
    Meaning it literally showed zero difference in half the tests? Does that make sense?
    - sanxiyn 15 Mar 2023 17:43 UTC
      3 points
      −1
      AP exams are scored on a scale of 1 to 5, so yes, getting the exact same score with zero difference makes sense.
    - Writer 15 Mar 2023 11:05 UTC
      2 points
      0
      Roughly ¹⁄₃ of the tests, but yeah, that’s why I’m confused. It looks weird enough.