At 86.4%, GPT-4's accuracy is now approaching 100%, but GPT-3's accuracy, which was my prior, was only 43.9%. Obviously one would expect GPT-4's accuracy to be higher than GPT-3's, since it wouldn't make sense for OpenAI to release a worse model, but it wasn't clear ex ante that GPT-4's accuracy would be near 100%.
I predicted that GPT-4's accuracy would fall short of 100% by 20.6 percentage points, when the true shortfall was 13.6 points. Using this approach, the error would be $\frac{20.6 - 13.6}{13.6} = 0.51$.
Strictly speaking, the formula for percent error according to Wikipedia is the relative error expressed as a percentage:
$$\text{percent error} = \frac{v_\text{true} - v_\text{approx}}{v_\text{true}} \times 100$$
I think this is the correct formula to use because what I’m trying to measure is the deviation of the true value from the regression line (predicted value).
Using the formula, the percent error is $\frac{86.4 - 79.4}{86.4} \times 100 = 8.1$.
I updated the post to use the term ‘percent error’ with a link to the Wikipedia page and a value of 8.1%.
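For concreteness, here is a minimal sketch of both calculations in plain Python (the `percent_error` helper is just illustrative, not anything from the post):

```python
def percent_error(v_true: float, v_approx: float) -> float:
    """Relative error expressed as a percentage: (v_true - v_approx) / v_true * 100."""
    return (v_true - v_approx) / v_true * 100

# Shortfall framing: predicted a 20.6-point shortfall from 100%, true shortfall was 13.6 points.
shortfall_relative_error = (20.6 - 13.6) / 13.6
print(round(shortfall_relative_error, 2))    # 0.51

# Accuracy framing: predicted 79.4% accuracy, true accuracy was 86.4%.
print(round(percent_error(86.4, 79.4), 1))   # 8.1
```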
Suppose you predicted 91% but the actual value was 99%. The percent error may only be about 8% but the likelihood of a wrong answer is 1⁄100 instead of your predicted 9⁄100, which is a huge difference.
You may be interested in the links in this post: https://www.lesswrong.com/posts/6Ltniokkr3qt7bzWw/log-odds-or-logits
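As a rough sketch of why the gap looks much bigger in log-odds terms (this only uses the standard logit, log(p/(1-p)); nothing here is taken from the linked post):

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p."""
    return math.log(p / (1 - p))

predicted, actual = 0.91, 0.99

# Percent error looks small...
print(round((actual - predicted) / actual * 100, 1))   # 8.1
# ...but the predicted error rate is 9x the actual error rate,
print(round((1 - predicted) / (1 - actual), 1))        # 9.0
# and the gap in log-odds space is large.
print(round(logit(actual) - logit(predicted), 2))      # 2.28
```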
In this case, the percent error is 8.1% and the absolute error is 8 percentage points. If one student gets 91% on a test and another gets 99%, they both get an A, so the difference doesn't seem large to me.
The article linked seems to be missing. Can you explain your point in more detail?
OK. Let’s make it even more extreme. Suppose you take a commercial flight. The likelihood of dying in a crash is on the order of 1 in 10 million. From a percent error or absolute error perspective, 99.99999% isn’t that different from 99% but that is the difference between one plane crash per year globally and a couple of dozen plane crashes per hour on average. These are wildly different in terms of acceptable safety.
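A back-of-the-envelope version of that comparison, assuming roughly 40 million commercial flights per year (the flight count is my own assumption, so the exact crash counts shift with it, but the orders of magnitude don't):

```python
FLIGHTS_PER_YEAR = 40_000_000   # assumed global commercial flight volume
HOURS_PER_YEAR = 365 * 24

# ~1-in-10-million per-flight risk vs. a hypothetical "99% safe" flight
for p_crash in (1e-7, 1e-2):
    crashes_per_year = FLIGHTS_PER_YEAR * p_crash
    crashes_per_hour = crashes_per_year / HOURS_PER_YEAR
    print(f"p={p_crash}: {crashes_per_year:,.0f} crashes/year, {crashes_per_hour:,.1f}/hour")

# p=1e-07: 4 crashes/year, 0.0/hour
# p=0.01:  400,000 crashes/year, 45.7/hour
```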
There’s a backup link in the comments: https://www.thejach.com/public/log-probability.pdf