At 86.4%, GPT-4's accuracy is now approaching 100%, but GPT-3's accuracy, which was my prior, was only 43.9%. Obviously one would expect GPT-4's accuracy to be higher than GPT-3's, since it wouldn't make sense for OpenAI to release a worse model, but it wasn't clear ex ante that GPT-4's accuracy would be near 100%.
I predicted that GPT-4's accuracy would fall short of 100% by 20.6 percentage points, when the true shortfall was 13.6 points. Using this approach, the error would be $\frac{20.6 - 13.6}{13.6} = 0.51$.
Strictly speaking, the formula for percent error according to Wikipedia is the relative error expressed as a percentage:
$$\text{percent error} = \frac{v_\text{true} - v_\text{approx}}{v_\text{true}} \times 100$$
I think this is the correct formula to use because what I’m trying to measure is the deviation of the true value from the regression line (predicted value).
Using the formula, the percent error is $\frac{86.4 - 79.4}{86.4} \times 100 = 8.1$.
I updated the post to use the term ‘percent error’ with a link to the Wikipedia page and a value of 8.1%.
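For concreteness, here is a minimal sketch of both calculations in plain Python (the `percent_error` helper is just illustrative, not anything from the post):

```python
def percent_error(v_true: float, v_approx: float) -> float:
    """Relative error expressed as a percentage: (v_true - v_approx) / v_true * 100."""
    return (v_true - v_approx) / v_true * 100

# Shortfall framing: predicted a 20.6-point shortfall from 100%, true shortfall was 13.6 points.
shortfall_relative_error = (20.6 - 13.6) / 13.6
print(round(shortfall_relative_error, 2))    # 0.51

# Accuracy framing: predicted 79.4% accuracy, true accuracy was 86.4%.
print(round(percent_error(86.4, 79.4), 1))   # 8.1
```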
Suppose you predicted 91% but the actual value was 99%. The percent error may only be about 8% but the likelihood of a wrong answer is 1⁄100 instead of your predicted 9⁄100, which is a huge difference.
You may be interested in the links in this post: https://www.lesswrong.com/posts/6Ltniokkr3qt7bzWw/log-odds-or-logits
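As a rough sketch of why the gap looks much bigger in log-odds terms (this only uses the standard logit, log(p/(1-p)); nothing here is taken from the linked post):

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p."""
    return math.log(p / (1 - p))

predicted, actual = 0.91, 0.99

# Percent error looks small...
print(round((actual - predicted) / actual * 100, 1))   # 8.1
# ...but the predicted error rate is 9x the actual error rate,
print(round((1 - predicted) / (1 - actual), 1))        # 9.0
# and the gap in log-odds space is large.
print(round(logit(actual) - logit(predicted), 2))      # 2.28
```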
In this case, the percent error is 8.1% and the absolute error is 8 percentage points. If one student gets 91% on a test and another gets 99%, they both get an A, so the difference doesn't seem large to me.
The article linked seems to be missing. Can you explain your point in more detail?
OK. Let’s make it even more extreme. Suppose you take a commercial flight. The likelihood of dying in a crash is on the order of 1 in 10 million. From a percent error or absolute error perspective, 99.99999% isn’t that different from 99% but that is the difference between one plane crash per year globally and a couple of dozen plane crashes per hour on average. These are wildly different in terms of acceptable safety.
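A back-of-the-envelope version of that comparison, assuming roughly 40 million commercial flights per year (the flight count is my own assumption, so the exact crash counts shift with it, but the orders of magnitude don't):

```python
FLIGHTS_PER_YEAR = 40_000_000   # assumed global commercial flight volume
HOURS_PER_YEAR = 365 * 24

# ~1-in-10-million per-flight risk vs. a hypothetical "99% safe" flight
for p_crash in (1e-7, 1e-2):
    crashes_per_year = FLIGHTS_PER_YEAR * p_crash
    crashes_per_hour = crashes_per_year / HOURS_PER_YEAR
    print(f"p={p_crash}: {crashes_per_year:,.0f} crashes/year, {crashes_per_hour:,.1f}/hour")

# p=1e-07: 4 crashes/year, 0.0/hour
# p=0.01:  400,000 crashes/year, 45.7/hour
```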
There’s a backup link in the comments: https://www.thejach.com/public/log-probability.pdf