Would be interesting to see the transcripts for harder questions, altered enough to rule out their having appeared in the training set.
Edit: That is, it’s the originally composed (rather than remembered) transcripts themselves that I’m interested in seeing, not as a way of verifying the score. Like, with writing a quine the interesting thing is that there is no internal monologue to support the planning of how the code gets written. I wouldn’t be able to solve this problem without a bit of planning and prototyping, even if it takes place in my head, but GPT-4 just writes out the final result. For math, it’s also interesting how much of “showing the work” it takes to solve harder problems. Here’s an example of the kind of thing I mean.
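(For anyone unfamiliar: a quine is a program that prints its own source code, with no input. A minimal Python version, just to show the kind of problem being discussed — the trick is getting the string to describe itself:)

```python
# A minimal Python quine: the program's output is exactly its own source.
# The string s contains a template of the program; %r inserts s's own
# repr into that template, reproducing the full source text.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Even for this two-liner, most people would iterate a bit before getting the escaping right, which is the "planning and prototyping" the comment is pointing at.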
They did try to detect and remove test questions that appeared in the training set — see Table 10, "contamination data for exams". The contaminated questions were a tiny fraction of each exam, and removing them didn't change the scores much.
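A toy sketch of what substring-based contamination checking can look like — this is only an illustration of the general idea, not the paper's actual procedure, and the function name and threshold are made up for the example:

```python
# Toy contamination check (illustrative only): flag an exam question as
# contaminated if any fixed-length substring of it occurs verbatim in
# the training corpus. Real pipelines work over tokenized, deduplicated
# corpora at scale; this is just the core idea.

def is_contaminated(question: str, corpus: str, n: int = 50) -> bool:
    """Return True if any length-n substring of `question` appears in `corpus`."""
    if len(question) < n:
        return question in corpus
    return any(question[i:i + n] in corpus
               for i in range(len(question) - n + 1))

corpus = "The quick brown fox jumps over the lazy dog."
print(is_contaminated("quick brown fox jumps", corpus, n=10))  # True
print(is_contaminated("an entirely novel question", corpus, n=10))  # False
```

The point of removing flagged questions and re-scoring is to see whether memorization, rather than generalization, explains the exam results.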
They address the issue of questions appearing in the training data in the paper, but you could also look at questions from any SAT administered after the model's training cutoff.