dschwarz comments on Are language models good at making predictions?

dschwarz 6 Nov 2023 14:19 UTC
5 points
3
Great post!
| Manifold markets that were resolved after GPT-4’s current knowledge cutoff of Jan 1, 2022

Were you able to verify that newer knowledge didn’t bleed in? Anecdotally GPT-4 can report various different cutoff dates, depending on the API. And there is anecdotal evidence that GPT-4-0314 occasionally knows about major world events after its training window, presumably from RLHF?

This could explain the better scores on politics than science.
- dynomight 6 Nov 2023 14:54 UTC
  5 points
  0
  Parent
  Sadly, no—we had no way to verify that.
  I guess one way you might try to confirm/refute the idea of data leakage would be to look at the decomposition of brier scores: GPT-4 is much better calibrated for politics vs. science but only very slightly better at politics vs. science in terms of refinement/resolution. Intuitively, I’d expect data leakage to manifest as better refinement/resolution rather than better calibration.