| Manifold markets that were resolved after GPT-4’s current knowledge cutoff of Jan 1, 2022
Were you able to verify that newer knowledge didn’t bleed in? Anecdotally GPT-4 can report various different cutoff dates, depending on the API. And there is anecdotal evidence that GPT-4-0314 occasionally knows about major world events after its training window, presumably from RLHF?
This could explain the better scores on politics than science.
I guess one way you might try to confirm/refute the idea of data leakage would be to look at the decomposition of Brier scores: GPT-4 is much better calibrated for politics vs. science but only very slightly better at politics vs. science in terms of refinement/resolution. Intuitively, I’d expect data leakage to manifest as better refinement/resolution rather than better calibration.
Great post!
| Manifold markets that were resolved after GPT-4’s current knowledge cutoff of Jan 1, 2022
Were you able to verify that newer knowledge didn’t bleed in? Anecdotally GPT-4 can report various different cutoff dates, depending on the API. And there is anecdotal evidence that GPT-4-0314 occasionally knows about major world events after its training window, presumably from RLHF?
This could explain the better scores on politics than science.
Sadly, no—we had no way to verify that.
I guess one way you might try to confirm/refute the idea of data leakage would be to look at the decomposition of Brier scores: GPT-4 is much better calibrated for politics vs. science but only very slightly better at politics vs. science in terms of refinement/resolution. Intuitively, I’d expect data leakage to manifest as better refinement/resolution rather than better calibration.
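For anyone who wants to try this, here is a rough sketch of the kind of decomposition I have in mind. It assumes the standard Murphy decomposition (Brier = reliability - resolution + uncertainty), which may not be exactly what the post used, and it groups forecasts by their exact probability value rather than by bins. The function name `brier_decomposition` and the sample numbers are made up for illustration; they are not data from the post.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score for binary outcomes.

    forecasts: probabilities in [0, 1]
    outcomes:  0 or 1 for each forecast
    Forecasts are grouped by exact value, so the identity
    brier = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the observed outcomes for each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # calibration term: lower is better
    resolution = 0.0   # how far group frequencies sit from the base rate: higher is better
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2
    reliability /= n
    resolution /= n

    uncertainty = base_rate * (1 - base_rate)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return {
        "brier": brier,
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": uncertainty,
    }


# Hypothetical toy data, one list per category (not the post's data).
politics = brier_decomposition([0.9, 0.9, 0.1, 0.1, 0.5, 0.5], [1, 1, 0, 0, 1, 0])
science = brier_decomposition([0.7, 0.7, 0.3, 0.3, 0.5, 0.5], [1, 0, 0, 1, 1, 0])
print(politics)
print(science)
```

If leakage were driving the politics advantage, I'd expect it to show up mostly as a larger `resolution` term for politics, not as a smaller `reliability` term. With real forecasts you would probably want to bin the probabilities, in which case the identity only holds approximately.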