I talked about this with Sonnet (after an initial refusal that, in hindsight, it agreed made no sense), and it was able to reproduce a number of other true or near-true facts from the BIG-bench documentation, though not verbatim-memorized chunks of text. We figured that even if it didn’t train on the actual benchmark data, it probably trained on the repository at some point, or on references to it.