I didn’t say they were simply memorizing; it’s more complex than that. You would need to look at how the compression ratio scales with parameters, relative to data similarity/repetition, and compare that to simpler SOTA compressors. Regardless of whether it’s ‘true’ memorization or not, exposure to downstream task test sets distorts evaluations. This is already a problem for humans, since many answers are available on the internet; it’s just much more of a problem for an AI that actually digests the entire internet.