An additional one: “reality is the first place the AI is deployed in narrow tool-like ways and trained on narrow specialized datasets which could not elicit the capabilities the AI started off with”.
At least in the current paradigm, it looks like generalist models/archs will precede hyperspecialized trained-from-scratch models/archs (the latter of which can only be developed given the former). So there will be an inherent, massive train-test distribution shift across many, if not most, model deployments—especially early on, in the first deployments (which will be the most dangerous). ‘Specialization’ here can happen in a wide variety of ways, ranging from always using a specific prompt, to finetuning on a narrow dataset, to knowledge-distillation into a cheaper model, etc. (Or to put it more concretely: everyone uses GPT-3 on much less diverse data than it was originally trained on—raw Internet-wide scrapes—and few to no people use it on more diverse datasets than the original training data, if only because where would you even get such a thing?)
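To make the ‘always using a specific prompt’ case concrete, here is a minimal sketch (my own illustration, not anything from the thread; `model` and `model.complete` are hypothetical stand-ins for whatever completion API is actually used) of a deployment that funnels every input through one fixed template:

```python
# Minimal sketch of 'specialization by prompt': the deployed model only ever sees
# one narrow slice of the Internet-wide distribution it was pretrained on.
# `model` / `model.complete` are hypothetical stand-ins for whatever API is in use.
def classify_ticket(model, ticket_text: str) -> str:
    prompt = (
        "Classify the following customer-support ticket as BILLING, BUG, or OTHER.\n"
        f"Ticket: {ticket_text}\n"
        "Label:"
    )
    # Every single input is forced into this one template: a far narrower input
    # distribution than raw web scrapes, however varied the tickets themselves are.
    return model.complete(prompt, max_tokens=1)
```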
And this can’t be solved by any hacks or safety measures, because it defeats the point of deployment: to be practically useful, we need models to be hyperspecialized, and then to be stable, static blackboxes which play their assigned role in whatever system has been designed around their specific capability as a puzzle piece, perform only the designated tasks, and aren’t being further trained on random Internet scrapes or arbitrary tasks. Retaining that flexibility, let alone doing actual training, massively complicates development and deployment, and may cost several orders of magnitude more than the obvious easy thing of eg. switching from an OA API call to a local finetuned GPT-J.
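For a sense of what that frozen blackbox looks like in code, a sketch of the GPT-J endgame, assuming a HuggingFace `transformers` setup and a made-up local checkpoint path:

```python
# Sketch of the 'stable static blackbox' deployment: a locally finetuned GPT-J
# loaded once and used purely as a fixed puzzle piece. No further training ever runs.
# The checkpoint path "./gptj-finetuned-tickets" is a hypothetical placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./gptj-finetuned-tickets")
model = AutoModelForCausalLM.from_pretrained("./gptj-finetuned-tickets")
model.eval()  # inference mode: dropout off; the checkpoint is now a fixed artifact

@torch.no_grad()  # no gradients in deployment; the weights never change again
def complete(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    # Return only the newly generated continuation, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```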
(And of course note the implications of that: real data will be highly autocorrelated, because you want to process it as it arrives to get an answer now, not wait a random multi-decade interval to fake a large batch of i.i.d. data which would produce the same batchnorm or other runtime global state; inputs will have very different timings & latencies depending on where the model is being run, so a model may evolve timing attacks to detect deployment; inputs will be tailored to a specific user rather than every hypothetical user...)
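To illustrate just the batchnorm point with a toy (mine, not the comment’s): the running statistics a normalization layer accumulates depend on how the data arrives, so streaming autocorrelated inputs leaves the model with different internal state than feeding the very same data as one big shuffled i.i.d. batch:

```python
# Toy demo: identical data, different runtime global state, depending only on
# whether it arrives as an autocorrelated stream or as one shuffled i.i.d. batch.
import torch
import torch.nn as nn

torch.manual_seed(0)
data = torch.cumsum(torch.randn(256, 8), dim=0)  # a random walk: highly autocorrelated

bn_stream = nn.BatchNorm1d(8, momentum=0.1)
for t in range(0, 256, 4):                # deployment-style: small chunks, in arrival order
    bn_stream(data[t:t + 4])

bn_batch = nn.BatchNorm1d(8, momentum=0.1)
bn_batch(data[torch.randperm(256)])       # training-style: the same data, shuffled, one batch

print(bn_stream.running_mean)  # dominated by the most recent chunks of the walk
print(bn_batch.running_mean)   # one momentum-step from 0 toward the global mean
```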
That HN comment you linked to is almost 10 years old, near the bottom of a thread on an unrelated story, and while it supports your point, I don’t notice what other qualities it has that would make it especially memorable, so I’m kind of amazed that you surfaced it at an appropriate moment from such an obscure place and I’m curious how that happened.
Oh, it’s just from my list of reward hacking examples; I don’t mention the others because most of them aren’t applicable to the train/deploy distinction. And I remember it because I remember a lot of things, and this one was particularly interesting to me for exactly the reason I linked it just now—illustrating that optimization processes can hack train/deploy distinctions as a particularly extreme form of ‘data leakage’. As for where I got it, I believe someone sent it to me way back when I was compiling that list.