Has there been any work on the scaling laws of out-of-distribution capability/behavior decay?
A simple example:
Simultaneously train task A and task B for N steps.
Stop training task B, but continue to evaluate the performance of both A and B.
Observe how rapidly task B performance degrades.
Repeat across scale and regularization strategies.
Would be nice to also investigate different task types. For example, tasks with varying degrees of implied overlap in underlying mechanisms (like #2).
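For concreteness, a minimal sketch of that loop in Python; `make_model`, `sample_batch`, `train_step`, and `evaluate` are hypothetical stand-ins for whatever training stack is actually used, so treat this as a sketch of the setup rather than a reference implementation.

```python
# Minimal sketch of the A/B decay experiment, assuming hypothetical helpers
# make_model, sample_batch, train_step, and evaluate.

def run_decay_experiment(model_size, joint_steps, decay_steps, eval_every=100):
    model = make_model(model_size)
    history = {"A": [], "B": []}

    # Phase 1: train on a mixture of task A and task B for N steps.
    for _ in range(joint_steps):
        train_step(model, sample_batch("A"))
        train_step(model, sample_batch("B"))

    # Phase 2: keep training on A only, but continue evaluating both tasks.
    for step in range(decay_steps):
        train_step(model, sample_batch("A"))
        if step % eval_every == 0:
            history["A"].append(evaluate(model, "A"))
            history["B"].append(evaluate(model, "B"))  # this is the decay curve

    return history


# Repeat across scales (and regularization strategies) to look for a trend.
curves = {size: run_decay_experiment(size, joint_steps=10_000, decay_steps=50_000)
          for size in ("10M", "100M", "1B")}
```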
I’ve previously done some of these experiments privately, but not with nearly the compute necessary for an interesting result.
The sleeper agents paper reminded me of it. I would love to see what happens with a closer-to-frontier model that's intentionally backdoored and then subjected to continued pretraining. Can a backdoor persist through another trillion tokens of nonadversarial-but-extremely-broad training? Does that vary with scale, etc.?
I’d also like to intentionally find the circumstances that maximize the persistence of out-of-distribution capabilities not implied by the current training distribution.
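As a rough illustration of how that measurement could be set up (everything named here, `load_backdoored_model`, `sample_pretraining_batch`, `train_step`, `trigger_success_rate`, is a hypothetical placeholder, not anyone's actual pipeline):

```python
# Hypothetical sketch: probe backdoor persistence during continued pretraining.
# All helpers below are assumed placeholders, not real APIs.

def track_backdoor_persistence(total_steps, probe_every=1_000):
    model = load_backdoored_model()
    tokens_seen = 0
    persistence_curve = []

    for step in range(total_steps):
        batch = sample_pretraining_batch()  # broad, nonadversarial data
        train_step(model, batch)
        tokens_seen += batch.num_tokens

        if step % probe_every == 0:
            # Fraction of triggered prompts that still elicit the backdoored behavior.
            persistence_curve.append((tokens_seen, trigger_success_rate(model)))

    return persistence_curve
```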
Seems like identifying a robust trend here would have pretty important implications, whichever direction it points.
Yeah, I’ve seen work on the sort of thing in your example in the continual learning literature. Also benchmarks with, like, 10 component tasks, trained sequentially but tested on every task trained so far. Then you can watch the earlier tasks fall off as training progresses.
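That evaluation is usually summarized as a matrix of scores: train on tasks 1..K in sequence and, after each task, test on everything seen so far. A quick sketch, again with hypothetical `make_model`, `train_on`, and `evaluate` helpers:

```python
# Sequential continual-learning evaluation: after finishing each task,
# evaluate on every task trained so far. Helpers are hypothetical.

def sequential_forgetting(task_ids, steps_per_task):
    model = make_model()
    results = []  # results[k][t] = score on task t after finishing task k

    for k, task in enumerate(task_ids):
        train_on(model, task, steps_per_task)
        results.append([evaluate(model, seen) for seen in task_ids[: k + 1]])

    return results  # earlier tasks' entries show the falloff over training
```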
For what it’s worth (perhaps nothing), in private experiments I’ve seen that in certain toy transformer models, task B performance gets wiped out almost immediately when you stop training on it, in situations where the two tasks are related in some way.
I haven’t looked at how deep the erasure is, or whether the capability is far easier to revive than it was to train in the first place.
Yup, exactly the same experience here.