For what it’s worth (perhaps nothing), in private experiments I’ve seen that in certain toy transformer models, task B performance gets wiped out almost immediately once you stop training on it, at least when the two tasks are related in some way.
I haven’t looked at how deep the erasure goes, or at whether the ability is far easier to revive than it was to train in the first place.
Yup, exactly the same experience here.