Thanks!
I would say experiments, introspection and consideration of cases in humans have pretty convincingly established the dissociation between the types of welfare (e.g. see my section on it, although I didn't go into a lot of detail), but they are highly interrelated and often or even typically build on each other, as you suggest.
I’d add that the fact that they sometimes dissociate seems morally important, because it makes it more ambiguous what’s best for someone if multiple types seem to matter, and there are possible beings with some types but not others.
The reason SGD doesn't overfit large neural networks is probably the various measures specifically intended to prevent overfitting, like weight penalties, dropout, early stopping, data augmentation and input noise, and learning rates large enough to prevent full convergence. Without those measures, running SGD to parameter convergence would probably cause overfitting. Furthermore, we test networks on validation datasets they weren't trained on, and we throw out the networks that don't generalize well to the validation set and start over (with new hyperparameters, architectures or parameter initializations). These measures bias us away from producing, and especially from deploying, overfit networks.
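To make this concrete, here's a minimal sketch of what those anti-overfitting measures look like in a standard PyTorch training loop. The dataset, architecture and hyperparameters are placeholders I've made up for illustration, not anything from a real setup.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy random data standing in for a real dataset.
torch.manual_seed(0)
X_train, y_train = torch.randn(512, 20), torch.randint(0, 2, (512,))
X_val, y_val = torch.randn(128, 20), torch.randint(0, 2, (128,))
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout: randomly zero activations during training
    nn.Linear(64, 2),
)
# weight_decay adds an L2 weight penalty to the SGD update
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

best_val_loss, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    for xb, yb in train_loader:
        # input noise: a simple stand-in for data augmentation
        xb = xb + 0.1 * torch.randn_like(xb)
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()

    # Early stopping: monitor loss on held-out validation data
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    if val_loss < best_val_loss - 1e-4:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop before the model overfits to the training set
```

The "throw it out and start over" step would sit outside this loop: if the best validation loss is still poor, you discard the run and retry with different hyperparameters, architecture or initialization.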
Similarly, we might expect scheming in the absence of specific measures to prevent it. What could those measures look like? Catching scheming during training (or validation) and either heavily penalizing it, or fully throwing away the network and starting over? We could also validate out of the training distribution. Would networks whose caught scheming has been heavily penalized, or networks selected for not scheming during training (and validation), generalize to avoid all (or all x-risky) scheming? I don't know, but it seems more likely than counting arguments would suggest.