You’re right, that’s an assumption! I think how valid it is also depends a bit on how you train the model… is it assembled from pre-trained pieces, or do you train the whole thing from scratch? The former seems like it’d have an easier time suddenly becoming deceptive than the latter.
Training the Surgeon in lockstep, or with compute advantages, is a good idea.