The post’s claim that validation-only approaches are fundamentally better than training-with-validation oversimplifies a complex reality. Both approaches modify the distribution of models—neither preserves some “pure” average case. Our base training objective may already have some correlation with our validation signal, and there’s nothing special about maintaining this arbitrary starting point. Sometimes we should increase correlation between training and validation, sometimes decrease it, depending on the specific relationship between our objective and validator. What matters is understanding how correlation affects both P(aligned) and P(pass|misaligned), weighing the tradeoffs, and optimizing within our practical retraining budget (because often, increasing P(aligned|pass) will also decrease P(pass)).
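To make the tradeoff concrete, here is a minimal Bayes-rule sketch in Python (the probabilities are made-up illustrative numbers, not anything from the post): it computes P(aligned | pass) and the expected number of training runs per accepted model, 1/P(pass), under a weakly-correlated versus a strongly-correlated training/validation setup.

```python
# Minimal numeric sketch (hypothetical probabilities, purely illustrative) of the
# tradeoff above: raising P(aligned | pass) can lower P(pass), which raises the
# expected number of retraining attempts per accepted model.

def posterior_and_cost(p_aligned, p_pass_given_aligned, p_pass_given_misaligned):
    """Return P(aligned | pass) and the expected number of training runs
    needed before one passes validation (1 / P(pass))."""
    p_pass = (p_aligned * p_pass_given_aligned
              + (1 - p_aligned) * p_pass_given_misaligned)
    p_aligned_given_pass = p_aligned * p_pass_given_aligned / p_pass
    expected_runs = 1 / p_pass
    return p_aligned_given_pass, expected_runs

# Scenario A: training objective weakly correlated with the validator.
# Misaligned models rarely pass by accident, so "pass" is informative,
# but the overall pass rate is low and retraining is expensive.
print(posterior_and_cost(p_aligned=0.3,
                         p_pass_given_aligned=0.6,
                         p_pass_given_misaligned=0.05))
# -> roughly (0.84, 4.7): high posterior, ~5 runs per accepted model.

# Scenario B: training correlated with the validation signal.
# P(aligned) and P(pass) both rise, but misaligned models also tend to pass,
# so the posterior given "pass" drops.
print(posterior_and_cost(p_aligned=0.4,
                         p_pass_given_aligned=0.95,
                         p_pass_given_misaligned=0.5))
# -> roughly (0.56, 1.5): cheap to get a passing model, but "pass" means less.
```

Whether A or B is preferable depends on the retraining budget and on how much weight "pass" needs to carry, which is the point: neither direction of correlation is better in general.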
Flagging for posterity that we had a long discussion about this via another medium and I was not convinced.