Suppose Training Run Z is a finetune of Model Y, and Model Y was the output of Training Run Y, which was already a finetune of Foundation Model X produced by Training Run X (all of which happened after September 2021). This is saying that not only Training Run Y (i.e. the compute used to produce one of the inputs to Training Run Z), but also Training Run X (a “recursive” or “transitive” dependency), count additively against the size limit for Training Run Z.
What does this mean?
Suppose Training Run Z is a finetune of Model Y, and Model Y was the output of Training Run Y, which was already a finetune of Foundation Model X produced by Training Run X (all of which happened after September 2021). This is saying that not only Training Run Y (i.e. the compute used to produce one of the inputs to Training Run Z), but also Training Run X (a “recursive” or “transitive” dependency), count additively against the size limit for Training Run Z.