That’s fair. I was thinking of that as part of “compute needed during training”, but you could also split it up into “compute needed for gradient updates” and “compute needed to create data of sufficient quality”, and then say that the stable thing is the “compute needed for gradient updates”.
That’s fair. I was thinking of that as part of “compute needed during training”, but you could also split it up into “compute needed for gradient updates” and “compute needed to create data of sufficient quality”, and then say that the stable thing is the “compute needed for gradient updates”.