Was going to make nearly the same comment, so I'll just add to yours: an existing training run can benefit from hardware/software upgrades nearly as much as a new training run can. Big changes to hardware and software are slow relative to these timescales. (Nvidia releases new GPU architectures on a two-year cadence, but they are mostly incremental.)
New training runs benefit most from major architectural changes and especially training/data/curriculum changes.