Is there a difference between training competitiveness and performance competitiveness? My impression is that, for all of these proposals, however many resources you’ve already put into training, putting more resources into training will continue to improve performance. If that’s the case, then whether a factor influencing competitiveness is framed as affecting the cost of training or as affecting the performance of the final product, either way it’s just affecting the efficiency with which resources put towards training translate into good performance. Separating competitiveness into training competitiveness and performance competitiveness would make sense if there were some fixed amount of training needed to achieve any reasonable performance at all, past which more training stops producing better performance. My impression is that this isn’t usually what happens.
My impression is that, for all of these proposals, however many resources you’ve already put into training, putting more resources into training will continue to improve performance.
I think this is incorrect. Most training setups eventually flatline, or close to it (e.g. see AlphaZero’s Elo curve), and need algorithmic or other improvements to do better.
For individual ML models, sure, but not for classes of similar models. E.g. GPT-3 presumably was more expensive to train than GPT-2 as part of the cost of getting better results. For each of the proposals in the OP, training costs constrain how complex a model you can train, which in turn would affect performance.
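To illustrate how these two points fit together, here’s a toy sketch (the functional forms and numbers are made up for illustration, not actual GPT-2/GPT-3 figures): any single training run saturates, but paying a bigger training bill for a bigger model class raises the ceiling.

```python
# Toy model (made-up numbers and curve shapes) of the two claims above:
# a single training run saturates, but scaling up the model class raises the
# ceiling, at the price of a larger training bill.
import math

def run_performance(model_size, train_compute):
    """Performance of one training run: a saturating curve whose ceiling grows with model size.

    The functional form (exponential saturation, log-scaling ceiling) is an
    assumption chosen purely to illustrate the shape of the argument.
    """
    ceiling = 100 * math.log10(1 + model_size)                      # bigger class => higher asymptote
    return ceiling * (1 - math.exp(-train_compute / model_size))    # diminishing returns within a run

for size in [1e2, 1e4, 1e6]:              # three hypothetical "classes" of model
    for compute in [1e3, 1e5, 1e7]:
        perf = run_performance(size, compute)
        print(f"size={size:>9.0e}  compute={compute:>9.0e}  performance={perf:6.1f}")
```

In this toy picture, throwing more compute at a fixed model quickly stops helping, while moving to a larger (and more expensive to train) model keeps improving performance.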
I believe the main difference is that training is a one-time cost. Thus lacking training competitiveness is less of an issue than lacking performance competitiveness, since the latter is a recurring cost.
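To put rough numbers on that (entirely made up, just to show the shape of the comparison): a one-time training gap stays fixed, while a performance gap scales with how much the deployed system gets used.

```python
# Made-up numbers purely to illustrate one-time vs. recurring costs.
TRAIN_COST_SAFE, TRAIN_COST_UNSAFE = 5_000_000, 1_000_000   # one-time training bills ($)
PER_QUERY_GAP = 0.01   # hypothetical extra cost ($) per query from worse performance

for n_queries in [1e6, 1e8, 1e10]:
    training_gap = TRAIN_COST_SAFE - TRAIN_COST_UNSAFE      # paid once
    performance_gap = PER_QUERY_GAP * n_queries             # paid on every use
    print(f"queries={n_queries:.0e}  training gap=${training_gap:,.0f}  "
          f"performance gap=${performance_gap:,.0f}")
```

With enough usage, the recurring performance gap dwarfs the one-time training gap.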
But if you could always get arbitrarily high performance with long enough training, then claiming “the performance isn’t high enough” would be equivalent to saying “we haven’t trained long enough”. So it would all reduce to just one dimension of competitiveness: how steeply, on average, performance improves as you put more resources into training.
For the actual reason I think it makes sense to separate these, see my other comment: you can’t usually get arbitrarily high performance by training longer.
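To make that distinction concrete, here’s a toy sketch (the curves, numbers, and helper names are all made up): with an unbounded performance curve, every target is reachable at some training budget, so lacking performance competitiveness really would just mean lacking training budget; with a curve that flatlines, some targets are unreachable no matter how long you train.

```python
# Toy illustration (made-up curves) of why the reduction only works if
# performance is unbounded in training compute.
import math

def unbounded_perf(compute):
    return 10 * math.log10(compute)             # keeps growing forever (hypothetical)

def saturating_perf(compute):
    return 50 * (1 - math.exp(-compute / 1e4))  # flatlines at 50 (hypothetical)

def compute_needed(perf_curve, target, budget_cap=1e12):
    """Smallest compute (found by doubling) that reaches `target`, or None if the curve never gets there."""
    compute = 1.0
    while compute <= budget_cap:
        if perf_curve(compute) >= target:
            return compute
        compute *= 2
    return None

for target in [40, 60]:
    print(f"target={target}:",
          "unbounded ->", compute_needed(unbounded_perf, target),
          "| saturating ->", compute_needed(saturating_perf, target))
```

For the unbounded curve, both targets come back with a finite compute figure; for the saturating curve, the target above the asymptote comes back as unreachable, which is exactly where performance competitiveness stops being just training competitiveness in disguise.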