Even if, as you claim, the models that generalize best aren’t the fastest models, wouldn’t there still be a bias toward speed among the models that generalize well? Computation is limited, so a faster model can do more with the same amount of computation. It seems to me that the scarcity of compute always results in a speed prior.