Perhaps the most important takeaway from our study is hidden in plain sight: the field is in danger of being drowned by noise. Different optimizers exhibit a surprisingly similar performance distribution compared to a single method that is re-tuned or simply re-run with different random seeds. It is thus questionable how much insight the development of new methods yields, at least if they are conceptually and functionally close to the existing population.
This is from the authors’ conclusion. They do also acknowledge that a couple of optimizers seem to be better than others across tasks and datasets, and I agree with them (and with you if that’s your point). But most optimizers do not live up to the “significant improvement” claims their authors have been making. They also say most tuned algorithms can be equaled by trying several un-tuned algorithms. So the point is twofold:
1. Most new algorithms can be equaled or beaten by re-tuning of most old algorithms.
2. Their tuned versions can be equaled or beaten by many un-tuned versions of old algorithms.
This seems to be consistent with there being no overwhelming winner and low variance in algorithm performance.
If I understand your model correctly (and let me know if I don’t): if an algorithm Y improves performance by 1 std on a specific task, it would still get beaten by an unimproved algorithm 16% of the time. Sure, but you have to compute the probability of the Y algorithm (mean=1, std=1) being beaten by the X1, X2, X3, X4 algorithms (all mean=0, std=1), which is what is happening in the authors’ experiment, and it is much lower.
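To make the numbers concrete, here is a rough sketch of that toy model. Everything in it is my assumption rather than anything from the paper: scores are independent Gaussians, Y ~ N(1, 1), each X_i ~ N(0, 1), and the function names are made up for illustration. The 16% figure corresponds to treating Y's score as fixed at +1; the second function handles a noisy Y and lets you read "beaten by X1..X4" either as "by all four" or as "by at least one".

```python
from statistics import NormalDist

# Toy model (an assumption, not the paper's setup): unit-variance Gaussians.
STD_NORMAL = NormalDist(0.0, 1.0)


def p_single_fixed(gain=1.0):
    """P(X > gain) for X ~ N(0, 1), treating Y's score as fixed at `gain`."""
    return 1.0 - STD_NORMAL.cdf(gain)


def p_beaten(gain=1.0, k=4, mode="all", steps=20001, width=10.0):
    """P(Y ~ N(gain, 1) is beaten by all / at least one of k baselines X_i ~ N(0, 1)).

    Integrates over Y's score with a plain Riemann sum on [gain - width, gain + width].
    """
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    y_dist = NormalDist(gain, 1.0)
    lo = gain - width
    h = 2.0 * width / (steps - 1)
    total = 0.0
    for i in range(steps):
        y = lo + i * h
        p_above = 1.0 - STD_NORMAL.cdf(y)  # P(a single baseline scores above y)
        if mode == "all":
            cond = p_above ** k  # every baseline beats Y
        else:
            cond = 1.0 - (1.0 - p_above) ** k  # at least one baseline beats Y
        total += cond * y_dist.pdf(y) * h
    return total


if __name__ == "__main__":
    print("one baseline, Y fixed at +1:", round(p_single_fixed(), 4))
    print("one baseline, noisy Y:      ", round(p_beaten(k=1), 4))
    print("all four baselines:         ", round(p_beaten(k=4, mode="all"), 4))
    print("at least one of four:       ", round(p_beaten(k=4, mode="any"), 4))
```

Under these assumptions, the chance that all four baselines beat Y comes out lower than the single-baseline figure, while the chance that at least one of them does comes out higher, so it matters which event you compare against the experiment.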