Fabien Roger comments on When can we trust model evaluations?