To keep it simple, assume the hyperparameters are updated by evolutionary algorithm or some similar search-then-continue-or-stop process.
I want to flag that, in the case of evolutionary algorithms, we should not assume the fitness function is defined with respect to just the current batch of images; it should instead be defined with respect to, say, all images seen so far (since the beginning of the entire training process). Otherwise the selection pressure is "myopic": models that happen to outperform others on the current batch get higher fitness, even if they perform worse on average over the whole history.
(I might be over-pedantic about this topic due to previously being very confused about it.)
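To make the distinction concrete, here is a minimal sketch of the two fitness definitions. Everything in it is illustrative: the "models" are single scalars, `evaluate` is a stand-in loss, and the batch contents are contrived so that the latest batch is unrepresentative of the history.

```python
def evaluate(model, batch):
    # Hypothetical per-image loss: mean absolute distance between the
    # model's single parameter and each item in the batch (a stand-in
    # for a real loss function).
    return sum(abs(model - x) for x in batch) / len(batch)

def select(population, batches, myopic):
    """Return the fittest model.

    myopic=True  -> fitness is computed on the latest batch only.
    myopic=False -> fitness is the average loss over all batches so far.
    """
    if myopic:
        scores = {m: evaluate(m, batches[-1]) for m in population}
    else:
        scores = {m: sum(evaluate(m, b) for b in batches) / len(batches)
                  for m in population}
    return min(scores, key=scores.get)  # lower loss = higher fitness

# Two candidate "models" and a data stream whose most recent batch
# is an outlier relative to the earlier batches.
population = [0.0, 1.0]
batches = [[0.0, 0.1], [0.0, -0.1], [1.0, 1.1]]

myopic_winner = select(population, batches, myopic=True)       # 1.0
cumulative_winner = select(population, batches, myopic=False)  # 0.0
```

The myopic selector picks the model that fits the latest (outlier) batch, while the cumulative selector picks the model that fits the full history, which is the non-myopic behavior the flag above is arguing for.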