instead, if there are hyperparameters that prevent the error rate going below 0.1, these will be selected by gradient descent as giving a better performance.
I don’t follow this point. If we’re talking about using SGD to update the (hyper)parameters, using a batch of images from the dataset(s) currently in use, then the gradient update would be determined by the gradient of the loss with respect to that batch of images.
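To make the picture I have in mind explicit, here is a minimal toy sketch (my own illustration, nothing from the post) of a single SGD step whose update is computed entirely from the current batch:

```python
import numpy as np

# Toy illustration: one SGD step on a tiny linear model. The update
# direction is a function of the current batch only; no other images
# enter the computation.
rng = np.random.default_rng(0)
w = rng.normal(size=10)               # the parameters being updated
x_batch = rng.normal(size=(32, 10))   # current batch of flattened "images"
y_batch = rng.normal(size=32)         # current batch of targets

def batch_loss_grad(w, x, y):
    """Gradient of the mean squared error on this batch alone."""
    residual = x @ w - y
    return 2.0 * x.T @ residual / len(y)

lr = 0.1
w = w - lr * batch_loss_grad(w, x_batch, y_batch)   # the whole update
```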
To keep it simple, assume the hyperparameters are updated by evolutionary algorithm or some similar search-then-continue-or-stop process.
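For concreteness, here is roughly the kind of loop I mean, as a hedged sketch; train_and_evaluate is a hypothetical stand-in for however a candidate hyperparameter setting would actually be scored:

```python
import random

def train_and_evaluate(learning_rate):
    # Hypothetical placeholder fitness: in reality this would train a model
    # with the given hyperparameter and return (negative) validation error.
    return -abs(learning_rate - 0.1)

# Tiny evolutionary search over one hyperparameter (a learning rate).
population = [random.uniform(0.001, 1.0) for _ in range(8)]
for generation in range(20):
    ranked = sorted(population, key=train_and_evaluate, reverse=True)
    if train_and_evaluate(ranked[0]) > -1e-3:    # "stop" branch of continue-or-stop
        break
    survivors = ranked[:4]                       # otherwise continue with the fitter half
    children = [lr * random.uniform(0.8, 1.25)   # mutate the survivors
                for lr in survivors]
    population = survivors + children

best = max(population, key=train_and_evaluate)
```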
I want to flag that, in the case of evolutionary algorithms, we should not assume here that the fitness function is defined with respect to just the current batch of images; rather, it should be defined with respect to, say, all images seen so far (since the beginning of the entire training process). Otherwise the selection pressure is “myopic”, i.e. models that outperform others on the current batch of images have higher fitness.
(I might be over-pedantic about this topic due to previously being very confused about it.)
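To spell out the contrast I mean, here is a toy sketch; the evaluate argument and both fitness helpers are hypothetical names of my own, not anything from the post or the parent comment:

```python
def myopic_fitness(candidate, current_batch, evaluate):
    # Selection pressure comes only from the batch currently in use.
    return evaluate(candidate, current_batch)

def cumulative_fitness(candidate, batches_so_far, evaluate):
    # Selection pressure comes from the whole training history to date:
    # average the candidate's score over every batch seen since the start.
    scores = [evaluate(candidate, batch) for batch in batches_so_far]
    return sum(scores) / len(scores)
```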