Then it has a wrong view of wings and fur (as well as a wrong view of pigs). The more features it has to get right, the harder the adversarial example is to construct; it's not just moving linearly in a single direction.
Surely the adversary convinces it this is a pig by convincing it that it has fur and no wings? I don't have experience with how it works on the inside, but if the adversary can magically intervene on each neuron, changing its output by d at a cost of d² effort, then the proper strategy is to intervene on many features a little. Then, if there are many layers, the penultimate layer, containing such high-level concepts as fur or wings, would be almost as fooled as the output layer, and I would indeed expect the adversary to have more trouble fooling it on such low-level features as edges and dots.
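To make the "intervene on many features a little" claim concrete, here is a minimal worked sketch under the quadratic effort model above (the notation D, n, and d_i is mine, and it assumes the per-neuron effects simply add up): the cheapest way to buy a fixed total shift D across n neurons is to split it evenly, and the total cost then falls off as 1/n.

```latex
% Minimal sketch, assuming quadratic per-neuron effort and additive effects:
% minimize total effort subject to a fixed total shift D across n neurons.
%
%   minimize    \sum_{i=1}^{n} d_i^2
%   subject to  \sum_{i=1}^{n} d_i = D
%
% Lagrange multipliers give 2 d_i = \lambda for every i, so the optimal
% allocation is uniform:
\[
  d_i = \frac{D}{n},
  \qquad
  \text{total effort} = \sum_{i=1}^{n} \left(\frac{D}{n}\right)^2 = \frac{D^2}{n}.
\]
% The cost of a given total shift shrinks as 1/n, so spreading the
% perturbation over many features beats concentrating it on one.
```

This also matches the layer-by-layer intuition: the few high-level units near the output absorb sizable shifts, while the many low-level edge and dot detectors each move only by D/n, which is tiny when n is large.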