Pattern comments on The Inner Alignment Problem

Pattern 4 Jun 2019 20:29 UTC
2 points
Subprocess interdependence. There are some reasons to believe that there might be more initial optimization pressure towards proxy aligned than robustly aligned mesa-optimizers. In a local optimization process, each parameter of the learned algorithm (e.g. the parameter vector of a neuron) is adjusted to locally improve the base objective conditional on the other parameters. Thus, the benefit for the base optimizer of developing a new subprocess will likely depend on what other subprocesses the learned algorithm currently implements. Therefore, even if some subprocess would be very beneficial if combined with many other subprocesses, the base optimizer may not select for it until the subprocesses it depends on are sufficiently developed. As a result, a local optimization process would likely result in subprocesses that have fewer dependencies being developed before those with more dependencies.
On the one hand, this makes it sound like existing neurons are likely to be re-used, rather than new ones (neurons? sets of neurons?) being created. On the other hand, "One pixel attack for fooling deep neural networks" almost seems to ask "are subprocesses with lots of dependencies* ever made?"
*That is, high(er)-level processes.
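To make the quoted claim about dependencies concrete, here is a minimal toy sketch. It is not from the post or the paper. The loss, the parameter names `a` and `b`, and the helpers `loss_grad` and `first_crossings` are all made up for illustration. Parameter `a` stands in for a subprocess with no dependencies, and `b` for one that only helps through the product `a * b`. Under plain gradient descent from zero, `b` gets no gradient at all until `a` has started to develop.

```python
# Toy loss (an assumption for illustration, not the paper's setup):
#   L(a, b) = (a - 1)**2 + (a*b - 1)**2
# `a` is useful on its own; `b` only matters through the product a*b.

def loss_grad(a, b):
    """Return (dL/da, dL/db) for the toy loss above."""
    grad_a = 2 * (a - 1) + 2 * (a * b - 1) * b
    grad_b = 2 * (a * b - 1) * a  # exactly zero whenever a == 0
    return grad_a, grad_b


def first_crossings(threshold=0.5, lr=0.05, steps=500):
    """Run gradient descent from (0, 0) and record the first step at
    which each parameter exceeds `threshold` (None if it never does)."""
    a, b = 0.0, 0.0
    crossed = {"a": None, "b": None}
    for step in range(1, steps + 1):
        grad_a, grad_b = loss_grad(a, b)
        a -= lr * grad_a
        b -= lr * grad_b
        if crossed["a"] is None and a > threshold:
            crossed["a"] = step
        if crossed["b"] is None and b > threshold:
            crossed["b"] = step
    return crossed


if __name__ == "__main__":
    print("gradient at (0, 0):", loss_grad(0.0, 0.0))
    print("first step above 0.5:", first_crossings())
```

In this toy, the dependency-free parameter crosses the threshold first, which matches the ordering the quote predicts. It says nothing about whether real networks re-use neurons or ever build subprocesses with many dependencies; it only shows the local-gradient mechanism in the simplest case I could think of.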