It occurs to me that if the overseer understands everything that the ML model (that it’s training) is doing, and the training is via some kind of local optimization algorithm like gradient descent, the overseer is essentially manually programming the ML model by gradually nudging it from some initial (e.g., random) point in configuration space. If this is a good way to think about what’s happening, we could generalize the idea by letting the overseer use other ways to program the model (for example, by using standard programming methodologies adapted to the model class), which can probably be much more efficient than just using the “nudging” method. This suggests that maybe IDA has less to do with ML than it first appears, or that basing IDA on ML only makes sense if ML goes beyond local optimization at some point (so that the analogy with “nudging” breaks down), or that we have to figure out how to do IDA safely without the overseer understanding everything that the ML model is doing (which is another way the analogy could break down).
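To make the “nudging” picture concrete, here is a minimal sketch (not from the original comment) in which the overseer’s judgment is stood in for by a hypothetical differentiable loss, `overseer_loss`, and plain gradient descent nudges a toy model’s parameters from a random starting point in configuration space; all names, probe inputs, and constants are illustrative assumptions rather than anything specific to IDA.

```python
# Toy sketch of "overseer as nudger": gradient descent moves a tiny model's
# parameters step by step, with each step chosen by the overseer's evaluation
# rather than by the overseer writing the final program down directly.
import numpy as np

rng = np.random.default_rng(0)

# The "ML model": a single linear map; parameters start at a random point.
weights = rng.normal(size=(2,))

def model(x, w):
    return x @ w

def overseer_loss(w):
    # Hypothetical stand-in for the overseer's judgment: how far the model's
    # outputs are from the behavior the overseer wants on some probe inputs.
    probes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    desired = np.array([2.0, -1.0, 1.0])  # behavior the overseer is aiming for
    return np.mean((model(probes, w) - desired) ** 2)

def loss_gradient(w, eps=1e-5):
    # Numerical gradient, so the sketch stays self-contained.
    grad = np.zeros_like(w)
    for i in range(len(w)):
        bump = np.zeros_like(w)
        bump[i] = eps
        grad[i] = (overseer_loss(w + bump) - overseer_loss(w - bump)) / (2 * eps)
    return grad

# Each update is a small nudge in configuration space; the final parameters are
# "programmed" only indirectly, via many such nudges.
for step in range(200):
    weights -= 0.1 * loss_gradient(weights)

print(weights)  # ends up near [2, -1], the point the overseer was nudging toward
```

The generalization in the comment would amount to replacing the gradient-descent loop with some more direct way of writing `weights` (or a richer model) to achieve the behavior encoded in `overseer_loss`.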