I’ve been interested in the general question of adapting ‘safety math’ to ML practices in the wild for a while, but as far as I know there isn’t a good repository of (a) math results with clear short-term implications or (b) practical problems in current ML systems. Do you have any references for such things? (Even just a list of relevant blogs, and especially good posts that might be hard to track down otherwise, would be very helpful.)
First, I want to note that the approach I’m discussing in the post doesn’t necessarily have much to do with (a) or (b); the “philosophy to math to implementation” pipeline may still be primarily concerned with (a*) math results with far-term implications and (b*) practical problems in ML systems which aren’t here yet.
That being said, it is hard to see how a working philosophy-math-implementation pipeline could grow and stay healthy if it focused only on problems which aren’t here yet; we need the pipeline to be in place by the time it is needed. This creates a tension: since the whole point is to avert future problems, we don’t want to get caught in a trap of only doing things which can be justified by their relevance to present issues.
Still, over-optimizing for the wrong objective really is a natural generalization of overfitting a machine learning model to its training data, so it is plausible that quantilizing (or other techniques yet to be invented) gives better results than maximizing on a wide variety of problems. Although my thinking here is motivated by longer-term considerations, there’s no reason to think the phenomenon doesn’t already show up in existing systems.
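To make the contrast with maximization concrete, here is a minimal sketch of the quantilizer idea (the function names and the toy action space are made up purely for illustration): instead of taking the argmax of a proxy utility, sample from the top q fraction of a trusted base distribution, renormalized.

```python
import numpy as np

def quantilize(actions, utility, base_probs, q=0.1, rng=None):
    """Sample an action from the top q-quantile of a trusted base
    distribution, ranked by a proxy utility, instead of taking the argmax.

    Toy illustration of quantilizing: by sampling from the top slice of
    the base distribution rather than maximizing, we limit how hard the
    errors in `utility` can be exploited.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort([-utility(a) for a in actions])  # best-ranked first
    # Walk down the ranking until we've accumulated q of the base measure.
    top, mass = [], 0.0
    for i in order:
        top.append(i)
        mass += base_probs[i]
        if mass >= q:
            break
    # Renormalize the base distribution over the top set and sample from it.
    p = np.array([base_probs[i] for i in top])
    return actions[rng.choice(top, p=p / p.sum())]

# Toy example: the proxy utility over-rewards one "exploit" action
# that the trusted base distribution rarely takes.
acts = ["safe_a", "safe_b", "exploit"]
u = {"safe_a": 1.0, "safe_b": 0.9, "exploit": 10.0}.get
base = [0.49, 0.49, 0.02]
print(quantilize(acts, u, base, q=0.5))  # usually returns a safe action
```

The point of sampling from the renormalized top-q slice is the usual quantilizer guarantee: roughly, the expected harm from exploiting errors in the proxy utility is bounded by 1/q times whatever harm a draw from the trusted base distribution would produce on its own, whereas a pure maximizer has no such bound.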
Some references for alignment/safety work in this direction: Reinforcement Learning with a Corrupted Reward Channel (Everitt et al., 2017) and Concrete Problems in AI Safety (Amodei et al., 2016).