First, I want to note that the approach I’m discussing in the post doesn’t necessarily have much to do with (a) or (b); the “philosophy to math to implementation” pipeline may still be primarily concerned with (a*) math results with far-term implications and (b*) practical problems in ML systems which aren’t here yet.
That said, it is hard to see how a working philosophy-math-implementation pipeline could grow and stay healthy if it focused only on problems that aren't here yet; we need the pipeline to be in place and functioning by the time it is needed. This creates a tension: if the real goal is to avert future problems, we don't want to fall into the trap of only doing work that can be justified by its relevance to present issues.
Still, over-optimizing the wrong objective really is a natural generalization of overfitting a machine learning model to its training data, so it is plausible that quantilization (or other techniques yet to be invented) gives better results than straightforward maximization on a wide variety of problems. Although my thinking here is motivated by longer-term considerations, there is no reason to think the phenomenon doesn't already show up in existing systems.
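To make the contrast between maximizing and quantilizing concrete, here is a minimal sketch in Python. The setup is entirely a toy of my own invention (a uniform base distribution, a hand-picked true value function, and a proxy reward that over-reports value on a handful of "corrupted" actions); it is meant only to illustrate the failure mode, not to model any particular system. A q-quantilizer samples from the top q fraction of actions rather than taking the argmax, so a few wildly over-rewarded actions can only capture a small share of its probability mass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative numbers only, not from the post): 1000 candidate
# actions with a smooth true value, plus a proxy reward that matches the true
# value everywhere except on a few "corrupted" actions, where it wildly
# over-reports value -- a crude stand-in for optimizing the wrong objective.
n_actions = 1000
true_value = -np.abs(np.arange(n_actions) - 700) / 100.0   # true optimum at action 700
proxy = true_value.copy()
corrupted = np.array([42, 137, 250, 511, 926])              # all far from the true optimum
proxy[corrupted] += 1000.0                                  # spurious proxy bonus

def maximize(proxy_reward):
    """Pick the single action with the highest proxy reward."""
    return int(np.argmax(proxy_reward))

def quantilize(proxy_reward, q=0.05):
    """q-quantilizer with a uniform base distribution: sample uniformly
    from the top q fraction of actions, ranked by the proxy reward."""
    top_k = max(1, int(np.ceil(q * len(proxy_reward))))
    top_actions = np.argsort(proxy_reward)[-top_k:]
    return int(rng.choice(top_actions))

a_max = maximize(proxy)      # always lands on a corrupted action
a_quant = quantilize(proxy)  # picks a corrupted action only ~5/50 of the time
print(f"maximizer:   action {a_max}, true value {true_value[a_max]:.2f}")
print(f"quantilizer: action {a_quant}, true value {true_value[a_quant]:.2f}")
```

In this toy run the maximizer always exploits one of the corrupted actions, while the quantilizer usually returns a near-optimal one; the point is only that bounding optimization pressure limits how much a misspecified reward can be exploited.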
Some references for alignment/safety work in this direction: Reinforcement Learning with a Corrupted Reward Channel and Concrete Problems in AI Safety.