In addition to /u/abramdemski’s point above, this assumes we have a known-correct model of the system and no adversarial pressure, which means it applies only to what /u/ScottGarrabrant referred to as “regressional Goodhart.” See our paper here: https://arxiv.org/abs/1803.04585
But, again agreeing with Abram, I think this is a really good direction to pursue!