The problem is to come up with some model system where optimizing for something which is almost-but-not-quite the thing you really want produces worse results than not optimizing at all.
I’m confused; maybe the following query is addressed elsewhere but I have yet to come across it:
Doesn’t the standard, statistical-machine-learning-101 formulation (minimising training error when we actually care about minimising test error) fall squarely into the category of things that demonstrate Goodhart’s law? Aggressively optimising to reduce training error with a function approximator whose parameter count far exceeds the number of data points (e.g. today’s deep neural networks) will drive training error to ~0, but the model will most likely bomb completely on unseen data. This seems to me as straightforward an example of Goodhart’s law as one needs to illustrate the concept, and it serves as a segue into how to mitigate this phenomenon of overfitting, e.g. by validation, regularisation, enforcing sparsity, and so on.
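To make that concrete, here is a quick, self-contained toy sketch of my own (nothing from the OP, and the numbers are arbitrary): an overparameterised polynomial fit drives training error to essentially zero while doing far worse than a modest model on fresh data.

```python
# Toy illustration (my own, hypothetical setup): fit a polynomial with as
# many parameters as data points, so training error goes to ~0 while error
# on unseen data blows up.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)  # true signal + noise
    return x, y

x_train, y_train = make_data(10)   # few training points
x_test, y_test = make_data(1000)   # stands in for "unseen" data

for degree in (2, 9):              # modest model vs. parameters ~ data points
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")

# The degree-9 fit interpolates the training set almost exactly, yet it
# typically does far worse on the test set: optimising the proxy (training
# error) past a point hurts the thing we actually care about (test error).
```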
Given the premise that we are likely to start from something close to Pareto-optimal, we have a system which works well from the get-go; without suitable controls, optimising to reduce training error to the exclusion of all other metrics will almost certainly be worse than not optimising at all.
The problem is that you invoke the idea that it’s starting from something close to Pareto-optimal. But Pareto-optimal with respect to what? Pareto optimality implies a multi-objective problem, and it’s not clear what those objectives are. That’s why we need the whole causality framework: the multiple objectives are internal nodes of the DAG.
The standard description of overfitting does fit into the DAG model, but most of the usual solutions to that problem are specific to overfitting; they don’t generalize to Goodhart problems in e.g. management.
I assumed (close to) Pareto-optimality, since the OP suggests that most real systems start from this state.
The (immediately discernible) competing objectives here are training error and test error. Only one can be observed ahead of deployment (much like X in the X+Y example earlier), while it’s actually the other that matters. That is not to say there aren’t other, currently undiscovered or unstated metrics of interest (training time, designer time, model size, etc.) which may be articulated and optimised for while still leading to a suboptimal result on test error. Indeed, we can imagine a perfectly good predictive neural network which, for some reason, won’t run on the newly provisioned hardware, so a hasty, over-extended engineer simply deletes entire blocks of it, optimising their own time and the model’s ability to fit on a Raspberry Pi while most likely voiding any ability of the network to perform the task meaningfully.
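To illustrate the “only one objective is observable ahead of deployment” point with the same toy setup as above (again, entirely my own illustrative code, not anything from the thread): selecting a model purely on the observable proxy, training error, versus on a held-out validation set that stands in for the test error we cannot see directly.

```python
# Toy illustration (hypothetical data and model family): choose a model by
# the observable proxy (training error) vs. by a held-out validation set,
# then compare on "test" data that stands in for deployment.
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + 0.1 * rng.standard_normal(n)

x_tr, y_tr = make_data(15)
x_val, y_val = make_data(15)
x_te, y_te = make_data(1000)

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# Candidate models: polynomials of increasing flexibility, fit on training data.
fits = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 13)}

by_train = min(fits, key=lambda d: mse(fits[d], x_tr, y_tr))   # proxy only
by_val = min(fits, key=lambda d: mse(fits[d], x_val, y_val))   # held-out proxy

print("picked by training error:", by_train, "-> test MSE", mse(fits[by_train], x_te, y_te))
print("picked by validation error:", by_val, "-> test MSE", mse(fits[by_val], x_te, y_te))

# Selecting on training error alone picks the most flexible model and usually
# does worse at deployment; the validation set is a second, imperfect-but-better
# proxy for the metric we cannot observe directly.
```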
If this sounds contrived, forgive me. Perhaps I am talking tangentially past the discussion at hand; if so, kindly ignore. Mostly, I only wish to propose that a fundamental formulation of ML, minimising training loss when what we actually want is to reduce test loss, is an example of Goodhart’s law in action, and that there is a rich literature on techniques to circumvent its effects. Do you agree? Why / why not?