More generally, the whole setup is just 1) train your model in a sandbox environment 2) have the weights of your model fixed during deployment.
I think this is missing a crucial point of Counterfactual Oracles (as implemented in ML). I came upon Paul Christiano’s Counterfactual oversight vs. training data today, which explains this very well. (What he calls Counterfactual Oversight is basically Counterfactual Oracles as applied to predicting/imitating humans.) The problems with the standard supervised learning system you’re describing are:
The problem is not stationary, and over time the training data becomes less relevant.
There are spurious correlations in the training data that don’t generalize to the test data.
And Counterfactual Oracles (in ML terms) are actually an attempt to solve these problems:
Counterfactual oversight consists of labelling a random subset of data and using it as online training data. The key difference is that any given data point may become a training data point, with the decision made after the learning system has made a decision about it. As long as the randomization is unpredictable to the learner, this gives us a formal guarantee that there can’t be any noticeable difference between the training and test data. And therefore if our learner behaves well on training data, it really must behave well on test data.
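To make the mechanics concrete, here is a minimal sketch of that protocol in Python. The learner interface (`predict`/`update`), the `human_label` overseer stub, and the 10% labelling rate are all stand-ins of my own, not anything from Paul’s post; the essential point is just that the randomization happens only after the learner has committed to its output.

```python
import random

def counterfactual_oversight_step(learner, x, human_label, epsilon=0.1):
    """Run one episode of counterfactual oversight on input x."""
    # The learner commits to an output first...
    prediction = learner.predict(x)

    # ...and only afterwards do we randomize whether this episode becomes
    # a training episode. Because the draw happens after the learner's
    # decision and is unpredictable to it, training episodes and
    # deployment episodes are drawn from the same distribution.
    if random.random() < epsilon:
        # Training episode: get the ground-truth label from the human
        # overseer and do an online update; the learner's own prediction
        # is not used (the "counterfactual" branch).
        y = human_label(x)
        learner.update(x, y)
        return y

    # Deployment episode: the prediction is actually used, and the
    # learner receives no feedback about it.
    return prediction
```

Because every input has the same ε chance of being labelled, and the learner cannot tell in advance which branch it is in, good behavior on the training episodes transfers to the deployment episodes by construction, even as the underlying distribution drifts.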