You’re absolutely right, but that’s not what I meant by this sentence, nor what Evan thinks.
Here “infinite data” literally means having the data for the training environment and the deployment environment. It means that there is no situation where the system sees some input that was not available during training, because every possible input appears during training. This is obviously impossible to do in practice, but it allows us to set aside inductive bias considerations at the theoretical level.
Here “infinite data” literally means having the data for the training environment and the deployment environment.
This also doesn’t work—there’s still a degree of freedom in how much of the data is from deployment, and how much from training. Could be 25% training distribution, could be 98% training distribution, and those will produce different optimal strategies. Heck, we could construct it in such a way that there’s infinite data from both but the fraction of data from deployment goes to zero as data size goes to infinity. In that case, the optimal policy in the limit would be exactly what’s optimal on the training distribution.
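A quick sketch of that limit argument, in notation I'm introducing here rather than taking from the thread: write the combined data distribution as a mixture

$$p_\lambda(x) = \lambda\, p_{\text{train}}(x) + (1-\lambda)\, p_{\text{deploy}}(x),$$

and let $f_\lambda^*$ be the predictor minimizing expected loss under $p_\lambda$. Different values of $\lambda$ generally give different $f_\lambda^*$. And if a dataset of size $n$ contains $\sqrt{n}$ deployment samples and $n - \sqrt{n}$ training samples, both counts go to infinity while the deployment fraction $\sqrt{n}/n \to 0$, so in the limit the loss-minimizing predictor is just the one that's optimal under $p_{\text{train}}$ alone.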
When we’re optimizing for averages, it doesn’t just matter whether we’ve ever seen a particular input; it matters how often. The system is going to trade off better performance on more-often-seen data points for worse performance on less-often-seen data points.
That’s only true for RL—for SL, perfect loss requires being correct on every data point, regardless of how often it shows up in the distribution. For RL that property doesn’t hold, but there we can just say that we’re talking about the optimal policy on the MDP that the model will actually encounter over its existence.
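To state the two halves of that claim a bit more formally (my formalization, not the commenter's): for SL with a nonnegative per-example loss and deterministic labels $y(x)$,

$$\mathbb{E}_{x \sim p}\big[\ell(f(x), y(x))\big] = 0 \iff \ell(f(x), y(x)) = 0 \text{ for (almost) every } x \text{ with } p(x) > 0,$$

so only the support of $p$ matters, not the frequencies. For RL, the optimal policy $\pi^* = \arg\max_\pi \mathbb{E}_\pi\big[\sum_t \gamma^t r_t\big]$ does depend on how often states are actually visited, hence the move to the MDP the model really encounters. (Note that writing the label as a function $y(x)$ already assumes deterministic labels, which is exactly what the next reply pushes on.)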
for SL, perfect loss requires being correct on every data point, regardless of how often it shows up in the distribution
This is only true if identical data points always have the same label. To the extent that’s true in real data sets, it’s purely an artifact of finite data, and is almost certainly not true of the underlying process.
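Making that concrete (a standard fact, stated in my own notation): if labels are genuinely stochastic given the input, with conditional distribution $p(y \mid x)$, then no predictor achieves zero loss. The best achievable 0-1 loss is the Bayes error

$$\min_f \; \mathbb{E}_x\big[\Pr(y \neq f(x) \mid x)\big] = \mathbb{E}_x\big[1 - \max_y p(y \mid x)\big],$$

which is strictly positive whenever $p(y \mid x)$ is non-degenerate; for cross-entropy the minimum is the conditional entropy $H(Y \mid X)$, attained by predicting $p(y \mid x)$ itself. "Perfect loss on every data point" is only on the table when the underlying process is deterministic.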
Suppose I feed a system MRI data and labels for whether each patient has cancer, and train the system to predict cancer. MRI images are very high dimensional, so the same image will probably never occur twice in the data set; thus the system can be correct on every data point (in training). But if the data were actually infinite, this would fall apart—images would definitely repeat, and they would probably not have the same label every time, because an MRI image does not actually have enough information in it to 100% perfectly predict whether a patient has cancer. Given a particular image, the system will thus have to choose whether this image more often comes from a patient with or without cancer. And if that frequency differs between the training and deployment environments, then we have generalization error.
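A minimal numerical sketch of that failure mode (all numbers are made up for illustration; none of this is from the original comment): a single image recurs with different label frequencies in training and deployment, so the prediction that is optimal on the training distribution does badly at deployment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend a single "image" x recurs many times. Among patients producing this
# image, 30% have cancer in the training environment and 70% in deployment.
# (Both numbers are invented for the sketch.)
p_cancer_train, p_cancer_deploy = 0.3, 0.7

train_labels = rng.random(100_000) < p_cancer_train
deploy_labels = rng.random(100_000) < p_cancer_deploy

# The 0-1-loss-optimal deterministic prediction for x on the training
# distribution is its majority label there: "no cancer".
predict_cancer = train_labels.mean() > 0.5  # False here

train_error = np.mean(train_labels != predict_cancer)    # ~0.30
deploy_error = np.mean(deploy_labels != predict_cancer)  # ~0.70
print(f"train error:  {train_error:.2f}")
print(f"deploy error: {deploy_error:.2f}")
```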
And this is not just an example of infinities doing weird things which aren’t relevant in practice, because real supervised learners do not learn every possible function. They have inductive biases—some of the learning is effectively unsupervised or baked in by priors. Indeed, in terms of bits of information, the exponentially vast majority of the learning is unsupervised or baked in by priors, and that will always be the case. The argument above thus applies not only to any pair of identical images, but also to any pair of images which are treated the same by the “unsupervised/inaccessible dimensions” of the learner.
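A hedged way to formalize that last step, again in my own notation: suppose the learner can only express hypotheses that factor through its representation, $\mathcal{H} = \{h \circ \varphi\}$, where $\varphi$ captures the unsupervised / prior-determined structure. Then

$$\varphi(x_1) = \varphi(x_2) \implies f(x_1) = f(x_2) \quad \text{for every } f \in \mathcal{H},$$

so the learner has to pick a single answer per $\varphi$-equivalence class, weighted by how often that class shows up in training, and the repeated-image argument above applies class by class rather than only to literally identical inputs.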
Taking more of an outside view… if we’re imagining a world in which the “true” label is a deterministic function of the input, and that deterministic function is in the supervised learner’s space, then our model has thrown away everything which makes generalization error a problem in practice. It’s called “robustness to distribution shift” for a reason.
if we’re imagining a world in which the “true” label is a deterministic function of the input, and that deterministic function is in the supervised learner’s space, then our model has thrown away everything which makes generalization error a problem in practice.
Yes—that’s the point. I’m trying to define outer alignment, so I want the definition to get rid of generalization issues.
We want a definition which separates out generalization issues, not a definition which only does what we want in situations where generalization issues never happened in the first place.
If we define outer alignment as (paraphrasing) “good performance assuming we’ve seen every data point once”, then that does not separate out generalization issues. If we’re in a situation where seeing every data point is not sufficient to prevent generalization error (i.e. most situations where generalization error actually occurs), then that definition will classify generalization error as an outer alignment error.
To put it differently: when choosing definitions of things like “outer alignment”, we do not get to pick what-kind-of-problem we’re facing. The point of the exercise is to pick definitions which carve up the world in a way that stays useful across a range of architectures, environments, and problems.