I don’t see that (4) should be necessary; I may be misunderstanding it.
If you apply a change of basis to the inputs to a non-linearity, then I’m sure it will destroy performance. If you apply a change of basis to the outputs, then those outputs will cease to look meaningful, but it won’t stop the algorithm from working well. But just because the behavior of the algorithm is robust to applying a particular linear scrambling doesn’t mean that the representation is not natural, or that all of the scrambled representations must be just as natural as the one we started with.
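To make this concrete, here is a minimal numpy sketch of the two cases I mean (a hypothetical two-layer ReLU net with random weights, nothing taken from any actual model):

```python
import numpy as np

# Hypothetical two-layer ReLU network with random weights, used only to
# illustrate the two change-of-basis cases discussed above.
rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

W1 = rng.standard_normal((5, 3))   # input -> hidden
W2 = rng.standard_normal((2, 5))   # hidden -> output
x = rng.standard_normal(3)

y = W2 @ relu(W1 @ x)              # original network output

# Case 1: change of basis applied to the *outputs* of the nonlinearity.
# Absorb R^{-1} into the next layer's weights and the network computes the
# same function, even though the individual "units" R @ h are now mixtures.
R, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random rotation
h_scrambled = R @ relu(W1 @ x)
y_scrambled = (W2 @ R.T) @ h_scrambled
print(np.allclose(y, y_scrambled))                 # True

# Case 2: change of basis applied to the *inputs* of the nonlinearity.
# relu(R @ z) != R @ relu(z) in general, so there is no way to absorb R into
# the surrounding linear maps, and the computation changes.
y_broken = (W2 @ R.T) @ relu(R @ (W1 @ x))
print(np.allclose(y, y_broken))                    # False (generically)
```

The scrambled hidden units in the first case are unintelligible mixtures of the originals, but the network computes exactly the same function; that is the sense in which the behavior is robust to the scrambling.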
Yeah, I should be a bit more careful about (4). The point is that many papers which argue that a given NN is learning “natural” representations do so by looking at what an individual hidden unit responds to (as opposed to looking at the space spanned by the hidden layer as a whole). Any such argument seems dubious to me without further support, since it relies on a sort of delicate symmetry-breaking which can only come from either the training procedure or noise in the data, rather than the model itself. But I agree that if such an argument were accompanied by justification of why the training procedure, data noise, or some other factor led to the symmetry being broken in a natural way, then I would potentially be happy.
“delicate symmetry-breaking which can only come from either the training procedure or noise in the data, rather than the model itself”
I’m still not convinced. The pointwise nonlinearities introduce a preferred basis, and cause the individual hidden units to be much more meaningful than linear combinations thereof.
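To spell that out: the only changes of basis that commute with a pointwise ReLU are permutations of the units and positive per-unit rescalings, so the model itself singles out the unit-aligned directions. A quick numpy check (hypothetical vectors, assuming ReLU as the nonlinearity):

```python
import numpy as np

# Check which changes of basis commute with a pointwise ReLU: permutations
# and positive per-unit rescalings do, a generic rotation does not.
rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)
z = rng.standard_normal(5)

P = np.eye(5)[rng.permutation(5)]                  # permutation matrix
D = np.diag(rng.uniform(0.5, 2.0, size=5))         # positive diagonal scaling
R, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # generic rotation

print(np.allclose(relu(P @ z), P @ relu(z)))       # True
print(np.allclose(relu(D @ z), D @ relu(z)))       # True
print(np.allclose(relu(R @ z), R @ relu(z)))       # False (generically)
```

So the symmetry that survives the nonlinearity only relabels and rescales units; it never mixes them, which is why reading off individual units is better motivated than reading off arbitrary directions in the hidden space.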
Yeah, I discussed this with some others and came to the same conclusion. I do still think that one should explain why the preferred basis ends up being as meaningful as it does, but agree that this is a much more minor objection.