The upper bound for the Kolmogorov complexity of an input-output map in Dingle et al. is not very interesting; IIRC it basically says that a program can be constructed of the form "specify this exact input-output map using the definition already provided", and the upper bound is just the length of that program.
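For concreteness, here is the shape of that argument in standard description-length notation (my notation, not necessarily the paper's):

```latex
% Trivial description-length bound (my notation, not the paper's).
% If a fixed system S defines the map f, one valid program is
% "within S, output exactly this input-output table", so:
\[
  K(f \mid S) \;\le\; \ell\bigl(\mathrm{encode}(f)\bigr) + O(1),
\]
% where \ell(\cdot) is the length in bits of a literal encoding of
% the table, and the O(1) term covers the fixed "read the table and
% print it" wrapper.
```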
Also, one concern: it's not clear that thinking about this differentially advances alignment over capabilities.
I think it very clearly differentially advances alignment over capabilities; understanding SGD's inductive biases is one of the primary bottlenecks for inner alignment, imo.
The stuff linked here is pretty old, though; e.g., it predates Mingard et al.
Thanks very much for the link!
P.S. The main thing I have taken so far from the link you posted is that the important part is not exactly the biases of SGD. Rather, it is the structure of the DNN itself; the algorithm used to find a (local) optimum plays less of a role than the overall structure. But probably I'm reading too much into your precise phrasing.
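To make "the structure of the DNN itself" concrete, here is a minimal sketch of the kind of experiment behind that reading (the architecture, width, and init scale are my own arbitrary choices, not anything from the paper): sample random weights for a small network and tally which Boolean function each draw computes, with no optimizer involved at all.

```python
# Minimal sketch: estimate the prior over functions induced by the
# parameter-function map of a small MLP, by sampling random weights.
# Architecture, width, and init scale are arbitrary illustrative
# choices; the point is only that the bias toward a few simple
# functions appears before any optimizer enters the picture.
import itertools
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
n = 3  # number of input bits
X = np.array(list(itertools.product([0.0, 1.0], repeat=n)))  # all 2^n inputs

def random_mlp_function(width: int = 16) -> str:
    """Sample one MLP with Gaussian weights and return the Boolean
    function it computes on X, encoded as a 2^n-character string."""
    W1 = rng.normal(0.0, 1.0, (n, width))
    b1 = rng.normal(0.0, 1.0, width)
    W2 = rng.normal(0.0, 1.0, (width, 1))
    b2 = rng.normal(0.0, 1.0, 1)
    h = np.tanh(X @ W1 + b1)
    out = ((h @ W2 + b2) > 0).astype(int).ravel()
    return ''.join(map(str, out))

counts = Counter(random_mlp_function() for _ in range(50_000))
total = sum(counts.values())
for f, c in counts.most_common(8):
    print(f"{f}  P~{c / total:.3f}")
# In runs like this, constant and other low-complexity functions tend
# to dominate, while most of the 2^(2^n) possible functions never
# show up at all.
```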
Hi Thomas, I agree the proof of the bound is not so interesting. What I found more interesting were the examples and discussion suggesting that, in practice, the upper bound often seems to be somewhat tight.
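As an illustration of what "somewhat tight in practice" can look like numerically, here is a toy version of that style of experiment. The map (binary expansions of p/q) and the zlib-based complexity proxy are my own stand-ins, not the paper's maps or its Lempel-Ziv estimator; the idea is just to estimate each output's probability under random inputs and compare -log2 P(x) with a complexity estimate.

```python
# Toy tightness check for a simplicity-bias-style bound.  The map and
# the zlib complexity proxy are illustrative stand-ins, not the
# paper's actual maps or estimator.
import math
import random
import zlib
from collections import Counter

L = 24  # output string length

def toy_map(p: int, q: int) -> str:
    """First L bits of the binary expansion of p/q (with p < q)."""
    bits, r = [], p
    for _ in range(L):
        r *= 2
        bits.append('1' if r >= q else '0')
        r %= q
    return ''.join(bits)

def k_proxy(x: str) -> int:
    """Crude complexity proxy: zlib-compressed length in bits.
    Absolute values include fixed header overhead, so only the
    relative ordering across outputs is meaningful."""
    return 8 * len(zlib.compress(x.encode()))

random.seed(0)
N = 200_000
counts = Counter()
for _ in range(N):
    q = random.randint(2, 512)
    p = random.randint(1, q - 1)
    counts[toy_map(p, q)] += 1

# High-probability outputs should be the highly compressible ones,
# with -log2 P(x) loosely tracking the complexity proxy.
for x, c in counts.most_common(10):
    print(f"{x}  -log2P={math.log2(N / c):5.2f}  K~{k_proxy(x)}")
```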
Concerning differential advancement: I agree this can advance capabilities, but I suspect that advancing alignment is somewhat hopeless unless we can better understand what is going on inside DNNs. On that basis I think it does differentially advance alignment, but of course other people may disagree.