Arthur Conmy comments on Science of Deep Learning—a technical agenda

Arthur Conmy 1 Dec 2022 21:32 UTC
2 points
0
To me, the label “Science of DL” is far broader than interpretability. However, I was claiming that the general goal of Science of DL is not neglected (see my middle paragraph).
- Adam Jermyn 1 Dec 2022 21:40 UTC
  1 point
  0
  Got it, I was mostly responding to the third paragraph (insight into why SGD works, which I think is mostly an interpretability question) and should have made that clearer.
  - Arthur Conmy 2 Dec 2022 6:28 UTC
    2 points
    0
    I think the situation I’m considering in the quoted part is something like this: research is done on SGD training dynamics, and researcher X finds a new way of looking at model component Y which shows that only certain parts of it are important for performance. So they remove the unimportant parts, scale the model further, and the model is better. To me, this meets the definition of “why SGD works” (the model uses the Y components to achieve low loss).
    I think interpretability that finds ways models represent information (especially across models) is valuable, but this feels different from “why SGD works”.
    - Adam Jermyn 2 Dec 2022 12:49 UTC
      1 point
      0
      Got it, I see. I think of the two as really intertwined (e.g. a big part of my agenda at the moment is studying how biases/path-dependence in SGD affect interpretability/polysemanticity).