johnswentworth comments on Emergent modularity and safety

johnswentworth 21 Oct 2021 2:26 UTC
LW: 25 AF: 11
“Our default expectation about large neural networks should be that we will understand them in roughly the same ways that we understand biological brains, except where we have specific reasons to think otherwise.”
Why would that be our default expectation? We don’t have direct access to all of the underlying parameters in the brain. We can’t even simulate it yet, let alone take a gradient.
- Richard_Ngo 21 Oct 2021 18:19 UTC
  LW: 6 AF: 5
  “Why would that be our default expectation?”
  Lots of reasons. Neural networks are modelled after brains. They both form distributed representations at very large scales, they both learn over time, etc. Sure, you’ve pointed out a few differences, but the similarities are so great that this should be the main anchor for our expectations (rather than, say, thinking that we’ll understand NNs the same way we understand support vector machines, or the same way we understand tree search algorithms, or...).
  - TurnTrout 21 Oct 2021 20:18 UTC
    LW: 10 AF: 7
    I’m not convinced that these similarities are great enough to merit such anchoring. The fact that NNs have more in common with brains than with SVMs does not imply that we will understand NNs in roughly the same ways that we understand biological brains. We could understand them in a different set of ways than we understand biological brains, and differently than we understand SVMs.
    Rather than arguing over reference class, it seems like it would make more sense to note the specific ways in which NNs are similar to brains, and what hints those specific similarities provide.
  - johnswentworth 21 Oct 2021 21:37 UTC
    LW: 3 AF: 3
    Perhaps a good way to summarize all this is something like “qualitatively similar models probably work well for brains and neural networks”. I agree to a large extent with that claim (though there was a time when I would have agreed much less), and I think that’s the main thing you need for the rest of the post.
    “Ways we understand” comes across as more general than that: for example, we understand brains by experimentally probing physical neurons, versus understanding neural networks by spectral clustering of a derivative matrix.
- Daniel_Eth 21 Oct 2021 17:28 UTC
  LW: 3 AF: 3
  The statement seems almost tautological – couldn’t we somewhat similarly claim that we’ll understand NNs in roughly the same ways that we understand houses, except where we have reasons to think otherwise? The “except where we have reasons to think otherwise” bit seems to be doing a lot of work.
  - Richard_Ngo 21 Oct 2021 18:15 UTC
    LW: 2 AF: 2
    Compare: when trying to predict events, you should use their base rate except when you have specific updates to it.
    Similarly, I claim, our beliefs about brains should be the main reference for our beliefs about neural networks, which we can then update from.
    I agree that the phrasing could be better; any suggestions?
    - johnswentworth 21 Oct 2021 21:24 UTC
      LW: 6 AF: 4
      “I agree that the phrasing could be better; any suggestions?”
      I actually think you could just drop that intro altogether, or move it later into the post. We do have pretty good evidence of modularity in the brain (as well as other biological systems) and in trained neural nets; it seems to be a pretty common property of large systems “evolved” by local optimization. And the rest of the post (as well as some of the other comments) does a good job of talking about some of that evidence. It’s a good post, and I think the arguments later in the post are stronger than that opening.
      (On the other hand, if you’re opening with it because that was your own main prior, then that makes sense. In that case, maybe note that it was a prior for you, but that the evidence from other directions is strong enough that we don’t need to rely much on that prior?)
      - Richard_Ngo 21 Oct 2021 22:53 UTC
        LW: 4 AF: 4
        Thanks, that’s helpful. I do think there’s a weak version of this which is an important background assumption for the post (e.g. without that assumption I’d need to explain the specific ways in which ANNs and BNNs are similar), so I’ve now edited the opening lines to convey that weak version instead. (I still believe the original version but agree that it’s not worth defending here.)
    - Daniel_Eth 21 Oct 2021 19:10 UTC
      LW: 1 AF: 1
      Yeah, I’m not trying to say that the point is invalid, just that the phrasing may give the point more appeal than is warranted, because it leans somewhat in the direction of a deepity. Hmm, I’m not sure what better phrasing would be.