SGD has a strong inherent simplicity bias, even without weight regularization, and this is fairly well known in the DL literature (I could probably find hundreds of examples if I had the time, which I do not). By SGD I specifically mean SGD variants that don’t use a 2nd-order approximation (as Adam does). There are many papers which find that approximately-2nd-order, variance-adjusted optimizers like Adam have various generalization/overfitting issues compared to SGD; this comes up over and over, such that it’s fairly common to use some additional regularization with Adam.
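(To make that last point concrete, here is a minimal PyTorch sketch of the kind of setup I mean; the model and hyperparameters are just placeholders: plain SGD is often run with no explicit weight penalty, while Adam is commonly paired with explicit regularization such as decoupled weight decay, i.e. AdamW.)

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model, just for illustration

# Plain SGD: no explicit weight penalty, relying on the optimizer's
# implicit simplicity bias.
sgd = torch.optim.SGD(model.parameters(), lr=0.1)

# Adam variant with decoupled weight decay (AdamW): the extra explicit
# regularization that is commonly added when using Adam-style optimizers.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
```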
It’s also pretty intuitively obvious why SGD has a strong simplicity prior if you just think through some simple examples: SGD doesn’t move directly toward the loss minimum; it moves in the parsimonious direction, the one that reduces loss the most per unit of weight distance moved away from the init. 2nd-order optimizers like Adam can move more directly in the direction of lower loss.
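(For concreteness, here is a toy numpy sketch of that geometric point; the gradient values are made up: SGD’s direction is just the raw gradient, i.e. steepest descent per unit of Euclidean distance in weight space, while an Adam-style update rescales each coordinate and so can point much more directly at lower loss along the shallow coordinates.)

```python
import numpy as np

# Hypothetical gradient: steep in coordinate 0, very shallow in coordinate 1.
grad = np.array([1.0, 0.01])

# SGD direction: proportional to the raw gradient, i.e. the direction that
# reduces loss the most per unit of Euclidean distance moved in weight space.
sgd_dir = -grad / np.linalg.norm(grad)

# Adam-style direction (momentum and bias correction omitted for brevity):
# each coordinate is divided by the square root of its second-moment
# estimate, which on this single step is just |grad_i|, so the step is
# roughly -sign(grad).
adam_step = -grad / (np.sqrt(grad ** 2) + 1e-8)
adam_dir = adam_step / np.linalg.norm(adam_step)

print(sgd_dir)   # ~[-1.00, -0.01]: barely moves the shallow coordinate
print(adam_dir)  # ~[-0.71, -0.71]: moves both coordinates equally
```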
Empirically, the inductive bias that you get when you train with SGD, and similar optimisers, is in fact quite similar to the inductive bias that you would get, if you were to repeatedly re-initialise a neural network until you randomly get a set of weights that yield a low loss. Which optimiser you use does have an effect as well, but this is very small by comparison. See this paper.
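(To spell out the baseline being compared against, here is a minimal toy sketch of that sampling procedure; the data, model, and loss threshold are all made up: keep drawing fresh random inits and accept the first one that already has low loss, with no gradient steps at all.)

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ w_true                      # toy regression targets

def loss(w):
    return np.mean((X @ w - y) ** 2)

threshold = 0.3
tries = 0
while True:
    tries += 1
    w = rng.normal(size=2)          # fresh random "initialisation"
    if loss(w) < threshold:         # accept only if it is already low-loss
        break

print(tries, loss(w), w)
# The distribution of accepted w is the "re-initialise until you randomly
# get low-loss weights" baseline whose inductive bias SGD is compared to.
```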
Yes. (Note that “randomly sample from the set of all low loss NN parameter configurations” goes hand in hand with there being a bias towards simplicity; it’s not a contradiction. Is that maybe what’s going on here: people misinterpreted Bensinger as somehow not realizing simpler configurations are more likely?)
My prior is that DL has a great amount of weird domain knowledge which is mysterious to those who haven’t spent years studying it, and years studying DL correlates with strong disagreement with the sequences/MIRI positions on many fundamentals. I trace all this back to EY over-updating on ev psych and not reading enough neuroscience and early DL.
So anyway, a sentence like “randomly sample from the set of all low loss NN parameter configurations” is not one I would use or expect a DL-insider to use, and sounds more like something a MIRI/LW person would say, in part because, yes, I don’t generally expect MIRI/LW folks to be especially aware of the intrinsic SGD simplicity prior. The more correct statement is “randomly sample from the set of all simple low loss configs” or similar.
But it’s also not quite clear to me how relevant that subpoint is; I’m just sharing my impression.
IMO this seems like a strawman. When talking to MIRI people it’s pretty clear they have thought a good amount about the inductive biases of SGD, including an associated simplicity prior.
Sure, it will clearly be a strawman for some individuals; the point of my comment is to explain how someone like myself could potentially misinterpret Bensinger, and why. (As I don’t know him very well, my brain models him as a generic MIRI/LW type.)
I want to revisit what Rob actually wrote:
If you sampled a random plan from the space of all writable plans (weighted by length, in any extant formal language), and all we knew about the plan is that executing it would successfully achieve some superhumanly ambitious technological goal like “invent fast-running whole-brain emulation”, then hitting a button to execute the plan would kill all humans, with very high probability.
(emphasis mine)
That sounds a whole lot like it’s invoking a simplicity prior to me!
Note I didn’t actually reply to that quote. Sure, that’s an explicit simplicity prior. However, there’s a large difference under the hood between using an explicit simplicity prior on plan length vs an implicit simplicity prior on the world and action models which generate plans. The latter is what is more relevant for intrinsic similarity to human thought processes (or not).