My prior is that DL has a great deal of weird domain knowledge which is mysterious to those who haven’t spent years studying it, and years studying DL correlates with strong disagreement with the sequences/MIRI positions on many fundamentals. I trace all this back to EY over-updating on ev psych and not reading enough neuroscience and early DL.
So anyway, a sentence like “randomly sample from the set of all low loss NN parameter configurations” is not one I would use or expect a DL-insider to use, and it sounds more like something a MIRI/LW person would say, in part, yes, because I don’t generally expect MIRI/LW folks to be especially aware of the intrinsic SGD simplicity prior. The more correct statement is “randomly sample from the set of all simple low loss configs” or similar.
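To make that distinction concrete, here’s a toy sketch (my own illustration; minimum parameter norm is only a crude stand-in for the kind of simplicity bias I mean). In an underdetermined linear model there are many zero-loss weight vectors, but gradient descent from a small init converges to the minimum-norm one, while a random point on the zero-loss set typically doesn’t look simple at all:

```python
# Toy sketch (not from the thread): SGD-style training carries an
# implicit simplicity bias that uniform sampling over low-loss
# configurations does not.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                          # 20 data points, 100 params: underdetermined
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Plain gradient descent on squared loss, from the zero init.
w = np.zeros(d)
for _ in range(20_000):
    w -= 0.01 * X.T @ (X @ w - y) / n

# A random zero-loss solution: the minimum-norm interpolant plus a
# random direction from the null space of X (loss is unchanged).
w_min = np.linalg.pinv(X) @ y
null_rows = np.linalg.svd(X)[2][n:]     # (d - n) null-space rows of Vh
w_rand = w_min + 10.0 * (null_rows.T @ rng.normal(size=d - n))

print("loss(GD):    ", np.mean((X @ w - y) ** 2))        # ~0
print("loss(random):", np.mean((X @ w_rand - y) ** 2))   # ~0
print("norm(GD):    ", np.linalg.norm(w))        # small: the min-norm solution
print("norm(random):", np.linalg.norm(w_rand))   # much larger
```

The two solutions are indistinguishable by loss; they differ only in the prior that selected them, which is exactly the part the “randomly sample” phrasing leaves out.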
But it’s also not quite clear to me how relevant that subpoint is; I’m just sharing my impression.
IMO this seems like a strawman. When talking to MIRI people it’s pretty clear they have thought a good amount about the inductive biases of SGD, including an associated simplicity prior.
Sure, it will clearly be a strawman for some individuals; the point of my comment is to explain how someone like me could potentially misinterpret Bensinger, and why. (As I don’t know him very well, my brain models him as a generic MIRI/LW type.)
I want to revisit what Rob actually wrote:

If you sampled a random plan from the space of all writable plans (weighted by length, in any extant formal language), and all we knew about the plan is that executing it would successfully achieve some superhumanly ambitious technological goal like “invent fast-running whole-brain emulation”, then hitting a button to execute the plan would kill all humans, with very high probability.
(emphasis mine)
That sounds a whole lot like it’s invoking a simplicity prior to me!
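Concretely, I read “weighted by length, in any extant formal language” as a description-length prior, something like the following (my gloss; the exact formalization isn’t spelled out in the post):

```latex
% Description-length prior over plans: shorter plans get
% exponentially more weight, where \ell(p) is the length of plan p
% in the chosen formal language.
P(p) \propto 2^{-\ell(p)}
```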
Note I didn’t actually reply to that quote. Sure, that’s an explicit simplicity prior. However, there’s a large difference under the hood between using an explicit simplicity prior on plan length vs an implicit simplicity prior on the world and action models which generate plans. The latter is what’s more relevant for intrinsic similarity to human thought processes (or not).
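A minimal sketch of the structural difference I have in mind (the action names and toy policy are purely illustrative, not anyone’s actual proposal):

```python
# Two ways a "simplicity prior" can enter plan generation.
import random

random.seed(0)
ACTIONS = ["scan", "move", "grab", "wait"]

def sample_plan_explicit(max_len=8):
    """Explicit prior: weight finished plans by 2^-length,
    regardless of what process wrote them."""
    weights = [2.0 ** -k for k in range(1, max_len + 1)]
    k = random.choices(range(1, max_len + 1), weights=weights)[0]
    return [random.choice(ACTIONS) for _ in range(k)]

def sample_plan_implicit(policy, max_len=8):
    """Implicit prior: plans come from a generative policy/world model;
    any simplicity bias lives in the model's own inductive biases."""
    plan = []
    for _ in range(max_len):
        action = policy(plan)
        if action is None:          # the generator itself decides to stop
            break
        plan.append(action)
    return plan

def toy_policy(plan):
    """Stand-in generator: tends to stop early and repeat itself, so
    short regular plans fall out of its structure, not a length penalty."""
    if plan and random.random() < 0.5:
        return None
    if plan and random.random() < 0.7:
        return plan[-1]
    return random.choice(ACTIONS)

print(sample_plan_explicit())
print(sample_plan_implicit(toy_policy))
```

Both samplers end up favoring short plans, but only the first does so via an explicit penalty on length; in the second, the bias is a property of the generator, and that is the part that does or doesn’t resemble human thought processes.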