There are lots and lots of exotic circumstances. We might get into a nuclear war. We might invent time travel. We might become digital uploads. We might decide democracy was a bad idea.
I agree that AGI will create exotic circumstances. But not all exotic circumstances will be created by AGI. I find it plausible that the AI systems fail in only a special few exotic circumstances, which aren’t the ones that are actually created by AGI.
Got it, thanks!
This helps, and I think it’s the part I don’t currently have a great intuition for. My best attempt at steel-manning would be something like: “It’s plausible that an AGI will generalize correctly to distributions which it is itself responsible for bringing about.” (Where “correctly” here means “in a way that’s consistent with its builders’ wishes.”) And you could plausibly argue that an AGI would tend not to induce distributions that it didn’t expect to generalize correctly on, though I’m not sure if that’s the specific mechanism you had in mind.
It’s nothing quite so detailed as that. It’s more like “maybe in the exotic circumstances we actually encounter, the objective does generalize, but also maybe not; there isn’t a strong reason to expect one over the other”. (Which is why I only say it is plausible that the AI system works fine, rather than probable.)
You might think that the default expectation is that AI systems don’t generalize. But in any world where we end up with an existential catastrophe, we know that the capabilities generalized to the exotic circumstance; it seems like whatever made the capabilities generalize could also make the objective generalize in that exotic circumstance.
I see. Okay, I definitely agree that makes sense under the “fails to generalize” risk model. Thanks Rohin!