I mean, I agree that the scenario is about adversarial action, but it’s not adversarial action by enemy humans—or even enemy AIs—it’s adversarial action by misaligned (specifically deceptive) mesa-optimizers pursuing convergent instrumental goals.
Can you say more about the distinction between enemy AIs and misaligned mesa-optimizers? I feel like I don't have a concrete grasp of what the difference would look like in, say, an AI system in charge of a company.
I could imagine “enemy action” making sense as a label if the thing you’re worried about is enemy humans deploying misaligned AI, but that’s very much not what Paul is worried about in the original post. Rather, Paul is concerned about us accidentally training AIs which are misaligned and thus pursue convergent instrumental goals like resource and power acquisition that result in existential risk.
Furthermore, they’re also not “enemy AIs” in the sense that “the AI doesn’t hate you”—it’s just misaligned and you’re in its way—and so even if you specify something like “enemy AI action” that still seems to me to conjure up a pretty inaccurate picture. I think something like “influence-seeking AIs”—which is precisely the term that Paul uses in the original post—is much more accurate.
I thought about it a bit more and changed my mind: the term is quite confusing. I'll make an edit later, maybe today.
I think I understand why you think the term is misleading, though I still think it's helpfully concrete and not inaccurate. I have a bunch of work to get back to, so I'm not planning to follow up on this more right now. Feel free to ping me via PM if you'd like me to follow up another day.