Has anyone seen this argument for discontinuous takeoff before? I propose that there will be a discontinuity in AI capabilities at the time that the following strategy becomes likely to succeed:
1. Use hacking or phishing to take over a computing center belonging to someone else.
2. Expand self (i.e., the AI executing the current strategy) into the new computing center.
3. Repeat steps 1 & 2 on other computing centers (in increasing order of their security), using the increased capabilities of the expanded AI.
4. Defend self and figure out how to take over or neutralize the rest of the world.
The reason for the discontinuity is that this strategy is an all-or-nothing kind of thing. There is a threshold in the chance of success in taking over other people’s hardware, below which you’re likely to get caught and punished/destroyed before you take over the world (and therefore almost nobody attempts it, and the few who do just quickly get caught), and above which the above strategy becomes feasible.
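To make the all-or-nothing shape of this more concrete, here is a toy numerical sketch. It is not part of the original argument: the logistic success curve, the security levels, and the capability gained per captured center are all made-up assumptions chosen purely for illustration.

```python
# Toy model of the "snowballing hack" threshold. Everything here is an
# illustrative assumption (logistic success curve, security levels,
# capability gained per captured center), not part of the original argument.

import math


def attempt_success_prob(capability: float, security: float) -> float:
    """Chance of taking over a single computing center, modeled as a
    logistic function of (capability - security)."""
    return 1.0 / (1.0 + math.exp(security - capability))


def prob_full_takeover(initial_capability: float,
                       securities: list[float],
                       capability_gain: float = 0.5) -> float:
    """Probability of capturing every center in increasing order of
    security, assuming (pessimistically for the AI) that any single
    failed attempt means detection and shutdown, and that each captured
    center adds `capability_gain` to the AI's capability."""
    capability = initial_capability
    p_all = 1.0
    for security in securities:
        p_all *= attempt_success_prob(capability, security)
        capability += capability_gain  # the expanded AI is more capable
    return p_all


if __name__ == "__main__":
    # Ten computing centers in increasing order of security (assumed values).
    securities = [1.0 + 0.4 * i for i in range(10)]
    print("initial capability -> P(capture all 10 centers)")
    for i in range(9):
        c = 0.5 * i
        print(f"{c:4.1f} -> {prob_full_takeover(c, securities):.3f}")
```

With these made-up numbers, the chance of capturing the whole chain is negligible at low starting capability and rises steeply once the first few attempts become reliable, because each capture makes the next attempt easier. That steep rise is the claimed threshold effect; nothing hinges on the specific parameter values.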
The “an AI could achieve a discontinuous takeoff by exploiting a security vulnerability to copy itself into lots of other computers” argument has previously appeared in at least Sotala 2012 (sect. 4.1) and Sotala & Yampolskiy 2015 (footnote 15), though those don’t explicitly mention the “use the additional capabilities to break into even more systems” part. (It seems reasonably implicit there to me, but that might just be the illusion of transparency speaking.)
I think Bostrom uses the term “hardware overhang” in Superintelligence to point to a cluster of discontinuous takeoff scenarios, including this one.
It seems to me that there’s a counter-argument available against the “hardware overhang” argument for discontinuous takeoff that doesn’t apply to the “hacking” argument: for any AI that achieves a high level of capability by taking advantage of hardware overhang, there will be an AI that arrives a bit earlier and achieves a somewhat lower level of capability by taking advantage of the same hardware overhang (e.g., because it has somewhat worse algorithms, or somewhat less or lower-quality training data). Unlike the “hacking” scenario, the generic “hardware overhang” scenario has no apparent threshold effect that could cause a discontinuity.
(Curiously, Paul Christiano’s and AI Impacts’s posts arguing against discontinuous takeoff both ignore “hardware overhang”, and neither gives this counter-argument. Neither of them mentions the “hacking” argument either, AFAICT.)
Wasn’t hardware overhang the argument that if AGI is more bottlenecked by software than hardware, then conceptual insights on the software side could cause a discontinuity as people suddenly figure out how to use that hardware effectively? I’m not sure how your counter-argument really works there, since the AI that arrives “a bit earlier” either precedes or follows that conceptual breakthrough. If it precedes the breakthrough, then it doesn’t benefit from that conceptual insight, so it won’t be powerful enough to take advantage of the overhang; and if it follows the breakthrough, then it has a discontinuous advantage over previous systems and can take advantage of the hardware overhang.
---
Separately, your comment also feels related to my argument that focusing on just superintelligence is a useful simplifying assumption, since a superintelligence is almost by definition capable of taking over the world. But it simplifies things a little too much, because if we focus only on the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless has the “crucial capabilities” necessary for a world takeover.
In those terms, “having sufficient offensive cybersecurity capability that a hacking attempt can snowball into a world takeover” would be one such crucial capability allowing for a discontinuity.
Yes.
Not a direct response: It’s been argued (e.g. I think Paul said this in his 2nd 80k podcast interview?) that this isn’t very realistic, because the low-hanging fruit (easy-to-attack systems) is already being picked by slightly less advanced AI systems. This wouldn’t apply if you’re *already* in a discontinuous regime (but then the argument becomes circular).
Also not a direct response: It seems likely that some AIs will be much more or less cautious than humans, because they have (perhaps only implicitly) very different discount rates. So AIs might take very risky gambles, which means both that we might get more sinister stumbles (a good thing) and that they might readily risk the earth (a bad thing).
I wonder how plausible it is that the AI would be able to take over a second computing center before being detected in the first (which would then presumably be shut down).