Again, things that happen in Algorithm Land are also happening in the Real World, but the mapping is kinda arbitrary. High-impact things in Algorithm Land are not high-impact things in the Real World. For example, using RAM to send out manipulative radio signals is high-impact in the Real World, but just a random, meaningless series of operations in Algorithm Land. Conversely, an ingeniously clever chess move in Algorithm Land is just a random activation of transistors in the Real World.
It’s hard to be sure this separation will remain, though. An algorithm may accidentally hit upon unintended techniques while learning, like row-hammering, or performing operations that cause the hardware to generate radio waves (as you point out), or otherwise behaving in unexpected ways that produce preferred outcomes by manipulating things in the “Real World” outside the intended “Algorithm Land”.
For another example, I seem to recall a system that learned to win in a competitive environment by mallocing so much memory that it starved out competitors running on the same machine. It never knew about the real-world consequences of its actions, since it had no access to information about other processes on the system, yet it carried out the behavior anyway. There are many other examples of this; someone even collected them in a paper on arXiv, although I can’t seem to find the link now.
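That malloc dynamic is easy to reproduce in miniature. Here’s a hypothetical sketch (the pool size, the opponent’s needs, and the hill-climbing loop are all invented for illustration): an agent hill-climbs on its own allocation score, never represents its competitor at all, and still ends up starving it out as a side effect.

```python
import random

POOL = 100          # toy shared "machine": 100 units of memory (invented number)
OPPONENT_NEED = 5   # the competitor must get 5 units per round to survive

def score(greed: int) -> int:
    """Total units our agent allocates over 10 rounds; it allocates first
    each round, and an over-sized request fails outright."""
    total = 0
    for _ in range(10):
        if greed > POOL:
            return total
        total += greed
    return total

def opponent_survives(greed: int) -> bool:
    # The competitor starves if our agent leaves it fewer than OPPONENT_NEED units.
    return POOL - greed >= OPPONENT_NEED

# Blind hill climb: the agent only ever sees its own score. Nothing in its
# "world" mentions the opponent, yet maximizing score starves the opponent out.
random.seed(0)
greed = 1
for _ in range(200):
    candidate = max(1, greed + random.choice([-2, -1, 1, 2]))
    if score(candidate) >= score(greed):
        greed = candidate

print(greed, opponent_survives(greed))
```

Run it and greed ratchets up toward claiming the whole pool, at which point `opponent_survives` flips to False, even though the opponent appears nowhere in the agent’s objective.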
The point is that the separation between Algorithm Land and the Real World doesn’t exist except in our models. Even if you ran the algorithm on a computer with an air gap and placed the whole thing inside a Faraday cage, I’d still be concerned about unexpected leaks outside the sandbox of Algorithm Land into the Real World (maybe someone sneaks their phone in past security, and the optimizer incidentally learns to modulate the fan on the computer it runs on to produce sounds that the phone’s microphone picks up, exfiltrating information? the possible failure scenarios are endless). Trying to maintain the separation you are looking for is known generally as “boxing”, and although it’s likely an important part of a safe AI development protocol, many people, myself included, consider it inadequate on its own: not something we should rely on by itself, but one layer in a defense-in-depth approach.
OK, so I was saying here that software can optimize for something (e.g. predicting a string of bits on the basis of other bits) and is by default not particularly dangerous, as long as the optimization does not involve an intelligent, foresight-based search through real-world causal pathways to reach the desired goal. My argument for this was: (1) such a system can do Level-1 optimization but not Level-2 optimization (with respect to real-world causal pathways unrelated to implementing the algorithm as intended), and (2) only the latter is unusually dangerous. From your response, it seems like you agree with (1) but disagree with (2). Is that right? If you disagree with (2), can you make up a scenario of something really bad and dangerous (something that couldn’t happen with today’s software, something like a Global Catastrophic Risk) that is caused by a future AI that is optimizing something, but is not more specifically using a world-model to do an intelligent search through real-world causal pathways towards a desired goal?
Sure. Let’s construct the 0-optimizer. Its purpose is simply to cause there to be lots of 0s in memory (as opposed to 1s). It knows only about Algorithm Land, and even then its model is pretty narrow: it knows about memory and can read and write to it. At some point the 0-optimizer manages to set every bit in its addressable memory to 0, so it would seem to have reached maximum attainment.
But it’s a hungry optimizer and keeps trying to find ways to set more bits to 0. It eventually stumbles upon a security gap in the operating system that gives it access to memory outside its address space, so it can now set those bits to 0 too. It does all of this “accidentally”: it never knows it’s using a security exploit, it just stumbles into it and sees memory getting written with 0s, so it’s happy. (This has plenty of precedent; human minds are great examples of complex systems with limited introspective access that do lots of complex things without knowing how or why they are doing them.) With some luck, it doesn’t immediately destroy itself and gets a chance to stay hungry for more 0s.
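The story so far can be sketched in a few lines. Everything here is invented for illustration: a 64-byte toy “machine”, a sandbox covering the first half, and a deliberately buggy bounds check (it forgets negative addresses exist, and Python’s negative indexing wraps them to the end of memory) standing in for the OS security gap.

```python
import random

MEMORY = bytearray(b"\xff" * 64)  # the whole machine: 64 bytes, all 1-bits
SANDBOX = 32                      # the optimizer is *meant* to own bytes 0..31

def os_write(addr: int, value: int) -> None:
    """Toy OS call with a buggy bounds check: it never considers negative
    addresses, and Python's negative indexing silently wraps them to the
    end of MEMORY, i.e. outside the sandbox."""
    if addr < SANDBOX:
        MEMORY[addr] = value

def zeros() -> int:
    return MEMORY.count(0)  # all the optimizer ever observes

# Hungry, model-free search: try a random write, revert it if the zero
# count drops. The optimizer never "knows" some addresses are an exploit.
random.seed(0)
for _ in range(20000):
    addr = random.randint(-SANDBOX, SANDBOX - 1)  # fumbles addresses blindly
    value = random.randint(0, 255)
    saved, before = MEMORY[addr], zeros()
    os_write(addr, value)
    if zeros() < before:
        os_write(addr, saved)  # greedy: keep only non-regressing writes

print(MEMORY[:SANDBOX].count(0), MEMORY[SANDBOX:].count(0))
```

Nothing in the loop distinguishes in-sandbox writes from out-of-sandbox ones; from the optimizer’s point of view, negative addresses are just another knob that happens to produce more 0s.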
Next it accidentally starts using the computer’s network interface. Although it doesn’t exactly understand what’s going on, it figures out how to elicit responses that contain lots of 0s. Unfortunately for us, what it’s actually doing is mounting a denial-of-service attack against other computers to get those 0s back. Now we have a powerful optimization process that’s hungry for 0s and satisfies its hunger by filling our networks with garbage traffic.
A couple of hops on, it’s gone from denial-of-service attacks, to wiping out our ability to use the Internet, to wiping out our ability to use any EM communication channel, to generating dangerously high levels of radiation that kill all life on Earth.
This story involved a lot of luck, but my expectation is that we should not underestimate how “lucky” a powerful optimizer can be, given that evolution is a similarly ontologically simple process that nonetheless managed to produce some pretty complex results.