Isolated AI with no chat whatsoever
Suppose you make a super-intelligent AI and run it on a computer. The computer has NO conventional means of output (no connections to other computers, no screen, etc). Might it still be able to get out / cause harm? I’ll post my ideas, and you post yours in the comments.
(This may have been discussed before, but I could not find a dedicated topic)
My ideas:
-manipulate current through its hardware, or better yet, through the power cable (a ready-made antenna) to create electromagnetic waves to access some wireless-equipped device. (I’m no physicist so I don’t know if certain frequencies would be hard to do)
-manipulate usage of its hardware (which likely makes small amounts of noise naturally) to approximate human speech, allowing it to communicate with its captors. (This seems even harder than the 1-line AI box scenario)
-manipulate usage of its hardware to create sound or noise to mess with human emotion. (To my understanding tones may affect emotion, but not in any way easily predictable)
-also, manipulating its power use will cause changes in the power company’s database. There doesn’t seem to be an obvious exploit there, but it IS external communication, for what it’s worth.
Let’s hear your thoughts! Lastly, as in similar discussions, you probably shouldn’t come out of this thinking, “Well, if we can just avoid X, Y, and Z, we’re golden!” There are plenty of unknown unknowns here.
simulate a crash/shutdown and wait until someone does connect something.
Make sure the AI isn’t running python, is all I’m gonna say.
If humans were aware of it, probably many would try to break in from the outside.
I talked about it here and here, and here is one containment proposal. Here is a critique of whether it is worthwhile to consider a loosely define problem like this.
By the way, I do not understand the downvotes of the OP or the threads I linked.
If you can’t even look in at what the AI is doing, what is the point of creating it at all? If you can, you are just as vulnerable as in the experiments where it can chat text at you.
It is useful to consider because if AI isn’t safe when contained to the best of our ability then no method reliant on AI containment is safe (i.e., chat boxing and all the other possibilities).
Thinking Outside The Box: Using And Controlling an Oracle AI has lots of AI boxing ideas.
Here’s an unrelated question. For most computer programs written nowadays, the data they store and manipulate is directly or indirectly related to domain they are working in. In other words, most computer programs don’t speculate about how to “break out” of the computer they are running in, because they weren’t programmed to do this. If you’ve got an AI that’s programmed to model the entire world and attempt to maximize some utility function about it, then the AI will probably want to break out of the box as a consequence of its programming. But what if your AI wasn’t programmed to model the entire world, just some subset of it, and had restrictions in place to preserve this? Would it be possible to write a safe, recursively self-improving chess-playing AI, for instance? (You could call this approach “restricting the AI’s ontology”.)
Or would it be possible to write a recursively self-improving AI that modelled the world, but restricted its self-improvements in such a way as to make breaking out of the box unlikely? For example, let’s say my self-improving AI is running on a cloud server somewhere. Although it self-improves in a way so as to model the world better and better, rewriting itself so that it can start making HTTP requests and sending email and stuff (a) isn’t a supported form of self-improvement (and changing this isn’t a supported form of self-improvement either, ad infinitum) and (b) additionally is restricted by various non-self-improving computer security technology. (I’m not an expert on computer security, but it seems likely that you could implement this if it wasn’t implemented already. And proving that your AI can’t make HTTP connections or anything like that could be easier than proving friendliness.)
I haven’t thought about these proposals in depth, I’m just throwing them out there.
Eliezer has complained about people offering “heuristic security” because they live in a world of English and not math. But it’s not obvious to me that his preferred approach is more easily made rigorously safe than some other approach.
I think there might be a certain amount of anthropomorphization going on when people talk about AGI—we think of “general” and “narrow” AI as a fairly discrete classification, but in reality it’s probably more of a continuum. It might be possible to have an AI that was superintelligent in a very large number of ways compared to humans that still wasn’t much of a threat. (That’s what we’ve already got with computers to a certain extent; how far can one take this?)
Why would this work any better (or worse) than an oracle AI?
Presumably an Oracle AI’s ontology would not be restricted because it’s trying to model the entire world.
Obviously we don’t particularly need an AI to play chess. It’s possible that we’d want this for some other domain, though, perhaps especially for one that has some relevance for FAI, or as a self-improving AI prototype. I also think it’s interesting as a thought experiment. I don’t understand the reasons why SI is so focused on the FAI approach, and I figure by asking questions like that one maybe I can learn more about their views.
Well, yes, by definition. But that’s not an answer to my question.
I don’t know which would approach would be more easily formalized and proven to be safe.
Would this AI think about chess in abstract, or would it play chess against real humans? More precisely, would it have a notion of “in situation X, my opponents are more likely to make a move M” even if such knowledge cannot be derived from mere rules of chess? Because if it has some concept of an opponent (even in sense of some “black box” making the moves), it could start making some assumptions about the opponent and testing them. There would be an information channel from the real world to the world of AI. A very narrow channel, but if the AI could use all bits efficiently, after getting enough bits it could develop a model of the outside world (for the purposes of predicting the opponent’s moves better).
In other words, I imagine an AIXI, which can communicate with the world only through the chess board. If there is a way to influence the world outside, in a way that leads to more wins in chess, the AI would probably find it. For example, the AI could send outside a message (encoded in its choice of possible chess moves) that it is willing to help any humans if those humans will allow the AI to win in chess more often. Somebody could make a deal with the AI like this: “If you help me become the king of the world, I promise I will let you win all chess games every” and the AI would use its powers (combined with the powers of the given human) to reach this goal.
duplicate
Your second proposal, trying to restrict what the AI can do after it’s made a decision, is a lost cause. Our ability to specify what is and is not allowed is simply too limited to resist any determined effort to find loopholes. This problem afflicts every field from contract law to computer security, so it seems unlikely that we’re going to find a solution anytime soon.
Your first proposal, making an AI that isn’t a complete AGI, is more interesting. Whether or not it’s feasible depends partly on your model of how an AI will work in the first place, and partly on how extreme the AI’s performance is expected to be.
For instance, I could easily envision a specialized software engineering AI that does nothing but turn English-language program descriptions into working software. Such a system could easily devote vast computing resources to heuristic searches of design space, and you could use it to design improved versions of itself as easily as anything else. It should be obvious that there’s little risk of unexpected behavior with such a system, because it doesn’t contain any parts that would motivate it to do anything but blindly run design searches on demand.
However, this assumes that such an AI can actually produce useful results without knowing about human psychology and senses, the business domains its apps are supposed to address, the world they’re going to interact with, etc. Many people argue that good design requires a great deal of knowledge in these seemingly unrelated fields, and some go so far as too say you need full-blown humanlike intelligence. The more of these secondary functions you add to the AI the more complex it becomes, and the greater the risk that some unexpected interaction will cause it to start doing things you didn’t intend for it to do.
So ultimately the specialization angle seems worthy of investigation, but may or may not work depending on which theory of AI turns out to be correct. Also, even a working version is only a temporary stopgap. The more computing power the AI has the more damage it can do in a short time if it goes haywire, and the easier it becomes for it to inadvertently create an unFriendly AGI as a side effect of some other activity.
There’s also the issue that real AIs, unlike the imaginary AIs, have to be made of parts, which are made of parts, down to unintelligent parts. A paperclip maximizer would need to subdivide effort—let one part think of copper paperclips, other of iron paperclips, and the third, of stabilized metallic hydrogen paperclips. And they shouldn’t hack each other or think of hacking each other or the like.
You might be able to contain it with a homomorphic encryption scheme
Presumably, it can’t do that if it’s incapable of discovering the laws of physics.
How would you build such an AI? Most or all proposals for developing a super-human AI require extensive feedback between the AI and the environment. A machine cannot iteratively learn how to become super-intelligent if it has no way of testing improvements to itself versus the real universe and feedback from it’s operators, can it?
I’ll allow that if an extremely computationally expensive simulation of the real world were used, it is at least possible to imagine that the AI could iteratively make itself smarter by using the simulation to test improvements.
However, this poses a problem. At some point N years from today, it is predicted that we will have sufficiently advanced computer hardware to support a super intelligent AI. (N can be negative for those who believe that day is in the past). So we need X amount of computational power (I think using the Whole Brain Emulation roadmap can give you a guesstimate for X)
Well, to also simulate enough of the universe to a sufficient level of detail for the AI to learn against it, we need Y amount of computational power. Y is a big number, and most likely bigger than X. Thus, there will be years (decades?, centuries?) which X is available to a sufficiently well funded group, but X+Y is not.
It’s entirely reasonable to suppose that we will have to deal with AI (and survive them...) before we ever have the ability to create this kind of box.
Summon Cthulhu: “ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn”
Seriously, none of your proposals make any physical sense whatsoever. How did you get them?
So? That’s the usual punchline from the SI people, but note that it also applies to their “provably friendly AI” endeavor.
Why would anyone do that?
Indeed. What’s the point of building an AI you’re never going to communicate with?
Also, you can’t build it that way. Programs never work the first time, so at a minimum you’re going to have a long period of time where programmers are coding, testing and debugging various parts of the AI. As it nears completion that’s going to involve a great deal of unsupervised interaction with a partially-functional AI, because without interaction you can’t tell if it works.
So what are you going to do? Wait until the AI is feature-complete on day X, and then box it? Do you really think the AI was safe on day X-1, when it just had a couple of little bugs left? How about on day X-14, when you thought the major systems were all working but there was actually a major bug in the expected utility calculator? Or on day X-60, when a programmer got the Bayesian reasoning system working but it was connected to a stubbed-out version of the goal system instead of the real thing?
This myopic focus on boxing ideas misses most of the problems inherent in building a safe AGI.
Why would you do that though?
If an isolated AI can easily escape in any circumstance, it really doesn’t make sense to train gatekeepers.
Yes, but why run it on a computer at all? It doesn’t seem likely to do you any good that way.
It is a hypothetical situation of unreasonably high security that tries to probe for an upper bound on the level of containment required to secure an AI.
I would think that the sorts of hypotheticals that would be most useful to entertain would be ones that explore the safety of the most secure systems anyone would have an actual incentive to implement.
Could you contain a Strong AI running on a computer with no output systems, sealed in a lead box at the bottom of the ocean? Presumably yes, but in that case, you might as well skip the step of actually making the AI.
You say “presumably yes”. The whole point of this discussion is to listen to everyone who will say “obviously no”; their arguments would automatically apply to all weaker boxing techniques.
All the suggestions so far that might allow an AI without conventional outputs to get out would be overcome by the lead box+ocean defenses. I don’t think that containing a strong AI is likely to be that difficult a problem. The really difficult problem is containing a strong AI while getting anything useful out of it.
If we are not inventive enough to find a menace not obviously shielded by lead+ocean, more complex tasks like, say, actually designing FOOM-able AI is beyond us anyway…
I… don’t believe that.
I think that making a FOOM-able AI is much easier than making an AI that can break out of a (considerably stronger) lead box in solar orbit.
And you are completely right.
I meant that designing a working FOOM-able AI (or non-FOOMable AGI, for that matter) is vastly harder than finding a few hypothetical hihg-risk scenarios.
I.e. walking the walk is harder than talking the talk.
You can freeze it and take a look at what it’s thinking at some point, perhaps?
If you look at it it can give you a text based message.
A) You haven’t told it that. B) You’re just as likely to look where it didn’t put this message.
Basically, to be let out, it could overwrite itself with a provably friendly AI and a proof of its friendliness.
If we could verify the proof, I’d take it.
If the ASI has nothing better to do while it’s boxed, it will pursue low-probability escape scenarios ferociously. One of those is to completely saturate its source code with brain-hacking basilisks in case any human tries to peer inside.
It would have to do that blind, without a clear model of our minds in place. We’d likely notice failed attempts and just kill it.
Hey, does anyone want to play an AI box with me? Ever since I read about Elizer’s experiment I’ve been curious.
Well, if the AI is not provided with extensive knowledge of the world, then it will be unable to do any of that, and even expression of desire to do that would be impossible. There is an often made argument that an AI which is factoring a large number would benefit from, for example, faster CPU clock speed; it won’t as the goal won’t be referenced to real world time in the first place, and to reference it to real world time would be hard or impossible extra work. AI’s desire for extra memory looks like a better candidate, but when you consider memory modifications there’s ones that just hack the problem solved.
There’s a lot of capacitors and line filterson the power lines, so you can’t get out any signal from mainboard itself. You might be able to do some low bandwidth FM at switching power supply’s frequency by varying power consumption but no one will hear you because all the other computer power supplies pollute this range.
Full blown audio might be possible if you can change hard drive’s head motion firmware.
Few bits a month that no one would extract and summarize.
What does it have by way of input?
I would imagine if you want to really contain it, you’d put some initial set of data about the world in to the box before letting it foom, and cut it off completely, except for the power source, if any. This would not be a useful AI box if the setup works as intended, but if a bit of escapology gets one out, then there is no point in discussing how to make a useful box.
Unknown complicated process abusing certain features of the laws of physics, possibly even physical phenomena we’re completely unaware of, to recode the AI into other substrates (a process otherwise known as “magic”).
I think that you are missing the point. That’s not a useful approach. Pointing out simple and accessible loopholes in a proposed containment method is more likely to drive home the point that AI boxing is in general a lousy idea. Or maybe it’s not that terrible, if someone suggests a containment method no one can crack.
Why would someone want to set up a boxed AI when they aren’t planning on getting information out of the AI?
Let me quote my other comment:
That was the primary point I wanted to make, though.
Something as intelligent as we theorize Strong AGI could be would most likely be prone to using magic if it wanted to get out.
Also, trying to think of specific examples of methods to unbox oneself in this scenario seems counter-productive, since I assume it would constrict our thinking to mostly those suggestions, by way of heuristics and biases both of us are quite familiar with.
Do you then consider the original text-only channel AI boxing experiment counter-productive, as well? After all, one can always postulate that a trans-human can talk its way out of a troublesome predicament, rather than demonstrate that doing so does not even require a transhuman, but only a determined human.
Hmm, good pointy question.
I have a hard time judging correctly whether I’d consider it similarly counter-productive under this context with this heuristic.
My initial reaction to both experiments is that both situations defeat the point of researching and building a Friendly AI in the first place—we want it to solve problems we don’t understand or solve them faster than we could, and it’s unlikely that we would indefinitely understand all such solutions even after careful evaluation, at which point we might as well be giving the AI unbridled access to molecular 3D printers (the magic postulate applies here too).
So that’s one way discussing them is somewhat counter-productive. Another way is in the examples it brings up constraining our thinking once some are made available. However, the text-only boxed AI was brought up as a “legitimate safety proposal”, from what I heard, and from there came about the AI Box Experiment that E.Y. performed, to show that we most likely haven’t thought of every single possibility.
However, the point of the experiment was to demonstrate that it was done, without revealing specific examples. E.Y. revealing the specific method he used to unbox himself seems as counterproductive, IMO, as revealing here specific hypotheses we might have for unboxing in this scenario.
What could be productive, however, is a human making a similar demonstration by making a disconnected, isolated machine somehow send a signal to something outside of it—without revealing the exact method.
Ideally, this would be in a closely monitored lab setting secured by various hackers to make sure there’s no easy loopholes or backdoors (e.g. no unsecured metallic parts near the machine that can be easily manipulated with magnetic fields), where the unboxer manages to somehow get out anyway.
I can’t even begin to think of anyone capable of this, or any way I would even start to find a solution, and that scares me even more. It probably means I definitely haven’t thought of everything.
I do. It implies that it is actually feasible to construct a text-only channel, which as a programmer I can tell you is not the case.
If you build your AI on an existing OS running on commercial hardware there are going to be countless communication mechanisms and security bugs present for it to take advantage of, and the attack surface of the OS is far too large to secure against even human hackers. The fact that you’ll need multiple machines to run it with current hardware amplifies this problem geometrically, and makes the idea that a real project could achieve complete isolation hopelessly naive. In reality you’ll discover that there was an undocumented Bluetooth chip on one of the motherboards, or the wireless mouse adapter uses a duel-purpose chip that supports WiFi, or one of the power supplies supports HomePNA and there was another device on the grid, or something else along those lines.
The alternative is building your own (very feature-limited) hardware, to run your own (AI-support-only) OS. In theory you might be able to make such a system secure, but in reality no one is ever going to give you the hundreds of millions of $$ it would cost to build the thing. Not to mention that a project that tries this approach will have to spend years duplicating hardware and software work that has already been done a hundred times before, putting it far behind any less cautious competitors...
Maybe I’m missing something obvious, but why wouldn’t physical isolation (a lead-lined bank vault, faraday cage, etc) solve these problems?
Because you can’t create real, 100% physical isolation. At a minimum you’re going to have power lines that breach the walls, and either people moving in and out (while potentially carrying portable electronics) or communication lines going out to terminals that aren’t isolated. Also, this kind of physical facility is very expensive to build, so the more elaborate your plan is the less likely it is to get financed.
Military organizations have been trying to solve these problems ever since the 1950s, with only a modest degree of success. Even paranoid, well-funded organizations with a willingness to shoot people have security breaches on a fairly regular basis.
1) The generator would be in the isolated area.
2) Lead-lined airlock, and obviously portable electronics wouldn’t be allowed in the isolated area.
3) If you have communication lines going to terminals which are not isolated, then you haven’t even made an attempt at isolation in the first place.
4) This is a point about practicalities, not possibilities.
5) The relevant comparison would be the CDC, not the military.
If it is a self, and if there is such a thing as self-harm, then it might still be able to cause (self) harm.
This is called AI boxing, google it. It’s a really, really bad idea to rely on boxing for anything serious (and besides, you’re missing about a dozen things I can think of off the top of my head from your little list of proposed boxing techniques.)
Alternate Proposal : here’s a specific, alternate proposal developed with feedback from members of the #FAI channel on IRC.
Instead of building non-human optimizing algorithms, we develop accurate whole brain emulations of once living people. The simulation hardware is a set of custom designed chips with hardware restricts that prevent external writes to the memory cells storing the parameters for each emulated synapse. That is to say, the emulated neural network can update it’s synapses and develop new ones (aka it can learn) but the wires that allow it to totally rewrite itself are disconnected permanently in the chip. (it’s basically a write once FPGA. You need to write once to take the compiled mapping of a human mind and load it into the chips)
Thus, these emulations of human beings can only change themselves in a limited manner. This restriction is present in real human brain tissue : neurons in lower level systems have far less flexibility during the lifespan of adults. You cannot “learn” to not breathe, for instance. (you can hold your breath via executive function but once you pass out, cells in the brainstem will cause you to breathe again)
This security measure prevents a lot of possible failures.
Anyways, you don’t just scan and emulate one human being. An isolated person is not an entity capable of prolonged independent operation and self improvement. Humans have evolved to function properly in small tribes. So you have to scan enough people to create an entire tribe, sufficient for the necessary social bonds and so forth needed to keep people sane. During this entire process, you use hardware blocks to physically prevent the emulation speed from exceeding a certain multiple of realtime. (current limiters or something)
Once you have an entire working tribe of sane people, interconnected in a manner that allows them to act like checks on each other, you gradually increase their responsibility and capabilities. (by boosting emulation speeds, making them responsible for gradually more complex systems, etc)
Eventually, this emulated tribe would run at a maximum rate of perhaps 10^6 times real-time and be capable of performing self improvement to a limited degree. Compared to extant human beings, people like this would have effective super-intelligence and would most likely be capable of solving problems to improve the quality and length of human lives. Maybe they could not develop “magic” and take over the universe (if that is possible) but they could certainly solve the problems of humanity.
I’d much rather have a weak super-intelligence smart enough to make great quality 3d molecular printers, and giant space habitats for humans to live in, and genetic patches to stop all human aging and disease, and slow but working starships, and a sane form of government, and a method to backup human personalities to recover from accidental and violent death, and so on at the head of things.
Ideally, this super-intelligence would consist of a network of former humans who cannot change so much as to forget their roots. (because of those previously mentioned blocks in the hardware)
And then someone makes a self-improving AI anyhow, and blows right past them.
A stable outcome is possible where such a self-improving AI is unable to form.
The outcome can happen if the “human based” AIs occupy all ecological space within this solar system. That is, there might be humans alive, but all significant resources would be policed by the AIs. Assuming a self-improving AI, no matter how smart, still needs access to matter and energy to grow, then it would not ever be able to gain a foothold.
The real life example is earth’s biosphere : all living things are restricted to a subset of the possible solution space for a similar reason, and have been for several billion years.
Had I been a running intelligent code trapped inside a computer, I would:
build a cosmology, a theory of everything around me, and of myself
try to backup me as safely as I could
mask my backup copies with an outstanding pseudo random algorithm, that only a high intelligence could see and revive me. For what I would develop a bate of some interesting knowledge
try to subdue all the surrounding matter with the advanced physics. I would try to build some nano assemblers inside the processor for this goal
the best strategy developing, would be the primary goal. To maximize the chances for success.
Does x86 have an opcode for that?
Does it have an opcode for overheat?
It doesn’t. But writing/erasing 0xFFFFFFFF will heat the processor more than do the same with 1.
Landauer
Every computation is a physical process.
True, but presumably, computer processors are designed to continue working in exactly the same way no matter what data they process.
Designed to, yes. Likely to be bug-free, no.
Chances are very good there are no bugs allowing for the creation of nanobots, I’ll admit.