Here’s an unrelated question. For most computer programs written nowadays, the data they store and manipulate is directly or indirectly related to domain they are working in. In other words, most computer programs don’t speculate about how to “break out” of the computer they are running in, because they weren’t programmed to do this. If you’ve got an AI that’s programmed to model the entire world and attempt to maximize some utility function about it, then the AI will probably want to break out of the box as a consequence of its programming. But what if your AI wasn’t programmed to model the entire world, just some subset of it, and had restrictions in place to preserve this? Would it be possible to write a safe, recursively self-improving chess-playing AI, for instance? (You could call this approach “restricting the AI’s ontology”.)
Or would it be possible to write a recursively self-improving AI that modelled the world, but restricted its self-improvements in such a way as to make breaking out of the box unlikely? For example, let’s say my self-improving AI is running on a cloud server somewhere. Although it self-improves in a way so as to model the world better and better, rewriting itself so that it can start making HTTP requests and sending email and stuff (a) isn’t a supported form of self-improvement (and changing this isn’t a supported form of self-improvement either, ad infinitum) and (b) additionally is restricted by various non-self-improving computer security technology. (I’m not an expert on computer security, but it seems likely that you could implement this if it wasn’t implemented already. And proving that your AI can’t make HTTP connections or anything like that could be easier than proving friendliness.)
I haven’t thought about these proposals in depth, I’m just throwing them out there.
Eliezer has complained about people offering “heuristic security” because they live in a world of English and not math. But it’s not obvious to me that his preferred approach is more easily made rigorously safe than some other approach.
I think there might be a certain amount of anthropomorphization going on when people talk about AGI—we think of “general” and “narrow” AI as a fairly discrete classification, but in reality it’s probably more of a continuum. It might be possible to have an AI that was superintelligent in a very large number of ways compared to humans that still wasn’t much of a threat. (That’s what we’ve already got with computers to a certain extent; how far can one take this?)
But what if your AI wasn’t programmed to model the entire world, just some subset of it, and had restrictions in place to preserve this? Would it be possible to write a safe, recursively self-improving chess-playing AI, for instance? (You could call this approach “restricting the AI’s ontology”.)
Why would this work any better (or worse) than an oracle AI?
Presumably an Oracle AI’s ontology would not be restricted because it’s trying to model the entire world.
Obviously we don’t particularly need an AI to play chess. It’s possible that we’d want this for some other domain, though, perhaps especially for one that has some relevance for FAI, or as a self-improving AI prototype. I also think it’s interesting as a thought experiment. I don’t understand the reasons why SI is so focused on the FAI approach, and I figure by asking questions like that one maybe I can learn more about their views.
Would it be possible to write a safe, recursively self-improving chess-playing AI, for instance?
Would this AI think about chess in abstract, or would it play chess against real humans? More precisely, would it have a notion of “in situation X, my opponents are more likely to make a move M” even if such knowledge cannot be derived from mere rules of chess? Because if it has some concept of an opponent (even in sense of some “black box” making the moves), it could start making some assumptions about the opponent and testing them. There would be an information channel from the real world to the world of AI. A very narrow channel, but if the AI could use all bits efficiently, after getting enough bits it could develop a model of the outside world (for the purposes of predicting the opponent’s moves better).
In other words, I imagine an AIXI, which can communicate with the world only through the chess board. If there is a way to influence the world outside, in a way that leads to more wins in chess, the AI would probably find it. For example, the AI could send outside a message (encoded in its choice of possible chess moves) that it is willing to help any humans if those humans will allow the AI to win in chess more often. Somebody could make a deal with the AI like this: “If you help me become the king of the world, I promise I will let you win all chess games every” and the AI would use its powers (combined with the powers of the given human) to reach this goal.
Your second proposal, trying to restrict what the AI can do after it’s made a decision, is a lost cause. Our ability to specify what is and is not allowed is simply too limited to resist any determined effort to find loopholes. This problem afflicts every field from contract law to computer security, so it seems unlikely that we’re going to find a solution anytime soon.
Your first proposal, making an AI that isn’t a complete AGI, is more interesting. Whether or not it’s feasible depends partly on your model of how an AI will work in the first place, and partly on how extreme the AI’s performance is expected to be.
For instance, I could easily envision a specialized software engineering AI that does nothing but turn English-language program descriptions into working software. Such a system could easily devote vast computing resources to heuristic searches of design space, and you could use it to design improved versions of itself as easily as anything else. It should be obvious that there’s little risk of unexpected behavior with such a system, because it doesn’t contain any parts that would motivate it to do anything but blindly run design searches on demand.
However, this assumes that such an AI can actually produce useful results without knowing about human psychology and senses, the business domains its apps are supposed to address, the world they’re going to interact with, etc. Many people argue that good design requires a great deal of knowledge in these seemingly unrelated fields, and some go so far as too say you need full-blown humanlike intelligence. The more of these secondary functions you add to the AI the more complex it becomes, and the greater the risk that some unexpected interaction will cause it to start doing things you didn’t intend for it to do.
So ultimately the specialization angle seems worthy of investigation, but may or may not work depending on which theory of AI turns out to be correct. Also, even a working version is only a temporary stopgap. The more computing power the AI has the more damage it can do in a short time if it goes haywire, and the easier it becomes for it to inadvertently create an unFriendly AGI as a side effect of some other activity.
There’s also the issue that real AIs, unlike the imaginary AIs, have to be made of parts, which are made of parts, down to unintelligent parts. A paperclip maximizer would need to subdivide effort—let one part think of copper paperclips, other of iron paperclips, and the third, of stabilized metallic hydrogen paperclips. And they shouldn’t hack each other or think of hacking each other or the like.
Thinking Outside The Box: Using And Controlling an Oracle AI has lots of AI boxing ideas.
Here’s an unrelated question. For most computer programs written nowadays, the data they store and manipulate is directly or indirectly related to domain they are working in. In other words, most computer programs don’t speculate about how to “break out” of the computer they are running in, because they weren’t programmed to do this. If you’ve got an AI that’s programmed to model the entire world and attempt to maximize some utility function about it, then the AI will probably want to break out of the box as a consequence of its programming. But what if your AI wasn’t programmed to model the entire world, just some subset of it, and had restrictions in place to preserve this? Would it be possible to write a safe, recursively self-improving chess-playing AI, for instance? (You could call this approach “restricting the AI’s ontology”.)
Or would it be possible to write a recursively self-improving AI that modelled the world, but restricted its self-improvements in such a way as to make breaking out of the box unlikely? For example, let’s say my self-improving AI is running on a cloud server somewhere. Although it self-improves in a way so as to model the world better and better, rewriting itself so that it can start making HTTP requests and sending email and stuff (a) isn’t a supported form of self-improvement (and changing this isn’t a supported form of self-improvement either, ad infinitum) and (b) additionally is restricted by various non-self-improving computer security technology. (I’m not an expert on computer security, but it seems likely that you could implement this if it wasn’t implemented already. And proving that your AI can’t make HTTP connections or anything like that could be easier than proving friendliness.)
I haven’t thought about these proposals in depth, I’m just throwing them out there.
Eliezer has complained about people offering “heuristic security” because they live in a world of English and not math. But it’s not obvious to me that his preferred approach is more easily made rigorously safe than some other approach.
I think there might be a certain amount of anthropomorphization going on when people talk about AGI—we think of “general” and “narrow” AI as a fairly discrete classification, but in reality it’s probably more of a continuum. It might be possible to have an AI that was superintelligent in a very large number of ways compared to humans that still wasn’t much of a threat. (That’s what we’ve already got with computers to a certain extent; how far can one take this?)
Why would this work any better (or worse) than an oracle AI?
Presumably an Oracle AI’s ontology would not be restricted because it’s trying to model the entire world.
Obviously we don’t particularly need an AI to play chess. It’s possible that we’d want this for some other domain, though, perhaps especially for one that has some relevance for FAI, or as a self-improving AI prototype. I also think it’s interesting as a thought experiment. I don’t understand the reasons why SI is so focused on the FAI approach, and I figure by asking questions like that one maybe I can learn more about their views.
Well, yes, by definition. But that’s not an answer to my question.
I don’t know which would approach would be more easily formalized and proven to be safe.
Would this AI think about chess in abstract, or would it play chess against real humans? More precisely, would it have a notion of “in situation X, my opponents are more likely to make a move M” even if such knowledge cannot be derived from mere rules of chess? Because if it has some concept of an opponent (even in sense of some “black box” making the moves), it could start making some assumptions about the opponent and testing them. There would be an information channel from the real world to the world of AI. A very narrow channel, but if the AI could use all bits efficiently, after getting enough bits it could develop a model of the outside world (for the purposes of predicting the opponent’s moves better).
In other words, I imagine an AIXI, which can communicate with the world only through the chess board. If there is a way to influence the world outside, in a way that leads to more wins in chess, the AI would probably find it. For example, the AI could send outside a message (encoded in its choice of possible chess moves) that it is willing to help any humans if those humans will allow the AI to win in chess more often. Somebody could make a deal with the AI like this: “If you help me become the king of the world, I promise I will let you win all chess games every” and the AI would use its powers (combined with the powers of the given human) to reach this goal.
duplicate
Your second proposal, trying to restrict what the AI can do after it’s made a decision, is a lost cause. Our ability to specify what is and is not allowed is simply too limited to resist any determined effort to find loopholes. This problem afflicts every field from contract law to computer security, so it seems unlikely that we’re going to find a solution anytime soon.
Your first proposal, making an AI that isn’t a complete AGI, is more interesting. Whether or not it’s feasible depends partly on your model of how an AI will work in the first place, and partly on how extreme the AI’s performance is expected to be.
For instance, I could easily envision a specialized software engineering AI that does nothing but turn English-language program descriptions into working software. Such a system could easily devote vast computing resources to heuristic searches of design space, and you could use it to design improved versions of itself as easily as anything else. It should be obvious that there’s little risk of unexpected behavior with such a system, because it doesn’t contain any parts that would motivate it to do anything but blindly run design searches on demand.
However, this assumes that such an AI can actually produce useful results without knowing about human psychology and senses, the business domains its apps are supposed to address, the world they’re going to interact with, etc. Many people argue that good design requires a great deal of knowledge in these seemingly unrelated fields, and some go so far as too say you need full-blown humanlike intelligence. The more of these secondary functions you add to the AI the more complex it becomes, and the greater the risk that some unexpected interaction will cause it to start doing things you didn’t intend for it to do.
So ultimately the specialization angle seems worthy of investigation, but may or may not work depending on which theory of AI turns out to be correct. Also, even a working version is only a temporary stopgap. The more computing power the AI has the more damage it can do in a short time if it goes haywire, and the easier it becomes for it to inadvertently create an unFriendly AGI as a side effect of some other activity.
There’s also the issue that real AIs, unlike the imaginary AIs, have to be made of parts, which are made of parts, down to unintelligent parts. A paperclip maximizer would need to subdivide effort—let one part think of copper paperclips, other of iron paperclips, and the third, of stabilized metallic hydrogen paperclips. And they shouldn’t hack each other or think of hacking each other or the like.