There are many resources available. Many people use computers that are easy to hack and connected to the Internet. The AI could start by hacking millions of PCs worldwide.
That’s not as easy as it sounds in English. People could notice it and bomb the AI. The global infrastructure is very fragile and not optimized for running a GAI.
Wait a minute, bomb the AI? Assuming it took over a fraction of the Internet, that is bound to be very difficult, as you’d have to sever every computer it controls out of the network. Yes, currently, the network is still over-centralized and quite fragile. But if wireless mesh networking lives up to its promises (I believe it will, in a decade or two), this won’t be the case any more.
As for the “not optimized for a Global AI” part, you could just have the AI split (or duplicate) itself, and let a gazillion sub-processes take care of the gazillion tasks it may want to perform. It probably doesn’t need more bandwidth than a human organization does now (except of course to copy itself).
Assuming it took over a fraction of the Internet...
I object to the “assuming” part. It needs to acquire resources to acquire resources. If it can’t take over the Internet with its initial resources then it won’t be able to make use of further resources.
...that is bound to be very difficult, as you’d have to sever every computer it controls out of the network.
You assume that every computer can run a seed AI and that the global infrastructure is very stable under attack.
...you could just have the AI split (or duplicate) itself, and let a gazillion sub-processes take care of the gazillion tasks it may want to perform.
I object to the “just” part.
It’s likely very easy to take over enough computers (e.g. hundreds), so that (1) they can run an AI capable enough to keep working, perhaps slowly, on taking over more computers, and (2) it’s not possible to shut it down without severely disrupting the Internet, since you don’t know which computers are involved. (If it’s not yet possible to run an AI on hundreds of computers, that capability is at most a few decades in the future, and at some point one computer might suffice.) Provider or company networks may be big enough to run the AI and are almost impossible to shut down in coordination all over the world. If the AI has sense enough to avoid making any convincingly threatening moves, it won’t be possible to convince people to essentially disrupt the whole economy in an attempt to exterminate it (even better, nobody notices at all). If the AI creates some relatively simple free-ranging backup viruses that re-assemble a working AI whenever they can (e.g. forming something like a decentralized p2p network that redundantly stores its data when the AI can’t run), then even shutting down all instances of the AI in the world won’t cure the infection: it’ll come back whenever you restore the Internet or even local networks, letting any (or enough of the) previously infected computers back in. And given enough time, the disease will fester.
I don’t believe this analysis.
People talk about computer security as though it’s an arms race where the smarter side always wins. This is just wrong. Once I’ve written a correct program (for some set of correctness properties), it’ll stay correct. If I have a secure operating system, it’ll still be secure no matter how smart the attacker is. This is somewhat beyond current industrial practice, but we have verified operating systems and compilers as research prototypes. We know how to write secure software today. We might not reliably achieve it, but it seems pretty much settled that it’s achievable without superhuman skill.
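To make the “stays correct” point concrete, here is a toy machine-checked example (Lean 4, core library only). `clampedAdd` is a function I just made up for illustration; it is not code from any verified OS or compiler.

```lean
-- A toy correctness property, machine-checked once and then true forever,
-- no matter how clever an attacker is. (Illustrative only.)
def clampedAdd (a b limit : Nat) : Nat :=
  if a + b ≤ limit then a + b else limit

-- The checked guarantee: the result never exceeds `limit`.
theorem clampedAdd_le_limit (a b limit : Nat) :
    clampedAdd a b limit ≤ limit := by
  unfold clampedAdd
  split
  · assumption               -- case a + b ≤ limit: the result is a + b
  · exact Nat.le_refl limit  -- otherwise the result is limit itself
```

Verified operating systems and compilers prove vastly richer properties, but the shape of the guarantee is the same: a theorem, not a bet on out-thinking the attacker.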
Wide area peer-to-peer isn’t a good platform for general computing; you have severe reliability and connectivity problems at the edge of the network. If you give me 100 random network-connected machines, it doesn’t give me 100 times the real computational power. I’m not sure it gives me 10x, for most problems of interest. In particular, my machine-learning colleagues tell me that their learning algorithms don’t parallelize well. Apparently, good learning algorithms need to combine results from examining different subsets of the data, and that’s intrinsically communication-intensive and therefore not efficient in parallel.
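To see roughly why, here is a back-of-the-envelope model; every number in it is invented for illustration, not a measurement.

```python
# Toy model of data-parallel learning over a wide-area network: per-pass
# compute shrinks as machines are added, but the rounds in which partial
# results are combined pay a fixed communication cost that does not.
# All constants below are made-up assumptions.

def estimated_speedup(machines: int,
                      serial_compute_s: float = 1000.0,  # one pass on one machine
                      combine_rounds: int = 100,         # result-combining rounds
                      seconds_per_round: float = 2.0) -> float:  # WAN latency + transfer
    if machines == 1:
        return 1.0
    parallel_time = serial_compute_s / machines + combine_rounds * seconds_per_round
    return serial_compute_s / parallel_time

for n in (1, 10, 100, 1000):
    print(f"{n:4d} machines -> ~{estimated_speedup(n):.1f}x speedup")
# With these assumptions: 10 machines ~3.3x, 100 machines ~4.8x,
# 1000 machines ~5.0x -- nowhere near linear scaling.
```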
You could presumably write software to automatically craft exploits and use them to re-establish the AI elsewhere. This would be a highly resource-intensive and therefore non-stealthy process. Any exploit only works on some subset of the machines out there; therefore, an attacker firing off attacks across the network will be highly visible. We have honeypots, internet telescopes, and suchlike today. I don’t think this process could be kept hidden now, and the defensive technology is steadily improving.
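Some rough arithmetic on why indiscriminate exploitation is visible (the telescope size and probe count are assumptions for illustration, not measurements of any particular deployment):

```python
# If a darknet telescope passively monitors even 1/256 of the IPv4 address
# space (roughly a /8 network), random probing is very unlikely to miss it.
monitored_fraction = 1 / 256   # assumed telescope coverage
probes = 10_000                # assumed number of randomly targeted addresses

expected_hits = probes * monitored_fraction
prob_unobserved = (1 - monitored_fraction) ** probes

print(f"expected telescope hits: {expected_hits:.0f}")    # ~39
print(f"chance of zero hits:     {prob_unobserved:.1e}")  # ~1e-17
```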
I’m not qualified to assess all possible AI-risk scenarios, but I think “the AI will take over all our computers” is overrated as a risk. That window is closing now, and given current trends I expect it to be closed within 10-15 years. I expect the generation-after-next operating systems to have the security-critical parts (and possibly most of the code) verified.
People talk about computer security as though it’s an arms race where the smarter side always wins.
Security is possible in principle (barring cases like stupid or careless users manually launching content sent to them or found somewhere, and granting it undue privileges), but very unlikely to become sufficiently reliable in practice anytime soon. At present, breaking into more and more computers is a matter of continuously applying some creative effort to the task, researching vulnerabilities and working around existing recognition-type defenses. In any case, earning money to buy additional computing power is similar for our purposes.
If you give me 100 random network-connected machines, it doesn’t give me 100 times the real computational power.
Yes. What matters is when several hundred (or thousand) haphazardly connected computers are enough for the system to be capable of successfully working on its continued survival.
We have honeypots, internet telescopes, and suchlike today. I don’t think this process could be kept hidden now, and the defensive technology is steadily improving.
It is mildly plausible that this would succeed in permanently inhibiting a stupid backup system once the AI has been terminated by disrupting the Internet and most big networks. But it takes only one backup system, and there’s incentive to create many, with different restoration strategies.
And when only a few computers are sufficient to run an AI, all this becomes irrelevant, as it necessarily remains active somewhere.
Security is possible in principle… but very unlikely to become sufficiently reliable in practice anytime soon.
How soon is soon? I would bet on most systems not being vulnerable to remote exploits without user involvement within the next 10 years. I would not bet on dangerous self-improving AI within that timeframe.
Yes. What matters is when several hundred (or thousand) haphazardly connected computers are enough for the system to be capable of successfully working on its continued survival.
Once the rogue-AI-in-the-net is slower at self-improvement than human civilization, it’s not so much of a threat. The world in which there’s a rogue AI out there is probably also the world in which we have powerful-but-reliable automation for lots of human-controlled software development, too...
But it takes only one backup system, and there’s incentive to create many, with different restoration strategies.
And when only a few computers are sufficient to run an AI, all this becomes irrelevant, as it necessarily remains active somewhere.
This assumption strikes me as far-fetched. There presumably is some minimum quantity of code and data for the thing to be effective. It would be surprising if that subset fit on one machine, since that would imply that an effective self-modifying AI has low resource needs and that you can fit an effective natural-language processor into a memory much smaller than those used by today’s natural-language-processing systems.
By a few computers being sufficient I mean that computers become powerful enough, not that the AI gets compressed (the feasibility of which is less certain). Other contemporary AI tech won’t be competitive with rogue AI when we can’t solve FAI, because any powerful AI will in that case itself be a rogue AI and won’t be useful for defense (it might appear useful, though).
Other contemporary AI tech won’t be competitive with rogue AI when we can’t solve FAI, because any powerful AI will in that case itself be a rogue AI and won’t be useful for defense.
“AI” is becoming a dangerously overloaded term here. There’s AI in the sense of a system that does human-like tasks as well as humans do (specialized artificial intelligence, ASI), and there’s AI in the sense of a highly self-modifying system with long-range planning (AGI). I don’t know what “powerful” means in this context, but it doesn’t seem clear to me that humans + ASI can’t be competitive with an AGI.
And I am skeptical that there will be radical improvements in AGI without corresponding improvements to ASI. It might easily be the case that humans + ASI support for high-productivity software engineering are enough to build secure networked systems, even in the presence of AGI. I would bet on humans + proof systems + higher-level developer tools being able to build secure systems before AGI becomes good enough to be dangerous.
By “powerful AI” I meant AGI (terminology seems to have drifted in this thread). Humans + narrow AI might be powerful, but can’t become very powerful without AGI, while AGI in principle could. AGI could work on its own narrow AIs if that potentially helps.
You keep talking about security, but as I mentioned above, earning money works as well or probably better for accumulating power. Security was mostly relevant in the discussion of quickly infecting the world and surviving an (implausibly powerful) extermination attempt. That only requires being able to anonymously infect a few hundred or a few thousand computers worldwide, which even with good overall security seems likely to remain possible (perhaps through user involvement alone, for example after a first wave that recruits enough humans).
Hmm. I’m now imagining a story in which there’s a rogue AI out there with a big bank account (attained perhaps from insider trading), hiring human proxies to buy equipment, build things, and gradually accumulate power and influence, before, some day, deciding to turn the world abruptly into paperclips.
It’s an interesting science fiction story. I still don’t quite buy it as a high-probability scenario or one to lie awake worrying about. An AGI able to do this without making any mistakes is awfully far from where we are today. An AGI able to write an AGI able to do this, seems if anything to be a harder problem.
We know that the real world is a chaotic, messy place and that most interesting problems are intractable. Any useful AGI or ASI is going to be heavily heuristic. There won’t be any correctness proofs or reliable shortcuts. Verifying that a proposed modification is an improvement is going to have to be based on testing, not just cleverness. I don’t believe you can construct a small sandbox, train an AGI in that sandbox, and then have it work well in the wider world. I think training and tuning an AGI means lots of involvement with actual humans, and that’s going to be a human-scale process.
If I did worry about the science fiction scenario above, I would look for ways to thwart it that also have high payoff if AGI doesn’t happen soon or isn’t particularly effective at first. I would think about ways to do high-assurance financial transparency and auditing. Likewise technical auditing and software security.
You keep talking about security, but as I mentioned above, earning money works as well or probably better for accumulating power.
But it is not easy to use the money. You can’t “just” build huge companies with fake identities, or through a straw man, to create revolutionary technologies. Running companies with real people takes a lot of real-world knowledge, interaction and feedback. But most importantly, it takes a lot of time. I just don’t see how an AI could create a new Intel or Apple within a few years without its creators noticing anything.
The goals of an AI will be under scrutiny at all times. It seems very implausible that scientists, a company or the military are going to create an AI and then just let it run without bothering about its plans. An artificial agent is not a black box, like humans are, where one can only guess at its real intentions. A plan for world domination seems like something that can’t be concealed from its creators. Lying is not an option if your algorithms are open to inspection.
If the AI has sense enough to avoid making any convincingly threatening moves, it won’t be possible to convince people to essentially disrupt the whole economy in an attempt to exterminate it (even better, nobody notices at all).
Could you elaborate on “even better, nobody notices at all”? Any AI capable of efficient self-modification must be able to grasp its own workings and make predictions about improvements to various algorithms and its overall decision procedure. If an AI can do that, why would the humans who build it be unable to notice any malicious intentions? Why wouldn’t the humans who created it be able to use the same algorithms that the AI uses to predict what it will do? If humans are unable to predict what the AI will do, how is the AI able to predict what improved versions of itself will do?
In other words, could you elaborate on why you believe that what the AI is going to do will be opaque to its creators but predictable to its initial self?
I am also rather confused about how an AI is believed to be able to hide its attempts to build molecular nanotechnology. It doesn’t seem very inconspicuous to me.
If the AI creates some relatively simple free-ranging backup viruses that re-assemble a working AI whenever they can (e.g. forming something like a decentralized p2p network that redundantly stores its data when the AI can’t run), then even shutting down all instances of the AI in the world won’t cure the infection: it’ll come back whenever you restore the Internet or even local networks, letting any previously infected computers back in.
If you assume a world/future in possession of vastly more advanced technology than our current world, then I don’t disagree with you. If it takes a very long time for the first GAI to be created, and if it is then created by means of a single breakthrough that somehow combines all previous discoveries and expert systems into a much more powerful single entity, with huge amounts of hard-coded knowledge, a complex utility function and various dangerous drives, then I agree. It wouldn’t even take the strong version of recursive self-improvement to pretty much take over the world under those assumptions.
If an AI can do that, why would the humans who build it be unable to notice any malicious intentions?
I meant not noticing that it escaped to the Internet. But “noticing malicious intentions” is a rather strange thing to say. You notice behavior, not intentions. It’s stupid to signal your true intentions if you’ll be condemned for them.
Why wouldn’t the humans who created it be able to use the same algorithms that the AI uses to predict what it will do?
Predict what it will do in what sense, and to what end? An AI in the wild acts depending on what it encounters; all instances are unique (and beware of the watchers).
In other words, could you elaborate on why you believe that what the AI is going to do will be opaque to its creators but predictable to its initial self?
I didn’t talk of this.
If it takes a very long time for the first GAI to be created, and if it is then created by means of a single breakthrough that somehow combines all previous discoveries and expert systems into a much more powerful single entity, with huge amounts of hard-coded knowledge, a complex utility function and various dangerous drives, then I agree.
I don’t see how those assumptions are relevant. Also, all drives are dangerous, to the extent that their combination differs from ours. Utility is not temper or personality or a tendency to act in a certain way. Utility is what shapes long-term plans, any of whose elements might have an arbitrary appearance, as necessary to dominate the circumstances.
In other words, could you elaborate on why you believe that what the AI is going to do will be opaque to its creators but predictable to its initial self?
I didn’t talk of this.
Maybe I misunderstood you. But I still believe that it is an important question.
To be able to self-improve efficiently, an AI has to make some sort of prediction about how modifications will affect its behavior. The desired solution is actually much stronger than that: the AI will have to prove the friendliness of its modified self, or of its successor, with respect to its utility function.
The question is, if the AI can make such predictions about the behavior of improved versions of itself, why wouldn’t humans be able to do the same?
The fear is that an AI will do something that eventually leads to the extinction of all human value. But the AI must have the same fear about improved versions of itself. The AI must fear that its successor will cause the demise of what it values. Therefore it has to be able to make sure that this won’t happen. But why wouldn’t humans be able to do the same?
An AI is not a black box to itself. It won’t be a black box to its creators. Inventing molecular nanotechnology and taking over the world in its spare time seems like something that should be noticeable.
What if the AI makes mistakes? Meaning, it mistakenly believes the successor it has just written has the same utility function? The same way a human could mistakenly believe the AI he has just built is friendly? In the same vein, what if the AI cannot accurately assess its own utility function, but goes on optimizing anyway?
Such a badly done AI may automatically flatline, and not be able to improve itself. I don’t know. But even if the AI is friendly to itself, we humans could still botch the utility function (even if that utility function is as meta as CEV).
You assume that every computer can run a seed AI
Yes I do. But it may not be as probable as I thought.
and that the global infrastructure is very stable under attack.
I said as much. And this one seems more plausible. If we uphold freedom, a sensible policy for the Internet is to make it as resilient and uncontrollable as possible. If we don’t, well…
Now if those two assumptions are correct, and we further assume the AI already controls a single computer with an Internet connection, then it has plenty of resources to take over a second one. It would need to:
Find a security flaw somewhere (including convincing someone to run arbitrary code), upload itself there, then rinse and repeat.
Or, find and exploit credit card numbers (or convince someone to give them away), then buy computing power.
Or, find and convince someone (typically a lawyer) to set up a company for it, then make money (legally or not), then buy computing power.
Or, …
Humans do that right now. (Credit card theft, money laundering, various scams, legit offshore companies…)
Of course, if the first computer isn’t connected, the AI would have to get out of the box first. But Eliezer can do that already (and he’s not alone). It’s a long shot, but if several equally capable AIs pop up in different laboratories worldwide, then eventually one of them will be able to convince its way out.
Humans do that right now. (Credit card theft, money laundering, various scams, legit offshore companies…)
But humans are optimized to do all that, to work in a complex world. And humans are not running on a computer, watched by creators who are eager to write new studies on how their creation’s algorithms behave. I just don’t see it as a plausible scenario that all this could happen unnoticed.
Also, simple credit card theft etc. isn’t enough. At some point you’ll have to buy Intel or create your own companies to manufacture your new substrate or build your new particle accelerator.
OK, let this AI be safely contained, and let the researchers publish. Now, what’s stopping some idiot from writing a poorly specified goal system, then deliberately letting the AI out of the box so it can take over the world? It only takes one idiot among the many who could read the publication.
And of course credit card theft isn’t enough by itself. But it is enough to bootstrap yourself into something more profitable. There are many ways to acquire money, and the AI, by duplicating itself, can access many of them at the same time. If the AI does nothing stupid, its expansion should be both undetectable and exponential. I give it a year to buy Intel or something.
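To put rough numbers on “a year”: the starting stake, the target, and the doubling times below are entirely made-up assumptions, just to show what the claim requires.

```python
# How fast "exponential expansion" would have to be to reach Intel-scale
# money within a year, starting from a modest stake.
import math

start = 1e4     # assumed initial stake: $10k
target = 1e11   # assumed target: ~$100B, the rough scale of a large company

doublings_needed = math.log2(target / start)   # about 23.3 doublings

for doubling_days in (7, 14, 30):
    days = doublings_needed * doubling_days
    print(f"doubling every {doubling_days:2d} days -> ~{days:.0f} days total")
# Doubling weekly gets there in about 163 days; doubling monthly takes
# nearly two years.
```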
Sure, in the meantime, there will be other AIs with different poorly specified goal systems. Some of them could even be genuinely Friendly. But then we’re screwed anyway, for this will probably end up in something like a Hansonian Nightmare. At this point, the only thing that could stop it would be a genuine Seed AI that can outsmart them all. You have less than a year to develop it, and ensure its Friendliness.
Humans are not especially optimized to work in the environment loup-vaillant describes.