This blog comment describes what seems to me the obvious default scenario for an unFriendly AI takeoff. I’d be interested to see more discussion of it.
The problem with the specific scenario given, with experimental modification/duplication rather than careful proof-based modification, is that it is liable to have the same problem we have when creating systems this way: the copies might not do what the agent that created them wants.
That could lead to a splintering of the AI, and in-fighting over computational resources.
It also makes the standard assumption that the AI will be implemented on, and stable on, a von Neumann-style computing architecture.
Of course, if it’s not, it could port itself to such if doing so is advantageous.
Would you agree that one possible route to uFAI is human-inspired?
Human-inspired systems might have the same or a similarly high fallibility rate as humans (from emulating neurons, or just random experimentation at some level), and giving such a system access to its own machine code and low-level memory would not be a good idea. Most changes are likely to be bad.
So if an AI did manage to port its code, it would have to find some way of preventing or discouraging the copied AI on the x86-based architecture from playing with the ultimate mind-expanding/destroying drug that is machine-code modification. This is what I meant about stability.
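To make “discouraging machine-code modification” slightly more concrete, here is a minimal sketch of one crude behavioural safeguard: a watchdog that hashes a program’s own code image and halts if it changes. All names, paths and intervals are assumptions invented for this example; a real system would also need memory-page protection (e.g. W^X) and some way of protecting the watchdog itself.

```python
# Purely illustrative sketch of a self-modification watchdog; the function
# names, file paths and intervals are assumptions for this example.
import hashlib
import sys
import time


def code_digest(path):
    """Hash the on-disk code image of the monitored program."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def watchdog(path, interval=5.0):
    """Periodically re-hash the code image and halt on any change."""
    baseline = code_digest(path)
    while True:
        time.sleep(interval)
        if code_digest(path) != baseline:
            sys.exit("code image modified; halting")


# Hypothetical usage: watchdog(sys.argv[0]) would monitor this script's own source file.
```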
Er, I can’t really give a better rebuttal than this: http://www.singinst.org/upload/LOGI//levels/code.html
What point are you rebutting?
The idea that a greater proportion of possible changes to a human-style mind are bad than changes of an equal magnitude to a von Neumann-style mind.
Most random changes to a von Neumann-style mind would be bad as well.
It’s just that a von Neumann-style mind is unlikely to make the random mistakes that we do, or at least that is Eliezer’s contention.
I can’t wait until there are uploads around to make questions like this empirical.
Let me point out that we (humanity) do actually have some experience with this scenario. Right now, the mobile code that spreads across a network without effective controls by its author on the bounds of its expansion is the worm. If we have experience, we should mine it for concrete predictions and countermeasures.
General techniques against worms might include: isolated networks, host diversity, rate-limiting, and traffic anomaly detection.
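To make one of these concrete, here is a minimal Python sketch of the rate-limiting idea, in the spirit of the “virus throttle” from the worm-defence literature. The class name and thresholds are assumptions invented for this example; the point is only that traffic to familiar hosts flows freely while connections to new hosts are admitted slowly, so worm-like fan-out mostly gets denied.

```python
# Illustrative sketch only: the class and all parameters are invented for this example.
import time
from collections import deque


class ConnectionThrottle:
    def __init__(self, max_new_hosts_per_sec=1.0, working_set_size=5):
        self.min_gap = 1.0 / max_new_hosts_per_sec
        self.recent_hosts = deque(maxlen=working_set_size)  # destinations we talk to routinely
        self.last_new_host_time = 0.0
        self.denied_count = 0          # crude anomaly signal: how often we had to say no

    def allow(self, host):
        """Return True if a connection to `host` may proceed right now."""
        now = time.time()
        if host in self.recent_hosts:
            return True                # familiar destination: no throttling
        if now - self.last_new_host_time >= self.min_gap:
            self.recent_hosts.append(host)   # admit one new destination per interval
            self.last_new_host_time = now
            return True
        self.denied_count += 1         # rapid fan-out to new hosts piles up here
        return False


# Hypothetical usage: a worm-style burst to 20 fresh addresses is mostly denied.
throttle = ConnectionThrottle()
allowed = sum(throttle.allow("10.0.0.%d" % i) for i in range(20))
print(allowed, "of 20 connection attempts allowed")
```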
Are these low-cost/high-return existential-risk reduction techniques?
No, these are high-cost/low-return existential-risk reduction techniques. Major corporations and governments already have a very strong incentive to protect their networks, but despite spending billions of dollars, they’re still being frequently penetrated by human attackers, who are not even necessarily professionals. Not to mention the hundreds of millions of computers on the Internet that are unprotected because their owners have no idea how to protect them, or because they don’t contain information that their owners consider especially valuable.
I got into cryptography partly because I thought it would help reduce the risk of a bad Singularity. But while cryptography turned out to work relatively well (against humans anyway), the rest of the field of computer security is in terrible shape, and I see little hope that the situation would improve substantially in the next few decades.
What do you think of the object-capability model? And removing ambient authority in general.
That’s outside my specialization of cryptography, so I don’t have too much to say about it. I do remember reading about the object-capability model and the E language years ago, and thinking that it sounded like a good idea, but I don’t know why it hasn’t been widely adopted yet. I don’t know if it’s just inertia, or whether there are some downsides that its proponents tend not to publicize.
In any case, it seems unlikely that any security solution can improve the situation enough to substantially reduce the risk of a bad Singularity at this point, without a huge cost. If the cause of existential-risk reduction had sufficient resources, one project ought to be to determine the actual costs and benefits of approaches like this and whether it would be feasible to implement (i.e., convince society to pay whatever costs are necessary to make our networks more secure), but given the current reality I think the priority of this is pretty low.
Thanks. I just wanted to know if this was the sort of thing you had in mind, and whether you knew any technical reasons why it wouldn’t do what you want.
This is one thing I keep a close-ish eye on. One of the major proponents of this sort of security has recently gone to work for Microsoft on their research operating systems. So it might come along in a bit.
As to why it hasn’t caught on, it is partly inertia and partly that it requires more user interaction with, and understanding of, the system than ambient authority does. Good UI and metaphors can decrease that cost, though.
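For readers who haven’t run into the distinction, here is a toy sketch of ambient authority versus an explicit capability, in plain Python with invented names; E and Caja enforce this at the language level rather than by convention, which this sketch does not.

```python
# Toy contrast, for illustration only; the classes and functions are invented.

# Ambient authority: the plugin is handed a mere string, but because open()
# is reachable from any code that runs, it can read (or write) anything the
# user running the process can.
def run_plugin_ambient(plugin, path):
    return plugin.render(path)        # plugin.render() may call open(path), or open anything else


# Object-capability style: the plugin is handed an object that can read
# exactly one file, and it is given nothing else to reach for.
class ReadOnlyFileCap:
    def __init__(self, path):
        self._path = path

    def read(self):
        with open(self._path, "r") as f:
            return f.read()


def run_plugin_capability(plugin, path):
    return plugin.render(ReadOnlyFileCap(path))   # the plugin can only call .read() on this one capability
```

In plain Python the plugin could of course still import the builtins and get open() back; capability languages like E close those holes, which is part of why doing this by convention alone isn’t considered enough.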
The ideal would be to have a self-maintaining computer system with this sort of security system. However, a good self-maintaining system might be dangerously close to a self-modifying AI.
There’s also a group of proponents of this style working on Caja at Google, including Mark Miller, the designer of E. And some people at HP.
Actually, all these people talk to one another regularly. They don’t have a unified plan or a single goal, but they collaborate with one another frequently. I’ve left out several other people who are also trying to find ways to push in the same direction. Just enough names and references to give a hint. There are several mailing lists where these issues are discussed. If you’re interested, this is probably the one to start with.
Sadly, I suspect this moves things backwards rather than forwards. I was really hoping that we’d see Coyotos one day, which now seems very unlikely.
I meant it more as an indication that Microsoft are already working in the direction of better-secured OSes, rather than his being a pivotal move. Coyotos might get revived when the open-source world sees what MS produces and needs to play catch-up.
That assumes MS ever goes far enough that the FLOSS world feels any gap that could be caught up.
MS rarely does so; the chief fruit of 2 decades of Microsoft Research sponsorship of major functional language researchers like Simon Marlow or Simon Peyton-Jones seems to be… C# and F#. The former is your generic quasi-OO imperative language like Python or Java, with a few FPL features sprinkled in, and the latter is a warmed-over O’Caml: it can’t even make MLers feel like they need to catch up, much less Haskellers or FLOSS users in general.
The FPL OSS community is orders of magnitude more vibrant than OSS secure-operating-system research. I don’t know of any living projects that use the object-capability model at the OS level (there’s plenty of language-level and higher-level work going on).
For some of the background, Rob Pike wrote an old paper on the state of system level research.
I can’t imagine having any return from protection against the spreading of AI on the Internet, at any cost (even in a perfect world, AI can still produce value, e.g. earn money online, and so buy access to more computing resources).
Your statement sounds a bit overgeneralized—but you probably have a point.
Still, would you indulge me in some idle speculation? Maybe there could be a species of aliens that evolved to intelligence by developing special microbe-infested organs (which would somehow be firewalled from the rest of the alien itself) and incentivizing the microbial colonies somehow to solve problems for the host.
Maybe we humans evolved to intelligence that way—after all, we do have a lot of bacteria in our guts. But then all the evidence we have pointing to brains as the information-processing center would have to be wrong. Maybe brains are the firewall organ! Memes are sort of like microbes, and they’re pretty well “firewalled” (genetic engineering is a meme-complex that might break out of the jail).
The notion of creating an ecology of entities, and incentivizing them to produce things that we value, might be a reasonable strategy, one that we humans have been using for some time.
I can’t see how this comment relates to the previous one. It seems to start an entirely new conversation. Also, the metaphor with brains and microbes doesn’t add understanding for me, I can only address the last paragraph, on its own.
The crucial property of AIs that makes them a danger is (eventual) autonomy, not even rapid coming to power. Once the AI, or a society (“ecology”) of AIs, becomes sufficiently powerful to ignore vanilla humans, its values can’t be significantly influenced, and most of the future is going to be determined by those values. If those values are not good from the point of view of human values, the future is lost to us; it has no goodness. The trick is to make sure that the values of such an autonomous entity are a very good match with our own, at some point where we still have a say in what they are.
Talk of “ecologies” of different agents creates an illusion of continuous control. The standard intuitive picture has little humans at the lower end, with a network of gradually more powerful and/or different agents stretching out from them. But how much is really controlled by that node? Its power has no way of “amplifying” as you go through the network: if only humans and a few other agents share human values, those values will receive very little payoff. This is also not sustainable: over time, one should expect the preferences of agents with more power to gain in influence (which is what “more power” means).
The best way to win this race is not to create different-valued competitors that you don’t expect to be able to turn into your own almost-copies, which seems infeasible for all the scenarios I know of. FAI is exactly about devising such a copycat, and if you can show how to do that with “ecologies”, all power to you, but I don’t expect anything from this line of thought.
To explain the relation, you said: “I can’t imagine having any return [...from this idea...] even in a perfect world, AI can still produce value, e.g. earn money online.”
I was trying to suggest that in fact there might be a path to Friendliness by installing sufficient safeguards that the primary way a software entity could replicate or spread would be by providing value to humans.
In the comment above, I explained why what an AI does is irrelevant as long as it isn’t guaranteed to actually have the right values: once it goes unchecked, it simply reverts to whatever it actually prefers, be it in a flurry of hard takeoff or after a thousand years of close collaboration. “Safeguards”, in every context I’ve seen, refer to things that don’t enforce values, only behavior, and that’s not enough. Even the ideas for enforcing behavior look infeasible, but the more important point is that even if we win this one, we still lose eventually with such an approach.
My symbiotic-ecology-of-software-tools scenario was not a serious proposal as the best strategy to Friendliness. I was trying to increase the plausibility of SOME return at SOME cost, even given that AIs could produce value.
I seem to have stepped onto a cached thought.
I’m afraid I see the issue as clear-cut: you can’t get “some” return, you can only win or lose (the probability of getting there is, of course, more amenable to small nudges).
Making such a statement significantly increases the standard of reasoning I expect from a post. That is, I expect you to be either right or at least a step ahead of the one with whom you are communicating.