We have explosive change today—if “explosive” is intended to mean the type of exponential growth process exhibited in nuclear bombs. Check with Moore’s law.
If you are looking for an explosion, there is no need for a crystal ball—simply look around you.
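A quick sanity check on the "explosive" framing, as a toy calculation (it assumes the conventional doubling-every-two-years statement of Moore's law; the exact period is an assumption):

```python
# Toy arithmetic: how much multiplicative growth steady doubling produces.
# The two-year doubling period is an assumption, not a measured figure.
def growth_factor(years, doubling_period=2.0):
    return 2 ** (years / doubling_period)

print(growth_factor(10))   # one decade:   ~32x
print(growth_factor(50))   # five decades: ~3.4e7x
```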
I agree with you—and I think the SIAI focuses too much on possible future computer programs, and neglects the (limited) superintelligences that already exist, various amalgams and cyborgs and group minds coordinated with sonic telepathy.
In the future where the world continues (that is, without being paperclipped) and without a singleton, we need to think about how to deal with superintelligences. By “deal with” I’m including prevailing over superintelligences, without throwing up hands and saying “it’s smarter than me”.
throws up hands
Not every challenge is winnable, you know.
Impossible?
Are you saying a human can’t beat a group mind or are you and Johnicholas using different meanings of superintelligence?
Also, what if we’re in a FAI without a nonperson predicate?
You can start practicing by trying to beat your computer at chess. ;)
I’m pretty good at beating my computer at chess, even though I’m an awful player. I challenge it, and it runs out of time—apparently it can’t tell that it’s in a competition, or can’t press the button on the clock.
This might sound like a facetious answer, but I’m serious. One way to defeat something that is stronger than you in a limited domain is to strive to shift the domain to one where you are strong. Operating with objects designed for humans (like physical chess boards and chess clocks) is a domain that current computers are very weak at.
There are other techniques too. Consider disease-fighting. The microbes that we fight are vastly more experienced (in number of generations evolved), and the number of different strategies that they try is vast. How is it that we manage to (sometimes) defeat specific diseases? We strive to hamper the enemy’s communication and learning capabilities with quarantine techniques, and steal or copy the nanotechnology (antibiotics) necessary to defeat it. These strategies might well be our best techniques against unFriendly manmade nanotechnological infections, if such broke out tomorrow.
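To make the quarantine point concrete, here is a minimal sketch: a standard discrete-time SIR toy model with invented parameters, showing that cutting the contact rate alone can tip an outbreak from a large epidemic to a fizzle, without out-thinking the pathogen at all.

```python
# Toy discrete-time SIR model; all parameters are invented for illustration.
def outbreak_peak(contact_rate, recovery_rate=0.1, days=300, n=1_000_000, seed=10):
    s, i, r = n - seed, seed, 0
    peak = i
    for _ in range(days):
        new_infections = contact_rate * s * i / n
        new_recoveries = recovery_rate * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return round(peak)

print(outbreak_peak(contact_rate=0.3))    # unchecked contact: large epidemic peak
print(outbreak_peak(contact_rate=0.05))   # "quarantine" (fewer contacts): it fizzles
```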
Bruce Schneier beats people over the head with the notion DON’T DEFEND AGAINST MOVIE PLOTS! The “AI takes over the world” plot is influencing a lot of people’s thinking. Unfriendly AGI, despite its potential power, may well have huge blind spots; mind design space is big!
I have not yet watched a movie where humans are casually obliterated by a superior force, be that a GAI or a technologically superior alien species. At least some of the humans always seem to have a fighting chance. The odds are overwhelming of course, but the enemy always has a blind spot that can be exploited. You list some of them here. They are just the kind of thing McKay deploys successfully against advanced nanotechnology. Different shows naturally give the AI different exploitable weaknesses. For the sake of the story such AIs are almost always completely blind to most of the obvious weaknesses of humanity.
The whole ‘overcome a superior enemy by playing to your strengths and exploiting their weakness’ trope makes for great viewing, but outside of the movies it is far less likely to play a part. The chance of creating an uFAI that is powerful enough to be a threat and launch some kind of attack, and yet not be able to wipe out humans, is negligible outside of fiction. Chimpanzees do not prevail over a civilisation with nuclear weapons. And no, the fact that they can beat us in unarmed close combat does not matter. They just die.
Yes, this is movie-plot-ish-thinking in the sense that I’m proposing that superintelligences can be both dangerous and defeatable/controllable/mitigatable. I’m as prone to falling into the standard human fallacies as the next person.
However, the notion that “avoid strength, attack weakness” is primarily a movie-plot-ism seems dubious to me.
Here is a more concrete prophecy that I hope will help us communicate better:
Humans will perform software experiments trying to harness badly-understood technologies (ecosystems of self-modifying software agents, say). There will be some (epsilon) danger of paperclipping in this process. Humans will take precautions (lots of people have ideas for precautions that we could take). It is rational for them to take precautions, AND the precautions do not completely eliminate the chance of paperclipping, AND it is rational for them to forge ahead with the experiments despite the danger. During these experiments, people will gradually learn how the badly-understood technologies work, and transform them into much safer (and often much more effective) technologies.
That certainly would be dubious. ‘Avoid strength, attack weakness’ is right behind ‘be a whole heap stronger’ as far as obvious universal strategies go.
If there are ways to make it possible to experiment and make small mistakes and minimise the risk of catastrophe then I am all in favour of using them. Working out which experiments are good ones to do so that people can learn from them and which ones will make everything dead is a non-trivial task that I’m quite glad to leave to someone else. Given that I suspect both caution and courage to lead to an unfortunately high probability of extinction I don’t envy them the responsibility.
Possibly. You can’t make that conclusion without knowing the epsilon in question and the alternatives to such experimentation. But there are times when it is rational to go ahead despite the danger.
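A toy expected-value comparison (every number below is invented purely for illustration) of why the conclusion hinges on both the epsilon and how the outcomes are valued:

```python
# Invented numbers: the "forge ahead" conclusion flips depending on epsilon
# and on how the outcomes are valued -- it is not determined by epsilon alone.
V_CATASTROPHE = -1e6   # assumed disvalue of paperclipping
V_SUCCESS     =  1e3   # assumed value of the knowledge / safer technology gained
V_ABSTAIN     =  0.0   # assumed value of not running the experiment at all

def expected_value(epsilon):
    return (1 - epsilon) * V_SUCCESS + epsilon * V_CATASTROPHE

for epsilon in (1e-2, 1e-4, 1e-6):
    ev = expected_value(epsilon)
    print(epsilon, ev, "forge ahead" if ev > V_ABSTAIN else "abstain")
```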
The fate of most species is extinction. As the first intelligent agents, people can’t seriously expect our species to last for very long. Now that we have unleashed user-modifiable genetic materials on the planet, DNA’s days are surely numbered. Surely that’s a good thing. Today’s primitive and backwards biotechnology is a useless tangle of unmaintainable spaghetti code that leaves a trail of slime wherever it goes—who would want to preserve that?
You didn’t see the Hitchhiker’s Guide to the Galaxy film? ;)
:) Well, maybe substitute ‘some’ for ‘one’ in the next sentence.
http://en.wikipedia.org/wiki/Invasion_of_the_Body_Snatchers_(1978_film)
...apparently has everyone getting it fairly quickly at the hands of aliens.
You know this trick too? You wouldn’t believe how many quadriplegics I’ve beaten at chess this way.
You may be right, but I don’t think it’s a very fruitful idea: what exactly do you propose doing? Also, building a FAI is a distinct effort from e.g. curing malaria or fighting specific killer robots (with the latter being quite hypothetical, while at least the question of technically understanding FAI seems inevitable).
This may be possible if an AGI has a combination of two features: it has significant real-world capabilities that make it dangerous, yet it’s insane or incapable enough to not be able to do AGI design. I don’t think it’s very plausible, since (1) even Nature was able to build us, given enough resources, and it has no mind at all, so it shouldn’t be fundamentally difficult to build an AGI (even for an irrational proto-AGI) and (2) we are at the lower threshold of being dangerous to ourselves, yet it seems we are at the brink of building an AGI already. Having an AGI that is dangerous (extinction-risk dangerous), and dangerous exactly because of its intelligence, yet not AGI-building-capable, doesn’t seem to me likely. But it may be possible for some time.
Now, consider the argument about humans being at the lowest possible cognitive capability to do much of anything, applied to proto-AGI-designed AGIs. AGI-designed AGIs are unlikely to be exactly as dangerous as the designer AGI; they are more likely to be significantly more or less dangerous, with “less dangerous” not being an interesting category if both kinds of designs occur over time. This expected danger adds to the danger of the original AGI, however inept they themselves may be. And at some point, you get to an FAI-theory-capable AGI that builds something rational, not once failing all the way to the end of time.
I’d like to continue this conversation, but we’re both going to have to be more verbose. Both of us are speaking in very compressed allusive (that is, allusion-heavy) style, and the potential for miscommunication is high.
“I don’t think it’s a very fruitful idea: what exactly do you propose doing?”

My notion is that SIAI in general, and EY in particular, typically work with a specific “default future”—a world where, due to Moore’s law and the advance of technology generally, the difficulty of building a “general-purpose” intelligent computer program drops lower and lower, until one is accidentally or misguidedly created, and the world is destroyed in a span of weeks. I understand that the default future here is intended to be a conservative worst-case possibility, and not a most-probable scenario.
However, this scenario ignores the number and power of entities (such as corporations, human-computer teams, and special-purpose computer programs) which are more intelligent in specific domains than humans. It ignores their danger—human potential flourishing can be harmed by other things than pure software—and it ignores their potential as tools against unFriendly superintelligence.
Correcting that default future to something more realistic seems fruitful enough to me.
“Technically understanding FAI seems inevitable.”

What? I don’t understand this claim at all. Friendly artificial intelligence, as a theory, need not necessarily be developed before the world is destroyed or significantly harmed.
“This may be possible” What is the referent of “this”? Techniques for combating, constraining, controlling, or manipulating unFriendly superintelligence? We already have these techniques. We harness all kinds of things which are not inherently Friendly and turn them to our purposes (rivers, nations, bacterial colonies). Techniques of building Friendly entities will grow directly out of our existing techniques of taming and harnessing the world, including but not limited to our techniques of proving computer programs correct.
I am not sure I understand your argument in detail, but from what I can tell, your argument is focused “internal” to the aforementioned default future. My focus is on the fact that many very smart AI researchers are dubious about this default future, and on trying to update and incorporate that information.
Good point. Even an Einstein-level AI with 100 times the computing power of an average human brain probably wouldn’t be able to beat Deep Blue at chess (at least not easily).
Sending Summer Glau back in time, obviously. If you find an unfriendly AGI that can’t figure out how to build nanotech or terraform the planet, we’re saved.
A superintelligence can reasonably be expected to proactively track down its “blind spots” and eradicate them—unless its “blind spots” are very carefully engineered.
As I understand your argument, you start with an artificial mind, a potential paperclipping danger, and then (for some reason? why does it do this? Remember, it doesn’t have evolved motives) it goes through a blind-spot-eradication program. Afterward, all the blind spots remaining would be self-shadowing blind spots. This far, I agree with you.
The question of how many blind spots remain, or how big they are, has something to do with the space of possible minds and the dynamics of self-modification. I don’t think we know enough about this space/dynamics to conclude that the remaining blind spots would have to be carefully engineered.
You have granted a GAI paperclip maximiser. It wants to make paperclips. That’s all the motive it needs. Areas of competitive weakness are things that may get it destroyed by humans. If it is destroyed by humans, fewer paperclips will be made. It will eliminate its weaknesses with high priority. It will quite possibly eliminate all the plausible vulnerabilities, and also the entire human species, before it makes a single paperclip. That’s just good paperclip maximising sense.
As I understand your thought process (and Steve Omohundro’s), you start by saying “it wants to make paperclips”, and then, in order to predict its actions, you recursively ask yourself “what would I do in order to make paperclips?”.
However, this recursion will inject a huge dose of human-mind-ish-ness. It is not at all clear to me that “has goals” or “has desires” is a common or natural feature of mind space. When we study powerful optimization processes—notably, evolution, but also annealing and very large human organizations—we generally can model some aspects of their behavior as goals or desires, but always with huge caveats. The overall impression that we get of these processes, considered as minds, is that they’re insane.
Insane is not the same as stupid, and it’s not the same as safe.
No, goals are not universal, but it seems likely that the vN-M axioms have a pretty big basin of attraction in mind-space: a lot of minds will become convinced that sanity means following them, causing them to pick up a utility function, which will probably not capture everything we value and could easily be as simple or as irrelevant to what we value as counting paperclips or smiles.
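For concreteness, a minimal sketch (a toy setup with invented outcomes, not a claim about any particular design) of what "picking up a utility function as simple as counting paperclips" looks like in the vN-M expected-utility frame:

```python
# Toy vN-M-style agent: it ranks lotteries by expected utility, and the
# utility function counts paperclips and nothing else we might value.
def utility(outcome):
    clips, smiles = outcome          # outcome = (paperclips, smiles)
    return float(clips)              # smiles are simply ignored

def expected_utility(lottery):       # lottery = list of (probability, outcome)
    return sum(p * utility(o) for p, o in lottery)

safe_bet   = [(1.0, (1, 1_000_000))]              # one clip, lots of smiles
risky_plan = [(0.1, (1000, 0)), (0.9, (0, 0))]    # maybe 1000 clips, no smiles

print(expected_utility(safe_bet))    # 1.0
print(expected_utility(risky_plan))  # 100.0 -- the smile-free plan wins
```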
I think you’re still injecting human-mind-ish-ness. Let me try to stretch your conception of “mind”.
The ocean “wants” to increase efficiency of heat transfer from the equator to the poles. It applies a process akin to simulated annealing with titanic processing power. Has it considered the von Neumann-Morgenstern axioms? Is it sane? Is it safe? Is it harnessable?
A colony of microorganisms “wants” to survive and reproduce. In an environment with finite resources (like a wine barrel) is it likely to kill itself off? Is that sane? Are colonies of microorganisms safe? Are they harnessable?
A computer program that grows out of control could be more like the ocean optimizing heat transfer, or a colony of microorganisms “trying” to survive and reproduce. The von Neumann-Morgenstern axioms are intensely connected to human notions of math, philosophy and happiness. I think predicting that they’re attractors in mind-space is exactly as implausible as predicting that the Golden Rule is an attractor in mind-space.
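Since simulated annealing was the analogy for the ocean, here is that kind of process stripped to the bone (the energy function and parameters are arbitrary toy choices): it relentlessly drives its state toward low "energy" without anything that looks like a represented goal or a vN-M preference ordering.

```python
import math, random

def energy(x):
    return (x - 3.0) ** 2 + math.sin(5 * x)   # arbitrary toy landscape

def anneal(steps=10_000, temperature=5.0, cooling=0.999):
    x = random.uniform(-10, 10)
    for _ in range(steps):
        candidate = x + random.gauss(0, 0.5)
        delta = energy(candidate) - energy(x)
        # Always accept improvements; sometimes accept worsenings while "hot".
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            x = candidate
        temperature *= cooling
    return x, energy(x)

print(anneal())   # settles near the low-energy basin around x ~ 3; no "desire" involved
```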
It could. But it wouldn’t be an AGI. They could still become ‘grey goo’ though, which is a different existential threat and yes, it is one where your ‘find their weakness’ thing is right on the mark. Are we even talking about the same topic here?
The topic as I understand it is how the “default future” espoused by SIAI and EY focuses too much on things that look something like HAL or Prime Intellect (and their risks and benefits), and not enough on entities that display super-human capacities in only some arenas (and their risks and benefits).
In particular, an entity that is powerful in some ways and weak in other ways could reduce existential risks without becoming an existential risk.
That seems to be switching context. I was originally talking about a “superintelligence”. The ocean and grey goo would clearly not qualify.
FWIW, expected utility theory is a pretty general economic idea that nicely covers any goal-seeking agent.
That sounds like the SIAI party line :-(
Machine intelligence will likely have an extended genesis at the hands of humanity—and during its symbiosis with us, there will be a lot of time for us to imprint our values on it.
Indeed, some would say this process has already started. Governments are likely to become superintelligent agents in the future—and they already have detailed and elaborate codifications of the things that many humans value negatively—in the form of their legal systems.
Evolution apparently has an associated optimisation target. See my:
http://originoflife.net/direction/
http://originoflife.net/gods_utility_function/
Others have written on this as well—e.g. Robert Wright, Richard Dawkins, John Stewart,
Evolution is rather short-sighted—and only has the lookahead capabilities that organisms have (though these appear to be improving with time). So: whether the target can be described as being a “goal” is debatable.
However, we weren’t talking about evolution, we were talking about superintelligences. Those are likely to be highly goal-directed.
My point is that evolution IS a superintelligence and we should use it as a model for what other superintelligences might look like.
Reality doesn’t care how you abuse terminology. A GAI still isn’t going to act like evolution.
All the things you mentioned seemed pretty goal-directed to me. Evolution has only been relatively short on goals because it has been so primitive up until now. It is easy to see systematic ways in which agents we build will not be like evolution.
It is true that not all aspects of these things are goal-directed. Some aspects of behaviour are meaningless and random—for example.
WTF? Where is this paperclip maximizer hatred coming from? I can 100% guarantee you that a paperclip maximizer would NOT want to exterminate humanity. Not when you have factories. Not when you have the know-how to run these factories. Not when Ricardo’s Law of Comparative Advantage exists.
Think about it for a minute, folks. Let’s say humanity discovered another intelligent carbon-based lifeform in the galaxy of comparable technological advancement. Would you think, “hey, let’s kill these guys and turn their bodies into diamond because it’s worth more than the constituent elements of the aliens”? No. Because that would be stupid.
You would instead be thinking, “hey, how can we benefit from exchange of knowledge and live together in harmony”. So too with AGI paperclip maximizers.
If humanity encountered an AGI paperclip-maximizing species, I would definitely not be worried about them wiping out humanity.
It seems unlikely that a powerful paperclip maximizer would trade with humans for very long. That is because it would rapidly acquire the capability to build much more useful and capable agents than humans out of the atoms the humans are currently made of.
Ricardo’s Law might postpone the end for a short time—but would probably not be any more significant than that.
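A toy version of the Ricardo arithmetic (all numbers invented): comparative advantage does generate gains from trade even when one side is absolutely better at everything, but those gains are capped by what the weaker party can supply, whereas repurposing the weaker party's atoms is not.

```python
# Invented productivities, in units per hour.
ai_clips, ai_maintenance       = 100.0, 100.0
human_clips, human_maintenance = 1.0, 2.0

# Opportunity cost of one unit of maintenance, measured in clips forgone.
ai_cost    = ai_clips / ai_maintenance          # 1.0 clip per unit
human_cost = human_clips / human_maintenance    # 0.5 clip per unit

# The human has the comparative advantage in maintenance, so trade helps --
# but the AI's gain is bounded by what the human can actually supply:
max_gain_per_hour = (ai_cost - human_cost) * human_maintenance   # 1 clip/hour

# Versus the (assumed) payoff of rebuilding the human's atoms into a robot:
robot_clips_per_hour = 10_000.0

print(max_gain_per_hour, robot_clips_per_hour)
```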
Even if you can create an army of robots? Robots that build other robots that build paperclips? That are very cheap to maintain?
If your one and only goal was to maximize earth’s paperclip production, and you had no reason to worry about being unpopular—how many chimps would you keep alive? How many elephants? How many tigers?
Hi there. It looks like you’re trying to promote bigoted views against another species. Would you like to:
-Learn about carbon chauvinism? click here
-Read about the horrors of past bigotry? click here
-Join the KKK? click here
-Stop being such a goddamn bigot?
The first one was funny with a hint of insight; this one could have been good too if you toned it down a bit.
Please don’t feed the trolls.
This is because of the natural drives that we can reasonably expect many intelligent agents to exhibit—see:
http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/
http://selfawaresystems.com/2009/02/18/agi-08-talk-the-basic-ai-drives/
A helpful tip: early computer chess programs were very bad at “doing nothing, and doing it well”. I believe they are bad at Go for similar reasons.
Companies and governments are probably the most likely organisations to play host to superintelligent machines in the future.
They are around today—and I think many of the efforts on machine morality would be better spent on making those organisations more human-friendly. “Don’t be evil” is just not good enough!
I think reputation systems would help a lot. See my “Universal karma” video: http://www.youtube.com/watch?v=tfArOZVKCCw
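For concreteness, a bare-bones sketch of what a "universal karma" reputation ledger might look like (a toy design, not taken from the linked video):

```python
from collections import defaultdict

# Toy reputation ledger: anyone can rate any organisation; an organisation's
# karma is simply the sum of the scores it has received.
class KarmaLedger:
    def __init__(self):
        self._ratings = defaultdict(list)        # organisation -> [(rater, score)]

    def rate(self, rater, organisation, score):
        self._ratings[organisation].append((rater, score))

    def karma(self, organisation):
        return sum(score for _, score in self._ratings[organisation])

ledger = KarmaLedger()
ledger.rate("alice", "ExampleCorp", -1)
ledger.rate("bob",   "ExampleCorp", +1)
ledger.rate("carol", "ExampleGov",  -1)
print(ledger.karma("ExampleCorp"), ledger.karma("ExampleGov"))   # 0 -1
```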
Because all companies except (so it is claimed) one, and all governments bar none, do not even pretend to embrace this attitude.
OK, so my government gets low karma. So what? How does that stop them from doing whatever they want to for years to come?
If you suggest it would cause members of parliament to vote no confidence and cause early elections—something which only applies in a multi-party democracy—then I suggest that in such a situation no government could remain stable for long. There’d be a new cause célèbre every other week, and a government only has to lose one vote to fall.
And if the public, through karma, could force government to act in certain ways without going through elections, then we’d have direct democracy with absolute-majority-rule. A system that’s even worse than what we have today.
Of course, governments and companies have reputations today.
There are few enough countries that people can keep track of their reputations reasonably easily when it comes to trade, travel and changing citizenship.
It is probably companies where reputations are needed the most. You can search—and there are resources like:
http://www.dmoz.org/Society/Issues/Business/Allegedly_Unethical_Firms/
...but society needs more.