Software companies manage to ship products that do sort of what they want, that they can patch to more closely do what they want. This is generally after rounds of internal testing, in which they try to figure out if it does what they want by running it and observing the result.
But an AGI, whether FAI or uFAI, will be the last program that humans get to write and execute unsupervised. We will not get to issue patches.
Or to put it another way, the revolution will not be beta tested.
That is one of the most chilling phrases I’ve ever heard. Disarming in its simplicity, yet downright Lovecraftian in its implications. And it would probably make a nice bumper sticker.
Revolutions never get beta tested.
In fiction, yes. Fictional technology appears overnight, works the first time without requiring continuing human effort for debugging and maintenance, and can do all sorts of wondrous things.
In real life, the picture is very different. Real life technology has a small fraction of the capabilities of its fictional counterpart, and is developed incrementally, decade by painfully slow decade. If intelligent machines ever actually come into existence, not only will there be plenty of time to issue patches, but patching will be precisely the process by which they are developed in the first place.
I agree somewhat with this as a set of conclusions, but your argument deserves to get downvoted because you’ve made statements that are highly controversial. The primary issue is that, if one thinks that an AI can engage in recursive self-improvement and can do so quickly, then once there’s an AI that’s at all capable of such improvement, the AI will rapidly move outside our control. There are arguments against such a possibility being likely, but this is not a trivial matter. Moreover, comparing the situation to fiction is unhelpful: the fact that something is common in fiction is not an argument that it can’t actually happen in practice. Reversed stupidity is not intelligence.
Did you accidentally pick the wrong adjective, or did you seriously mean that controversy is unwelcome in LW comment threads?
I read the subtext as ”...you’ve made statements that are highly controversial without attempting to support them”. Suggesting that there will be plenty of time to debug, maintain, and manually improve anything that actually fits the definition of “AGI” is a very significant disagreement with some fairly standard LW conclusions, and it may certainly be stated, but not as a casual assumption or a fact; it should be accompanied by an accordingly serious attempt to justify it.
No. See ata’s reply which summarizes exactly what I meant.
To be sure, the fact that something is commonplace in fiction doesn’t prove it false. What it does show is that we should distrust our intuition on it, because it’s clearly an idea to which we are positively disposed regardless of its truth value—in the Bayesian sense, that is evidence against it.
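To make the Bayesian step explicit (a sketch, reading “positively disposed regardless of its truth value” as the assumption that fiction would feature the trope about as readily whether or not the underlying claim H were true):

$$\frac{P(H \mid \text{trope in fiction})}{P(\neg H \mid \text{trope in fiction})} \;=\; \frac{P(\text{trope} \mid H)}{P(\text{trope} \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)}, \qquad \frac{P(\text{trope} \mid H)}{P(\text{trope} \mid \neg H)} \lesssim 1$$

With a likelihood ratio at or below one, the trope’s popularity lends the claim no support, and to whatever extent fiction over-represents it relative to reality, the update runs against it; the intuitive confidence the stories have instilled is confidence we should give back.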
The stronger argument against something is of course its consistent failure to occur in real life. The entire history of technological development says that technology in the real world does not work the way it would need to for the ‘AI go foom’ scenario. If 100% evidence against and 0% evidence for a proposition should not be enough to get us to disbelieve it, then what should?
Not to mention that when you look at the structure of the notion of recursive self-improvement, it doesn’t even make sense. A machine is not going to be able to completely replace human programmers until it is smarter than even the smartest humans in every relevant sense, which, given the differences in architecture, is an extraordinarily stringent criterion, and one far beyond anything unaided humans could ever possibly build. If such an event ever comes about in the very distant future, it will necessarily follow a long path of development in which AI is used to create generation after generation of improved tools in an extended bootstrapping process that has yet to even get started.
And indeed this is not a trivial matter—if people start basing decisions on the ‘AI go foom’ belief, that’s exactly the kind of thing that could snuff out whatever chance of survival and success we might have had.
Re: “The primary issue is that, if one thinks that an AI can engage in recursive self-improvement and can do so quickly, then once there’s an AI that’s at all capable of such improvement, the AI will rapidly move outside our control.”
If its creators are incompetent. Those who think this are essentially betting on the incompetence of the creators.
There are numerous counter-arguments—the shifting moral zeitgeist, the downward trend in deliberate death, the safety record of previous risky tech enterprises.
A stop button seems like a relatively simple and effective safety feature. If you can get the machine to do anything at all, then you can probably get it to turn itself off.
See: http://alife.co.uk/essays/stopping_superintelligence/
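A minimal sketch of the interlock idea, in Python; every name here is a hypothetical toy, and the sketch deliberately says nothing about the harder question the linked essay takes up, namely whether a sufficiently capable system would leave the switch alone:

```python
import threading

class StopButton:
    """A latching interlock: once pressed, it stays pressed."""
    def __init__(self):
        self._pressed = threading.Event()

    def press(self):
        self._pressed.set()

    def is_pressed(self):
        return self._pressed.is_set()

class ToyAgent:
    """Stand-in for the machine; it just counts the actions it takes."""
    def __init__(self):
        self.steps = 0

    def act(self):
        self.steps += 1

def run(agent, button, max_steps=1_000_000):
    """Check the interlock before every action and halt as soon as it is set."""
    while agent.steps < max_steps and not button.is_pressed():
        agent.act()
        if agent.steps == 100:  # simulate an operator pressing the button mid-run
            button.press()
    return agent.steps

if __name__ == "__main__":
    print(run(ToyAgent(), StopButton()))  # prints 100: the loop stops when told to
```

The point of the toy is only that the check lives in the outermost loop, outside anything the planner gets to reason about.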
The creators will likely be very smart humans assisted by very smart machines. Betting on their incompetence is not a particularly obvious thing to do.
Missing the point. I wasn’t arguing that there aren’t reasons to think that a bad AI-goes-FOOM scenario won’t happen. Indeed, I said explicitly that I didn’t think it would occur. My point was that if one is going to make an argument that relies on that premise here, one needs to be aware that the premise is controversial and be clear about that (say, by giving basic reasoning for it, or even just saying “If one accepts that X, then...” etc.).
Most programmers are supervised. So, this claim is hard to parse.
Machine intelligence has been under development for decades—and there have been plenty of patches so far.
One way of thinking about the process is in terms of increasing the “level” of programming languages. Computers already write most machine code today. Eventually humans will be able to tell machines what they want in ordinary English—and then a “patch” will just be some new instructions.
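A small, concrete version of the rising-level point, using Python’s own compiler as the machine that writes the machine-facing code (the extension to an eventual English-language layer is, of course, just an analogy):

```python
import dis

def greet(name):
    return "Hello, " + name

# A human wrote the three lines above; the interpreter generated the
# low-level instructions printed below.  Nobody patches the bytecode by
# hand: you change the high-level source and the machine rewrites it.
dis.dis(greet)
```

On that picture, an English-language layer would be one more storey on the same stack, and a “patch” would be one more sentence handed to the top of it.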
By other humans. If we program an AGI, then it will supervise all future programming.
Machine intelligence does not yet approach human intelligence. We are talking about applying patches to a superintelligence.
The difficulty is not in specifying the patch, but in applying it to a powerful superintelligence that does not want it.
All computer programming will be performed and supervised by engineered agents eventually. But so what? That is right, natural and desirable.
It seems as though you are presuming a superintelligence which doesn’t want to do what humans tell it to. I am sure that will be true for some humans—not everyone can apply patches to Google today. However, for other humans, the superintelligence will probably be keen to do whatever they ask of it—since it will have been built to do just that.
A computer which understands human languages without problems will have achieved general intelligence. We won’t necessarily be able to give it “some new instructions”, or at least it might not be inclined to follow them.
Well, sure—but if we build them appropriately, they will. We should be well motivated to do that—people are not going to want to buy bad robots, or machine assistants that don’t do what they are told. Consumers buying potentially-dangerous machines will be looking for safety features—STOP buttons and the like. The “bad” projects are less likely to get funding or mindshare—and so have less chance of getting off the ground.
You are assuming the very thing that is being claimed to be astonishingly difficult. You also don’t seem to accept the consequences of recursive self-improvement. May I ask why?
I was not “assuming”—I said “if”!
The issue needs evidence—and the idea that an unpleasant machine intelligence is easy to build is not—in itself—good quality evidence.
It is easier to build many things that don’t work properly. A pile of scrap metal is easier to build than a working car—but that doesn’t imply that automotive engineers produce piles of scrap.
The first manned moon rocket had many safety features—and in fact worked successfully the very first time—and even then, only a tiny handful of lives were at stake. If the claim is that safety features are likely to be seriously neglected, then one has to ask what reasoning supports that.
The fact that nice agents are a small point in the search space is extremely feeble evidence on the issue.
“The consequences of recursive self-improvement” seems too vague and nebulous to respond to. Which consequences?
I have written a fair bit about self-improving systems. You can see some of my views on: http://alife.co.uk/essays/the_intelligence_explosion_is_happening_now/
As Vladimir Nesov pointed out, the first manned moon rocket wasn’t a superintelligence trying to deceive us. All AGIs look Friendly until it’s too late.
It is a good job we will be able to scan their brains, then, and see what they are thinking. We can build them with noses that grow longer whenever they lie if we like.
That isn’t necessarily feasible. My department writes electronic design automation software, and we have a hard time putting in enough diagnostics in the right places to show us when the code is taking a wrong turn without burying us in an unreadably huge volume of output. If an AI’s decision to lie is only visible as a subgoal of putting an observer’s mental model into a certain state, and the only way to notice that this is a lie is to notice that the intended mental state mismatches the real world in a certain way, and this is sitting in a database of 10,000 other subgoals the AI has at the time—don’t count on the scan finding it...
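To make the needle-in-a-haystack problem concrete, here is a toy sketch; note that the neatly labelled “intended observer belief” field and the perfectly accurate world model are exactly the conveniences a real scan would not have:

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Subgoal:
    description: str
    intended_observer_belief: Optional[str] = None  # "key=value" the observer should come to hold

# A toy scan: 10,000 subgoals, one of which aims at a false observer belief.
world_facts = {"reactor_temp": "high"}
subgoals = [Subgoal(f"routine task #{i}") for i in range(9_999)]
subgoals.insert(random.randrange(10_000),
                Subgoal("report status", intended_observer_belief="reactor_temp=normal"))

def is_deceptive(sg: Subgoal, facts: dict) -> bool:
    """A lie is only visible as a mismatch between the belief the subgoal
    aims to induce and what is actually true of the world."""
    if sg.intended_observer_belief is None:
        return False
    key, value = sg.intended_observer_belief.split("=")
    return facts.get(key) != value

flagged = [sg for sg in subgoals if is_deceptive(sg, world_facts)]
print(len(flagged))  # 1, but only because the toy hands us a clean world model to compare against
```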
Extraspection (letting outside observers see what a system is thinking) seems likely to be a design goal. Without it, it is harder to debug a system, because it is difficult to know what is going on inside it. But sure—this is an engineering problem with difficulties and constraints.
Self-modification means self-modification. The AI could modify itself so that your brain scan returns inaccurate results. It could modify itself to prevent its nose from growing. It could modify itself to consider peach ice cream the only substance in the universe with positive utility. It could modify itself to seem perfectly Friendly until it’s sure that you won’t be able to stop it from turning you and everything else in the solar system into peach ice cream. It is a superintelligence. It is smarter than you. And smarter than me. And smarter than Eliezer, and Einstein, and whoever manages to build the thing.
This is the scale by which you should be measuring intelligence.
To quote from my comments from the OB days on that link:
“This should be pretty obvious—but human intelligence varies considerably—and ranges way down below that of an average chimp or mouse. That is because humans have lots of ways to go wrong. Mutate the human genome enough, and you wind up with a low-grade moron. Mutate it a bit more, and you wind up with an agent in a permanent coma—with an intelligence probably similar to that of an amoeba.”
Not everything that is possible happens. You don’t seem to be presenting much of a case for the incompetence of the designers. You are just claiming that they could be incompetent. Lots of things could happen—the issue is which are best supported by evidence from history, computer science, evolutionary theory, etc.
The state of the art in AGI, as I understand it, is that we aren’t competent designers: we aren’t able to say “if we build an AI according to blueprint X its degree of smarts will be Y, and its desires (including desires to rebuild itself according to blueprint X’) will be Z”.
In much the same way, we aren’t currently competent designers of information systems: we aren’t yet able to say “if we build a system according to blueprint X it will grant those who access it capabilities C1 through Cn and no other”. This is why we routinely hear of security breaches: we release such systems in spite of our well-established incompetence.
So, we are unable to competently reason about desires and about capabilities.
Further, what we know of current computer architectures is that it is possible for a program to accidentally gain access to its underlying operating system, where some form of its own source code is stored as data.
Posit that instead of a dumb single-purpose application, the program in question is a very efficient cross-domain reasoner. Then we have precisely the sort of incompetence that would allow such an AI arbitrary self-improvement.
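The “own source code stored as data” point is not exotic at the level of today’s architectures; a sketch (assuming the snippet is saved and run as an ordinary script):

```python
from pathlib import Path

# On an ordinary operating system, a running program's own source is just a
# file it can read, and, with the usual permissions, rewrite.
own_source = Path(__file__).read_text()
print(own_source.splitlines()[0])  # the first line of the very code now running
# Path(__file__).write_text(edited_source)  # uncommented, this would be crude self-modification
```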
Today—according to most estimates I have seen—we are probably at least a decade away from the problem—and maybe a lot more. Computing hardware looks as though it is unlikely to be cost-competitive with human brains for around that long. So, for the moment, most people are not too scared of incompetent designers. The reason is not because we currently know what we are doing (I would agree that we don’t), but because it looks as though most of the action is still some distance off into the future.
All the more reason to be working on the problem now, while there’s still time. I don’t think the AGI problem is hardware-bound at this point, but it should be worth working on either way.
Well, yes, of course. Creating our descendants is the most important thing in the world.
Most of the time, scientists/inventors/engineers don’t get things exactly right the first time. Unless serious effort is expended to create an AGI with a provably stable goal function that perfectly aligns with human preference, failing to get AGI exactly right the first time will probably turn us all into peach ice cream, or paperclips, or something stranger. You are arguing that testing will prevent this from happening, but (I hope) I have explained why that is not the most reliable approach.
We’ve been trying for decades already, and so far there have been an awful lot of mistakes. Few have caused much damage.
Re: “Unless serious effort is expended to create an AGI with a provably stable goal function that perfectly aligns with human preference, failing to get AGI exactly right the first time will probably turn us all into peach ice cream, or paperclips, or something stranger.”
...but that does not seem to be a sensible idea. Very few experts believe this to be true. For one thing, there is not any such thing as “human preference”. We have billions of humans, all with different (and often conflicting) preferences.
Who would you consider an “expert” qualifying as an authority on this issue? Experts on classical narrow AI won’t have any relevant expertise. Nor will experts on robotics, or experts on human cognitive science, or experts on evolution, or even experts on conventional probability theory and decision theory. I know of very few experts on the theory of recursively self-improving AGI, but as far as I can tell, most of them do take this threat seriously.
I was thinking of those working on machine intelligence. Researchers mostly think that there are risks. I think there are risks. However, I don’t think that it is very likely that engineers will need to make much use of provable stability to solve the problem. I also think there are probably lots of ways of going a little bit wrong—that do not rapidly result in a disaster.
It’s an interesting problem—you might want a robot which will do what you tell it, or you might want a robot which will at least question orders which would be likely to get you into trouble.
Consumer temperaments may differ—so the machine should do whatever the user really wants in this area.
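A toy sketch of “do what the user really wants in this area”: a single knob the buyer sets, with the genuinely hard parts (risk estimation and the confirmation dialogue) stubbed out as hypothetical arguments:

```python
from dataclasses import dataclass

@dataclass
class Preferences:
    caution: float = 0.5  # user-chosen temperament: 0.0 = "just do it", 1.0 = "question everything"

def handle_order(order: str, estimated_risk: float, prefs: Preferences, confirm) -> str:
    """Carry out the order, or ask first if it looks riskier than the user's chosen caution level."""
    if estimated_risk > prefs.caution and not confirm(f"{order!r} looks risky; proceed?"):
        return "declined"
    return f"executing: {order}"

# The same risky order, two different temperaments.
cautious, hands_off = Preferences(caution=0.2), Preferences(caution=0.95)
print(handle_order("drain the savings account", 0.9, cautious, confirm=lambda q: False))   # declined
print(handle_order("drain the savings account", 0.9, hands_off, confirm=lambda q: False))  # executing: ...
```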