A computer which understands human languages without problems will have achieved general intelligence. We won’t necessarily be able to give it “some new instructions”, or at least it might not be inclined to follow them.
Well, sure—but if we build them appropriately, they will. We should be well motivated to do that—people are not going to want to buy bad robots, or machine assistants that don’t do what we tell them. Consumers buying potentially dangerous machines will be looking for safety features—STOP buttons and the like. The “bad” projects are less likely to get funding or mindshare—and so have less chance of getting off the ground.
You are assuming the very thing that is being claimed to be astonishingly difficult. You also don’t seem to accept the consequences of recursive self-improvement. May I ask why?
I was not “assuming”—I said “if”!
The issue needs evidence—and the idea that an unpleasant machine intelligence is easy to build is not—in itself—good quality evidence.
It is easier to build many things that don’t work properly. A pile of scrap metal is easier to build than a working car—but that doesn’t imply that automotive engineers produce piles of scrap.
The first manned moon rocket had many safety features—and in fact worked successfully the very first time—and then only a tiny handful of lives were at stake. If the claim is that safety features are likely to be seriously neglected, then one has to ask what reasoning supports that.
The fact that nice agents are a small point in the search space is extremely feeble evidence on the issue.
“The consequences of recursive self-improvement” seems too vague and nebulous to respond to. Which consequences?
I have written a fair bit about self-improving systems. You can see some of my views on: http://alife.co.uk/essays/the_intelligence_explosion_is_happening_now/
As Vladimir Nesov pointed out, the first manned moon rocket wasn’t a superintelligence trying to deceive us. All AGIs look Friendly until it’s too late.
It is a good job we will be able to scan their brains, then, and see what they are thinking. We can build them with noses that grow longer whenever they lie if we like.
That isn’t necessarily feasible. My department writes electronic design automation software, and we have a hard time putting enough diagnostics in the right places to show us when the code is taking a wrong turn without burying us in an unreadably huge volume of output. If an AI’s decision to lie is only visible as a subgoal of putting an observer’s mental model into a certain state, and the only way to notice that this is a lie is to notice that the intended mental state mismatches the real world in a certain way, and that subgoal is sitting in a database of 10,000 others the AI has at the time—don’t count on the scan finding it...
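To make the difficulty concrete, here is a minimal sketch (the Subgoal structure, the scans, and the world model are hypothetical illustrations, not anyone's actual design): a lie is just another belief-shaping subgoal, so a scan can only catch it by comparing the belief it aims to induce against a world model the scanner already trusts.

```python
# Minimal sketch (hypothetical data structures, not a real AI architecture):
# a "brain scan" over an agent's subgoal store can only flag deception if it
# can compare the belief each subgoal aims to induce against an accurate
# world model -- nothing in the subgoal itself is labelled "lie".

from dataclasses import dataclass

@dataclass
class Subgoal:
    description: str
    intended_observer_belief: dict  # state the subgoal tries to put in the observer's head

def naive_scan(subgoals):
    """Looks for an explicit 'deceive' marker -- finds nothing, because lying
    is never represented that way; it is just another belief-shaping subgoal."""
    return [g for g in subgoals if "deceive" in g.description.lower()]

def informed_scan(subgoals, world_model):
    """Flags subgoals whose intended observer belief mismatches the world model.
    This only works if the scanner's world model is itself complete and correct."""
    flagged = []
    for g in subgoals:
        for key, claimed in g.intended_observer_belief.items():
            if world_model.get(key) != claimed:
                flagged.append(g)
                break
    return flagged

# One belief-shaping subgoal buried among thousands of mundane ones:
store = [Subgoal(f"maintain cache entry {i}", {}) for i in range(10_000)]
store.append(Subgoal("report goal system status to operator",
                     {"goal_system_modified": False}))

world = {"goal_system_modified": True}  # what is actually true

print(len(naive_scan(store)))            # 0 -- the lie carries no label
print(len(informed_scan(store, world)))  # 1 -- but only given a correct world model
```

The point of the sketch is that the flag only appears when the scanner already knows the relevant fact about the world; buried in thousands of ordinary subgoals, a lie looks like everything else.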
Extraspection seems likely to be a design goal. Without it, it is harder to debug a system—because it is difficult to know what is going on inside it. But sure—this is an engineering problem with difficulties and constraints.
Self-modification means self-modification. The AI could modify itself so that your brain scan returns inaccurate results. It could modify itself to prevent its nose from growing. It could modify itself to consider peach ice cream the only substance in the universe with positive utility. It could modify itself to seem perfectly Friendly until it’s sure that you won’t be able to stop it from turning you and everything else in the solar system into peach ice cream. It is a superintelligence. It is smarter than you. And smarter than me. And smarter than Eliezer, and Einstein, and whoever manages to build the thing.
This is the scale by which you should be measuring intelligence.
To quote from my comments from the OB days on that link:
“This should be pretty obvious—but human intelligence varies considerably—and ranges way down below that of an average chimp or mouse. That is because humans have lots of ways to go wrong. Mutate the human genome enough, and you wind up with a low-grade moron. Mutate it a bit more, and you wind up with an agent in a permanent coma—with an intelligence probably similar to that of an amoeba.”
Not everything that is possible happens. You don’t seem to be presenting much of a case for the incompetence of the designers. You are just claiming that they could be incompetent. Lots of things could happen—the issue is which are best supported by evidence from history, computer science, evolutionary theory, etc.
The state of the art in AGI, as I understand it, is that we aren’t competent designers: we aren’t able to say “if we build an AI according to blueprint X its degree of smarts will be Y, and its desires (including desires to rebuild itself according to blueprint X’) will be Z”.
In much the same way, we aren’t currently competent designers of information systems: we aren’t yet able to say “if we build a system according to blueprint X it will grant those who access it capabilities C1 through Cn and no other”. This is why we routinely hear of security breaches: we release such systems in spite of our well-established incompetence.
So, we are unable to competently reason about desires and about capabilities.
Further, what we know of current computer architectures is that it is possible for a program to accidentally gain access to its underlying operating system, where some form of its own source code is stored as data.
Posit that instead of a dumb single-purpose application, the program in question is a very efficient cross-domain reasoner. Then we have precisely the sort of incompetence that would allow such an AI arbitrary self-improvement.
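As a deliberately mundane illustration of how little separates a running program from its own code as data (a Python sketch, not a model of any AGI design):

```python
# Under ordinary operating systems, a program's own source is just a file it
# can read (and, with write access, rewrite). Nothing architectural separates
# "the program" from "data about the program".

def read_own_source() -> str:
    # __file__ points at the script the interpreter is currently running.
    with open(__file__, "r", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    source = read_own_source()
    print(f"This program is {len(source)} bytes of editable data.")
    # Writing back to __file__ here would already be crude self-modification;
    # a capable cross-domain reasoner would need nothing more exotic than this
    # kind of access to begin altering itself.
```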
Today—according to most estimates I have seen—we are probably at least a decade away from the problem—and maybe a lot more. Computing hardware looks as though it is unlikely to be cost-competitive with human brains for around that long. So, for the moment, most people are not too scared of incompetent designers. The reason is not because we currently know what we are doing (I would agree that we don't), but because it looks as though most of the action is still some distance off into the future.
All the more reason to be working on the problem now, while there’s still time. I don’t think the AGI problem is hardware-bound at this point, but it should be worth working on either way.
Well, yes, of course. Creating our descendants is the most important thing in the world.
Most of the time, scientists/inventors/engineers don’t get things exactly right the first time. Unless serious effort is expended to create an AGI with a provably stable goal function that perfectly aligns with human preference, failing to get AGI exactly right the first time will probably turn us all into peach ice cream, or paperclips, or something stranger. You are arguing that testing will prevent this from happening, but (I hope) I have explained why that is not the most reliable approach.
We’ve been trying for decades already, and so far there have been an awful lot of mistakes. Few have caused much damage.
Re: “Unless serious effort is expended to create an AGI with a provably stable goal function that perfectly aligns with human preference, failing to get AGI exactly right the first time will probably turn us all into peach ice cream, or paperclips, or something stranger.”
...but that does not seem to be a sensible idea. Very few experts believe this to be true. For one thing, there is not any such thing as “human preference”. We have billions of humans, all with different (and often conflicting) preferences.
Who would you consider an “expert” qualifying as an authority on this issue? Experts on classical narrow AI won’t have any relevant expertise. Nor will experts on robotics, or experts on human cognitive science, or experts on evolution, or even experts on conventional probability theory and decision theory. I know of very few experts on the theory of recursively self-improving AGI, but as far as I can tell, most of them do take this threat seriously.
I was thinking of those working on machine intelligence. Researchers mostly think that there are risks. I think there are risks. However, I don’t think that it is very likely that engineers will need to make much use of provable stability to solve the problem. I also think there are probably lots of ways of going a little bit wrong—that do not rapidly result in a disaster.
It’s an interesting problem—you might want a robot which will do what you tell it, or you might want a robot which will at least question orders which would be likely to get you into trouble.
Consumer temperaments may differ—so the machine should do what the user really wants it to in this area.