Your third point is valid, but your first is basically wrong; our environments occupy a small and extremely regular subset of the possibility space, so that success on a certain few tasks seems to correlate extremely well with predicted success across plausible future domains. Measuring success on these tasks is something AIs can easily do; EURISKO accomplished it in fits and starts. More generally, intelligence isn’t magical: if there’s any way we can tell whether a change in an AGI represents a bug or an improvement, then there’s an algorithm that an AI can run to do the same.
As for the second problem, one idea that may not have occurred to you is that an AI could write a future version of itself, bug-check and test out various subsystems and perhaps even the entire thing on a virtual machine first, and then shut itself down and start up the successor. If there’s a way for Lenat to see that EURISKO isn’t working properly and then fix it, then there’s a way for AI (version N) to see that AI (version N+1) isn’t working properly and fix it before making the change-over.
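This test-then-switch-over loop can be caricatured in a few lines. The sketch below is my own toy model (all names are hypothetical, and "versions" are stand-in functions rather than real AI programs); it only shows the control flow being proposed: the current version adopts a successor only after it passes a verification suite in isolation.

```python
# Toy model of the test-then-switch-over idea: version N only hands
# control to a successor that passes the verification suite in a sandbox.

def passes_sandbox(candidate, test_suite):
    """Run every test against the candidate in isolation."""
    return all(test(candidate) for test in test_suite)

def switch_over(current, candidate, test_suite):
    """Adopt the successor only if it checks out; otherwise keep current."""
    return candidate if passes_sandbox(candidate, test_suite) else current

# Stand-ins: "versions" are functions, tests are predicates on them.
v1 = lambda x: x + 1
v2 = lambda x: x + 2          # a genuine improvement
v_buggy = lambda x: x - 1     # a regression that should be rejected

suite = [lambda f: f(0) >= 1, lambda f: f(10) > 10]
print(switch_over(v1, v2, suite) is v2)       # True: successor adopted
print(switch_over(v1, v_buggy, suite) is v1)  # True: regression rejected
```

The point of contention in the thread is, of course, where `test_suite` comes from and what it can actually establish, not the switch-over mechanics.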
In those posts you are arguing something different from what I was talking about. Sure, chimps will never make better technology than humans, but sometimes making more advanced, clever technology is not what you want to do, and it can be positively detrimental to your chances of shaping the world into a desirable state. Consider the nuclear arms race, or bio-weapons research.
If humans manage to invent a virus that wipes us out, would you still call that intelligent? If so, that is not the sort of intelligence we need to create… we need to create things that win in the end, not things that score short-term wins and then destroy themselves.
“More generally, intelligence isn’t magical: if there’s any way we can tell whether a change in an AGI represents a bug or an improvement, then there’s an algorithm that an AI can run to do the same.”
Except that we don’t—can’t—do it by pure armchair thought, which is what the recursive self-improvement proposal amounts to.
The approach of testing a new version in a sandbox had occurred to me, and I agree it is a very promising one for many things—but recursive self-improvement isn’t among them! Consider, what’s the primary capability for which version N+1 is being tested? Why, the ability to create version N+2… which involves testing N+2… which involves creating N+3… etc.
Again, there’s enough correlation between ability to perform certain tasks that you don’t need an infinite recursion. To test AIv(N+1)‘s ability to program to exact specification, instead of having it program AIv(N+2) have it instead program some other things that AIvN finds difficult (but whose solutions are within AIvN’s power to verify). That we will be applying AIv(N+1)’s precision programming to itself doesn’t mean we can’t test it on non-recursive data first.
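The asymmetry being relied on here — that checking a solution can be far cheaper than finding one — is easy to illustrate. This is my own toy example, not anything from the thread: integer factorization, where AIvN could verify in a few multiplications a result it might have been unable to search for itself.

```python
# Toy illustration of the verify-vs-solve asymmetry: checking a proposed
# factorization is a few multiplications; finding it is a search problem.

def verify_factorization(n, factors):
    """Cheap check: do the proposed factors multiply back to n?"""
    product = 1
    for f in factors:
        product *= f
    return product == n and all(f > 1 for f in factors)

def find_factors(n):
    """Expensive search (trial division), standing in for the hard task."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

n = 8051  # = 83 * 97
proposed = find_factors(n)        # the "smarter" system's answer...
print(verify_factorization(n, proposed))   # True: ...checked cheaply
print(verify_factorization(n, [80, 100]))  # False: wrong answers rejected
```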
ETA: Of course, since we want the end result to be a superintelligence, AIvN might also ask AIv(N+1) for verifiable insight into an array of puzzling questions, some of which AIvN can’t figure out but suspects are tractable with increased intelligence.
If you observed something to work 15 times, how do you know that it’ll work the 16th time? You obtain a model of increasing precision with each test, one that lets you predict what happens next, on a test you haven’t performed yet. In the same way, you can try to predict what happens on the first try, before any observations have taken place.
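One standard way to make "a model of increasing precision with each test" concrete is Bayesian updating. As an illustration (my addition, not the commenter's), Laplace's rule of succession gives the predicted probability of the next success after a run of observed successes, under a uniform prior on the unknown success rate:

```python
from fractions import Fraction

# Laplace's rule of succession: after s successes in n trials, with a
# uniform prior over the unknown success rate, the probability that the
# next trial succeeds is (s + 1) / (n + 2).

def next_success_probability(successes, trials):
    return Fraction(successes + 1, trials + 2)

print(next_success_probability(15, 15))  # 16/17: after 15 straight successes
print(next_success_probability(0, 0))    # 1/2: the prediction before any test
```

The second line is the "predict what happens on the first try" case: the model still yields a prediction, just a much less confident one.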
Another point is that testing can be part of the final product: instead of building a working gizmo, you build a generic self-testing adaptive gizmo that finds the right parameters itself, and that is pre-designed to do so optimally.
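A minimal sketch of such a self-testing gizmo, with made-up names and a deliberately trivial "environment": rather than shipping fixed parameters, you ship a search loop that scores candidate settings and keeps the best one.

```python
# Sketch of a "self-testing adaptive gizmo": the tuning loop is part of
# the product, so the right parameters are found at run time.

def build_adaptive_gizmo(candidate_params, score):
    """Try each candidate setting and keep the one that tests best."""
    best = max(candidate_params, key=score)
    return lambda x: best * x  # the gizmo, tuned to its environment

# Toy environment: we want output as close to 10 as possible for input 2.
score = lambda p: -abs(p * 2 - 10)
gizmo = build_adaptive_gizmo([1, 3, 5, 7], score)
print(gizmo(2))  # 10: the gizmo settled on p = 5 by itself
```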
Unfortunately there are three fundamental problems, any one of which would suffice to make this approach unworkable even in principle.
Keep the bolded context in mind.
What constitutes an improvement is not merely a property of the program, but also a property of the world. Even if an AI program could find improvements, it could not distinguish them from bugs.
There are large classes of problem for which this is not the case. For example, “make accurate predictions about future sensory data based on past sensory data” relates program to world, but optimizing for this task is a property of the time, memory, and accuracy trade-offs involved. At the very least, your objection fails to apply “even in principle”.
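To make the prediction-task example concrete (this toy is mine, not the commenter's): whether a change counts as an improvement can be decided by measured error on data the program has not yet seen, with no human judgment in the loop.

```python
# Toy version of "predict future sensory data from past sensory data":
# an "improvement" is simply lower measured error on unseen data.

def evaluate(predictor, stream):
    """Average squared error of next-step predictions over the stream."""
    errors = [(predictor(stream[:t]) - stream[t]) ** 2
              for t in range(1, len(stream))]
    return sum(errors) / len(errors)

def predict_last(past):
    """Naive predictor: tomorrow looks like today."""
    return past[-1]

def predict_trend(past):
    """Candidate improvement: extrapolate the most recent change."""
    if len(past) < 2:
        return past[-1]
    return past[-1] + (past[-1] - past[-2])

stream = [1, 2, 3, 4, 5, 6, 7, 8]  # steadily rising "sensory data"
print(evaluate(predict_trend, stream) < evaluate(predict_last, stream))
# True: the candidate is measurably better, so the change is an improvement
```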
Any significant AI program is the best effort of one or more highly skilled human programmers. To qualitatively improve on that work, the program would have to be at least as smart as the humans, so even if recursive self-improvement were workable, which it isn’t, it would be irrelevant until after human level intelligence is reached by other means.
This depends on the definition of “qualitatively improve”. It seems Eurisko improved itself in important ways that Lenat couldn’t have done by hand, so I think this too fails the “even in principle” test.
Any self-modifying system relies, for the ability to constructively modify upper layers of content, on unchanging lower layers of architecture. This is a general principle that applies to all complex systems from engineering to biology to politics.
This seems the most reasonable objection of the three. Interestingly enough, Eliezer claims it’s this very difference that makes self-improving AI unique.
I think this also fails to apply universally, for somewhat more subtle reasons involving the nature of software. A self-improving AI possessing precise specifications of its components has no need for a constant lower layer of architecture (except if you mean the hardware, which is a very different question). The Curry-Howard isomorphism offers a proof-of-concept here: An AI composed of precise logical propositions can arbitrarily rewrite its proofs/programs with any other program of an equivalent type. If you can offer a good argument for why this is impossible in principle, I’d be interested in hearing it.
“It seems Eurisko improved itself in important ways that Lenat couldn’t have done by hand”
As far as I can see, the only self-improvements it came up with were fairly trivial ones that Lenat could easily have done by hand. Where it came up with important improvements that Lenat wouldn’t have thought of by himself was in the Traveler game—a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.
“The Curry-Howard isomorphism offers a proof-of-concept here”
Indeed this approach shows promise, and is the area I’m currently working on. For example if you can formally specify an indexing algorithm, you are free to look for optimized versions provided you can formally prove they meet the specification. If we can make practical tools based on this idea, we could make software engineering significantly more productive.
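As an illustration of the indexing example (my own sketch, with property testing standing in for the formal proof being discussed): treat a naive linear scan as the specification, and check that an optimized binary-search candidate agrees with it everywhere.

```python
import bisect
import random

# "Optimize against a specification": the spec is a naive linear scan;
# the candidate is O(log n) binary search. Random property checks stand
# in here for the formal proof of equivalence.

def index_spec(xs, target):
    """Specification: position of target in sorted xs, else -1."""
    for i, x in enumerate(xs):
        if x == target:
            return i
    return -1

def index_fast(xs, target):
    """Optimized candidate: binary search via bisect."""
    i = bisect.bisect_left(xs, target)
    return i if i < len(xs) and xs[i] == target else -1

random.seed(0)
for _ in range(1000):  # check the candidate agrees with the spec
    xs = sorted(random.sample(range(50), random.randint(0, 20)))
    t = random.randrange(50)
    assert index_fast(xs, t) == index_spec(xs, t)
print("candidate matches specification on all random cases")
```

With a real proof in place of the random checks, any program of the right type could be substituted freely, which is the productivity gain described above.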
But this can only be part of the story. Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI. Nor, obviously, is a formal specification of human psychology the kind of thing that would ever be approached accidentally by experimentation with Eurisko-style programs as was the original topic. And if the science of some future century—that would need to be to today’s science as the latter is to witchcraft—ever does manage to accomplish such, why then, that would be the key to enabling the development of provably Friendly AI.
Where it came up with important improvements that Lenat wouldn’t have thought of by himself was in the Traveler game—a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.
...And applied these improvements to the subsequent modified set of rules. “That was machine learning, not self-improvement” sounds like a fully general counter-argument, especially considering your skepticism toward the very idea of self-improvement. Perhaps you can clarify the distinction?
Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI.
An AI is allowed to learn from its environment, no one’s claiming it will simply meditate on the nature of being and then take over the universe. That said, this example has nothing to do with UFAI. A paperclip maximizer has no need for an intuitive user interface.
And if [science] ever does manage to accomplish [a formal specification of human psychology], why then, that would be the key to enabling the development of provably Friendly AI.
Indeed! Sadly, such a specification is not required to interact with and modify one’s environment. Humans were killing chimpanzees with stone tools long before they even possessed the concept of “psychology”.
I’ll call it self-improvement when a substantial, nontrivial body of code is automatically developed that is applicable to domains other than game-playing, as opposed to playing a slightly different version of the same game. (Note that it was substantially the same strategy that won the second time as the first time, despite the rules changes.)
“A paperclip maximizer has no need for an intuitive user interface.”
True if you’re talking about something like Galactus that begins the story already possessing the ability to eat planets. However, UFAI believers often talk about paperclip maximizers being able to get past firewalls etc. by verbal persuasion of human operators. That certainly isn’t going to happen without a comprehensive theory of human psychology.
I’ve written a more detailed explanation of why recursive self-improvement is a figment of our imaginations: http://code.google.com/p/ayane/wiki/RecursiveSelfImprovement
Super-plagues and other doomsday tools are possible with current technology. Effective countermeasures are not. Ergo, we need more intelligence, ASAP.
Where is the evidence that EURISKO ever accomplished anything? No one but the author has seen the source code.