On the subject of self-improving AI, you say:

Unfortunately, there are three fundamental problems, any one of which would suffice to make this approach unworkable **even in principle**.
Keep the bolded context in mind.
What constitutes an improvement is not merely a property of the program, but also a property of the world. Even if an AI program could find improvements, it could not distinguish them from bugs.
There are large classes of problem for which this is not the case. For example, “make accurate predictions about future sensory data based on past sensory data” relates program to world, but optimizing for this task is purely a matter of the time, memory, and accuracy trade-offs involved, which a program can evaluate against its own data. At the very least, your objection fails to apply “even in principle”.
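To make that concrete, here is a toy sketch (entirely my own illustration, with made-up names, not anything from Eurisko or elsewhere): candidate predictors are scored purely against the data stream itself, so telling an improvement from a bug reduces to comparing numbers the program can compute from its own inputs.

```haskell
-- Toy sketch: judging candidate self-modifications of a prediction
-- component purely against the data stream, with no reference to any
-- facts about the world beyond the data itself.

type Predictor = [Double] -> Double   -- history, most recent element first

-- Mean squared prediction error over a stream: predict each element from
-- the prefix that precedes it, then average the squared errors.
score :: Predictor -> [Double] -> Double
score p xs = sum errs / fromIntegral (length errs)
  where
    prefixes = scanl (flip (:)) [] xs                 -- [], [x0], [x1,x0], ...
    errs     = [ (p h - x) ^ (2 :: Int)
               | (h, x) <- zip (tail prefixes) (tail xs) ]

-- Two candidate versions of the prediction component.
lastValue, sumOfLastTwo :: Predictor
lastValue (x : _)        = x
lastValue []             = 0
sumOfLastTwo (x : y : _) = x + y
sumOfLastTwo (x : _)     = x
sumOfLastTwo []          = 0

main :: IO ()
main = do
  let stream = [1, 1, 2, 3, 5, 8, 13, 21]   -- an arbitrary example stream
  -- Lower score means a better predictor; the comparison is entirely internal.
  print (score lastValue stream, score sumOfLastTwo stream)
```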
Any significant AI program is the best effort of one or more highly skilled human programmers. To qualitatively improve on that work, the program would have to be at least as smart as the humans, so even if recursive self-improvement were workable, which it isn’t, it would be irrelevant until after human-level intelligence is reached by other means.
This depends on the definition of “qualitatively improve”. It seems Eurisko improved itself in important ways that Lenat couldn’t have done by hand, so I think this too fails the “even in principle” test.
Any self-modifying system relies, for the ability to constructively modify upper layers of content, on unchanging lower layers of architecture. This is a general principle that applies to all complex systems from engineering to biology to politics.
This seems the most reasonable objection of the three. Interestingly enough, Eliezer claims it’s this very difference that makes self-improving AI unique.
I think this also fails to apply universally, for somewhat more subtle reasons involving the nature of software. A self-improving AI possessing precise specifications of its components has no need for a constant lower layer of architecture (except if you mean the hardware, which is a very different question). The Curry-Howard isomorphism offers a proof-of-concept here: An AI composed of precise logical propositions can arbitrarily rewrite its proofs/programs with any other program of an equivalent type. If you can offer a good argument for why this is impossible in principle, I’d be interested in hearing it.
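For what it’s worth, here is a toy Haskell sketch of the shape of that argument (entirely my own illustration; Haskell’s types are of course far weaker than the full specifications I have in mind, but the structure is the same). The “architecture” is nothing more than a bundle of typed slots, and self-modification is replacing the term in a slot with any other term of the same type; it is the type discipline, not a frozen lower layer of code, that guarantees the rest still composes. In a dependently typed language or proof assistant, the slot types could carry full behavioural specifications, which is the stronger claim above.

```haskell
-- Toy sketch: the "architecture" is a bundle of typed slots, and
-- self-modification is swapping the term in a slot for any other term of
-- the same type. Nothing outside the slot's type has to stay fixed.

import Data.List (insert)

data Kernel = Kernel
  { sortComponent :: [Int] -> [Int]   -- specification-as-type for one component
  , planComponent :: Int -> [Int]     -- another slot, irrelevant here
  }

-- One inhabitant of the sorting slot's type...
insertionSort :: [Int] -> [Int]
insertionSort = foldr insert []

-- ...and a structurally very different inhabitant of the same type.
mergeSort :: [Int] -> [Int]
mergeSort []  = []
mergeSort [x] = [x]
mergeSort xs  = merge (mergeSort l) (mergeSort r)
  where
    (l, r) = splitAt (length xs `div` 2) xs
    merge as [] = as
    merge [] bs = bs
    merge (a : as) (b : bs)
      | a <= b    = a : merge as (b : bs)
      | otherwise = b : merge (a : as) bs

-- "Self-modification": replace a component with any other term of its type.
rewriteSort :: ([Int] -> [Int]) -> Kernel -> Kernel
rewriteSort newSort k = k { sortComponent = newSort }

main :: IO ()
main = do
  let original = Kernel { sortComponent = insertionSort
                        , planComponent = \n -> [1 .. n] }
      improved = rewriteSort mergeSort original
  print (sortComponent original [3, 1, 2])
  print (sortComponent improved [3, 1, 2])
```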
“It seems Eurisko improved itself in important ways that Lenat couldn’t have done by hand”
As far as I can see, the only self-improvements it came up with were fairly trivial ones that Lenat could easily have done by hand. Where it came up with important improvements that Lenat wouldn’t have thought of by himself was in the Traveler game—a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.
“The Curry-Howard isomorphism offers a proof-of-concept here”
Indeed, this approach shows promise, and it is the area I’m currently working on. For example, if you can formally specify an indexing algorithm, you are free to look for optimized versions, provided you can formally prove they meet the specification. If we can make practical tools based on this idea, we could make software engineering significantly more productive.
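To give the flavour of it (a deliberately toy example of my own, not the actual tooling): the specification is the slow, obviously correct definition of lookup, the optimized version answers queries from a prebuilt search tree, and the obligation is that the two agree on every key and every table with distinct keys. Here the obligation can only be stated and spot-checked; the point of practical tools in this area would be to discharge it as a proof, once and for all.

```haskell
-- Toy sketch: a declarative specification of index lookup, an optimized
-- implementation, and the agreement property that would be the proof
-- obligation in a formal setting.

import           Data.List (find)
import qualified Data.Map  as Map

type Table = [(Int, String)]

-- Specification: what an index lookup means, with no regard for cost.
lookupSpec :: Int -> Table -> Maybe String
lookupSpec k table = snd <$> find ((== k) . fst) table

-- Optimized implementation: build a balanced search tree once, then answer
-- each query in logarithmic time.
buildIndex :: Table -> Map.Map Int String
buildIndex = Map.fromList

lookupFast :: Int -> Map.Map Int String -> Maybe String
lookupFast = Map.lookup

-- The obligation, stated as an executable property: for every key, and every
-- table whose keys are distinct, the fast path agrees with the specification.
agreesWithSpec :: Int -> Table -> Bool
agreesWithSpec k table = lookupFast k (buildIndex table) == lookupSpec k table

main :: IO ()
main = do
  let table = [(3, "c"), (1, "a"), (2, "b")]
  print (all (\k -> agreesWithSpec k table) [0 .. 4])   -- spot check: True
```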
But this can only be part of the story. Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI. Nor, obviously, is a formal specification of human psychology the kind of thing that would ever be approached accidentally by experimentation with Eurisko-style programs, which was the original topic. And if the science of some future century—one that would need to be to today’s science as the latter is to witchcraft—ever does manage to accomplish such, why then, that would be the key to enabling the development of provably Friendly AI.
Where it came up with important improvements that Lenat wouldn’t have thought of by himself was in the Traveler game—a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.
...And applied these improvements to the subsequent modified set of rules. “That was machine learning, not self-improvement” sounds like a fully general counter-argument, especially considering your skepticism toward the very idea of self-improvement. Perhaps you can clarify the distinction?
Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI.
An AI is allowed to learn from its environment; no one’s claiming it will simply meditate on the nature of being and then take over the universe. That said, this example has nothing to do with UFAI. A paperclip maximizer has no need for an intuitive user interface.
And if [science] ever does manage to accomplish [a formal specification of human psychology], why then, that would be the key to enabling the development of provably Friendly AI.
Indeed! Sadly, such a specification is not required to interact with and modify one’s environment. Humans were killing chimpanzees with stone tools long before they even possessed the concept of “psychology”.
“Perhaps you can clarify the distinction?”

I’ll call it self-improvement when a substantial, nontrivial body of code is automatically developed that is applicable to domains other than game playing, as opposed to playing a slightly different version of the same game. (Note that it was substantially the same strategy that won the second time as the first time, despite the rules changes.)
“A paperclip maximizer has no need for an intuitive user interface.”
True if you’re talking about something like Galactus that begins the story already possessing the ability to eat planets. However, UFAI believers often talk about paperclip maximizers being able to get past firewalls etc. by verbal persuasion of human operators. That certainly isn’t going to happen without a comprehensive theory of human psychology.