I notice you phrased this in terms of belief. I’m curious, what would you consider to be the minimum estimate of UFAI’s probability necessary to “reasonably” motivate concern or action?
If I’m right, the effect of widespread propagation of such memes will be to snuff out what chance of survival and success humanity might have had. Unlike UFAI, which is pure science fiction, the strangling of progress is something that occurs—has occurred before—in real life.
What would you consider to be the minimum estimate of the probability that I’m right, necessary to “reasonably” motivate concern or action?
I’m not quite sure what “you being right” means here. Your thesis is that propagating the UFAI meme will suppress scientific and technological progress enough to contribute non-negligibly to our destruction?

I’m afraid I don’t have much background on how that’s supposed to work. If you can explain what you mean or point me to an existing explanation, I’ll try to give you an answer rather than reactively throwing your question back at you.
Basically yes. Civilizations, species and worlds are mortal; there are rare long-lived species whose environment has remained unchanged for long periods of time, but the environment in which we evolved is long gone and our current one is not merely not stable, it is not even in equilibrium. And as long as we remain confined to one little planet running off a dwindling resource base and with everyone in weapon range of everyone else, there is nothing good about our long-term prospects. (For a fictional but eloquent discussion of some of the issues involved, see Permanence by Karl Schroeder.)
To change that, we need more advanced technology, for which we need software tools smart enough to help us deal with complexity. If our best minds start buying into the UFAI meme and turning away from building anything more ambitious than a social networking mashup, we may simply waste whatever chance we had. That is why UFAI belief is not, as its proponents would have it, the road of safety, but the road of oblivion.
rhollerith raised some reasonable objections to this response that I’d like to see answered, but I’ll try to answer your question without that information:
What would you consider to be the minimum estimate of the probability that I’m right, necessary to “reasonably” motivate concern or action?
As far as concern goes, I think my threshold for concern over your proposition is identical to my threshold for concern over UFAI, as they postulate similar results (UFAI still seems marginally worse due to the chance of destroying intelligent alien life, but I’ll write this off as entirely negligible for the current discussion). I’d say 1:10,000 is a reasonable threshold for concern of the vocalized form, “hey, is anyone looking into this?” I’d love to see some more concrete discussion on this.
“Action” in your scenario is complicated by its direct opposition to acceptance of UFAI, so I can only give you some rough constraints. To simplify, I’ll assume all risks allow equally effective action to compensate for them, even though this is clearly not the case.
Let R = the scenario you’ve described, E = the scenario in which UFAI is a credible threat. “R and E” could be described as “damned if we do, damned if we don’t”, in which case action is basically futile, so I’ll consider the case where R and E are disjoint. In that case, action would only be justifiable if p(R) > p(E). My intuition says that such justification is proportional to p(R) - p(E), but I’d prefer more clarity in this step.
So that’s a rough answer… if T is my threshold probability for action in the face of existential risk, T (p(R) - p(E)) is my threshold for action on your scenario. If R and E aren’t disjoint, it looks something like T (p(R and ~E) - p(E and ~R)).
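A fair answer, thanks.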
Though I’m not convinced “R and E” necessarily means “damned either way”. If I believed E in addition to R, I think what I would do is:
Forget about memetics in either direction as likely to do more harm than good, and concentrate all available resources on developing Friendly AI as reliably and quickly as possible.
However, provably Friendly AI is still not possible with 2009 vintage tools.
So I’d do it in stages: a series of self-improving AIs, the early ones with low intelligence and crude Friendliness architecture, using them to develop better Friendliness architecture in tandem with increasing intelligence for the later ones. No guarantees, but if recursive self-improvement actually worked, I think that approach would have a reasonable chance of success.
rwallace has been arguing the position that AI researchers are too concerned (or will become too concerned) about the existential risk from UFAI. He writes that
we need software tools smart enough to help us deal with complexity.
rwallace: can we deal with complexity sufficiently well without new software that engages in strongly-recursive self-improvement?
Without new AGI software?
One part of the risk that rwallace says outweighs the risk of UFAI is that
we remain confined to one little planet . . . with everyone in weapon range of everyone else
The only response rwallace suggests to that risk is
we need more advanced technology, for which we need software tools smart enough to help us deal with complexity
rwallace: please give your reasoning for how more advanced technology decreases the existential risk posed by weapons more than it increases it.
Another part of the risk that rwallace says outweighs the risk of UFAI is that
we remain confined to one little planet running off a dwindling resource base
Please explain how dwindling resources present a significant existential risk. I can come up with several arguments, but I’d like to see the one or two you consider the strongest.
If we have uploads we can get off the planet and stay in space for a fraction of the resources it currently costs to do manned space flight. We can spread ourselves between the stars.
But an upload might go foom, so we should stop all upload research.
It is this kind of conundrum I see humanity in at the moment.
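I agree, and will add: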
First, an upload isn’t going to “go foom”: a digital substrate doesn’t magically confer superpowers, and early uploads will likely be less powerful than their biological counterparts in several ways.
Second, stopping upload research is not the path of safety, because ultimately we must advance or die.
First, an upload isn’t going to “go foom”: a digital substrate doesn’t magically confer superpowers, and early uploads will likely be less powerful than their biological counterparts in several ways.
Foom is about rate of power increase, not initial power level. Copy/paste isn’t everything, but still a pretty good superpower.
Second, stopping upload research is not the path of safety, because ultimately we must advance or die.
It’s not at all obvious to me that the increased risk of stagnation death outweighs the reduced risk of foom death.
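You can’t copy paste hardware; and no, an upload won’t be able to run on a botnet.

Not to mention the bizarre assumption that an uploading patient will turn into a comic book villain whose sole purpose is to conquer the world.

Source?

Upvoted for this.

You know, I don’t think I’ve ever seen someone argue that. Does anyone have any links?

I’ve written a more detailed explanation of why recursive self-improvement is a figment of our imaginations: http://code.google.com/p/ayane/wiki/RecursiveSelfImprovement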
Your third point is valid, but your first is basically wrong; our environments occupy a small and extremely regular subset of the possibility space, so that success on a certain few tasks seems to correlate extremely well with predicted success across plausible future domains. Measuring success on these tasks is something AIs can easily do; EURISKO accomplished it in fits and starts. More generally, intelligence isn’t magical: if there’s any way we can tell whether a change in an AGI represents a bug or an improvement, then there’s an algorithm that an AI can run to do the same.
As for the second problem, one idea that may not have occurred to you is that an AI could write a future version of itself, bug-check and test out various subsystems and perhaps even the entire thing on a virtual machine first, and then shut itself down and start up the successor. If there’s a way for Lenat to see that EURISKO isn’t working properly and then fix it, then there’s a way for AI (version N) to see that AI (version N+1) isn’t working properly and fix it before making the change-over.
In those posts you are arguing something different from what I was talking about. Sure, chimps will never make better technology than humans, but sometimes making more advanced, clever technology is not what you want to do, and it can be positively detrimental to your chances of shaping the world into a desirable state. Consider the arms race for nuclear weapons, for example, or bio-weapons research.
If humans manage to invent a virus that wipes us out, would you still call that intelligent? If so, it is not that sort of intelligence we need to create… we need to create things that win in the end, not things that score short-term wins and then destroy themselves.
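Super-plagues and other doomsday tools are possible with current technology. Effective countermeasures are not. Ergo, we need more intelligence, ASAP.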
“More generally, intelligence isn’t magical: if there’s any way we can tell whether a change in an AGI represents a bug or an improvement, then there’s an algorithm that an AI can run to do the same.”
Except that we don’t—can’t—do it by pure armchair thought, which is what the recursive self-improvement proposal amounts to.
The approach of testing a new version in a sandbox had occurred to me, and I agree it is a very promising one for many things—but recursive self-improvement isn’t among them! Consider, what’s the primary capability for which version N+1 is being tested? Why, the ability to create version N+2… which involves testing N+2… which involves creating N+3… etc.
Again, there’s enough correlation between abilities on different tasks that you don’t need an infinite recursion. To test AIv(N+1)’s ability to program to exact specification, instead of having it program AIv(N+2), have it program some other things that AIvN finds difficult (but whose solutions are within AIvN’s power to verify). That we will be applying AIv(N+1)’s precision programming to itself doesn’t mean we can’t test it on non-recursive data first.
ETA: Of course, since we want the end result to be a superintelligence, AIvN might also ask AIv(N+1) for verifiable insight into an array of puzzling questions, some of which AIvN can’t figure out but suspects are tractable with increased intelligence.
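To make the non-recursive testing idea concrete, here is a minimal toy sketch (hypothetical names and tasks, nothing from EURISKO or any actual design): AIvN only has to check AIv(N+1)’s answers, which is far cheaper than producing them, so no recursion into version N+2 ever arises.

```python
from typing import Callable, List

# Toy harness: the current system ("AIvN") accepts a candidate program's work
# only on tasks whose solutions it can verify cheaply. Factoring is a stand-in:
# finding factors is hard, checking them is just multiplication.

def verify_factorization(n: int, factors: List[int]) -> bool:
    """Cheap check: do the proposed factors (all >= 2) multiply back to n?"""
    product = 1
    for f in factors:
        if f < 2:
            return False
        product *= f
    return product == n

def accept_candidate(candidate: Callable[[int], List[int]],
                     benchmarks: List[int]) -> bool:
    """Trust the candidate only if every proposed solution verifies."""
    return all(verify_factorization(n, candidate(n)) for n in benchmarks)

def trial_division(n: int) -> List[int]:
    """A deliberately naive candidate, standing in for AIv(N+1)'s output."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(accept_candidate(trial_division, [91, 221, 8051, 600851475143]))  # True
```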
If you observed something to work 15 times, how do you know that it’ll work the 16th time? You obtain a model of increasing precision with each test, one that lets you predict what happens next, on a test you haven’t performed yet. In the same way, you can try to predict what happens on the first try, before any observations have taken place.
Another point is that testing can be a part of the final product: instead of building a working gizmo, you build a generic self-testing adaptive gizmo that finds the right parameters itself, and that is pre-designed to do so in the optimal way.
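Where is the evidence that EURISKO ever accomplished anything? No one but the author has seen the source code.

On the subject of self-improving AI, you say: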
Unfortunately there are three fundamental problems, any one of which would suffice to make this approach unworkable even in principle.
Keep the bolded context in mind.
What constitutes an improvement is not merely a property of the program, but also a property of the world. Even if an AI program could find improvements, it could not distinguish them from bugs.
There are large classes of problem for which this is not the case. For example, “make accurate predictions about future sensory data based on past sensory data” relates program to world, but optimizing for this task is a property of the time, memory, and accuracy trade-offs involved. At the very least, your objection fails to apply “even in principle”.
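As a toy illustration of such a self-scoring objective (hypothetical example, not from the thread): given a held-out stream of “sensory” data, any candidate predictor can be scored mechanically, and the program itself can tell which candidate is the improvement.

```python
import math
from typing import Callable, Sequence

# Toy illustration: "predict future sensory data from past sensory data" yields
# a score the program can compute for itself, so bug vs. improvement is
# decidable without human judgment.

def score(predictor: Callable[[Sequence[float]], float],
          stream: Sequence[float], history: int = 3) -> float:
    """Mean squared prediction error over a held-out stream (lower is better)."""
    errors = [(predictor(stream[t - history:t]) - stream[t]) ** 2
              for t in range(history, len(stream))]
    return sum(errors) / len(errors)

def persistence(past: Sequence[float]) -> float:
    return past[-1]                              # "tomorrow looks like today"

def linear_trend(past: Sequence[float]) -> float:
    return past[-1] + (past[-1] - past[-2])      # extrapolate the last step

stream = [math.sin(0.3 * t) for t in range(200)]  # stand-in "sensory" data
print(score(persistence, stream), score(linear_trend, stream))
# Whichever candidate scores lower counts as the improvement, and the program
# can make that call itself.
```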
Any significant AI program is the best effort of one or more highly skilled human programmers. To qualitatively improve on that work, the program would have to be at least as smart as the humans, so even if recursive self-improvement were workable, which it isn’t, it would be irrelevant until after human level intelligence is reached by other means.
This depends on the definition of “qualitatively improve”. It seems Eurisko improved itself in important ways that Lenat couldn’t have done by hand, so I think this too fails the “even in principle” test.
Any self-modifying system relies, for the ability to constructively modify upper layers of content, on unchanging lower layers of architecture. This is a general principle that applies to all complex systems from engineering to biology to politics.
This seems the most reasonable objection of the three. Interestingly enough, Eliezer claims it’s this very difference that makes self-improving AI unique.
I think this also fails to apply universally, for somewhat more subtle reasons involving the nature of software. A self-improving AI possessing precise specifications of its components has no need for a constant lower layer of architecture (except if you mean the hardware, which is a very different question). The Curry-Howard isomorphism offers a proof-of-concept here: An AI composed of precise logical propositions can arbitrarily rewrite its proofs/programs with any other program of an equivalent type. If you can offer a good argument for why this is impossible in principle, I’d be interested in hearing it.
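A toy sketch of that proof-of-concept in Lean (an illustration only, nowhere near the AI-scale case): once the specification lives in the type, any two programs inhabiting that type are interchangeable.

```lean
-- Toy Curry-Howard illustration (hypothetical example): the type carries the
-- specification, so any program inhabiting it can be swapped for any other
-- without weakening what is guaranteed.
def DoublingSpec := { f : Nat → Nat // ∀ n, f n = 2 * n }

def implA : DoublingSpec := ⟨fun n => 2 * n, fun _ => rfl⟩
def implB : DoublingSpec := ⟨fun n => n * 2, fun n => Nat.mul_comm n 2⟩
```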
“It seems Eurisko improved itself in important ways that Lenat couldn’t have done by hand”
As far as I can see, the only self-improvements it came up with were fairly trivial ones that Lenat could easily have done by hand. Where it came up with important improvements that Lenat wouldn’t have thought of by himself was in the Traveler game—a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.
“The Curry-Howard isomorphism offers a proof-of-concept here”
Indeed this approach shows promise, and is the area I’m currently working on. For example if you can formally specify an indexing algorithm, you are free to look for optimized versions provided you can formally prove they meet the specification. If we can make practical tools based on this idea, we could make software engineering significantly more productive.
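A lightweight stand-in for that workflow (hypothetical functions, and with random testing where the real tool would demand a formal proof): the optimized routine is adopted only if it agrees with the executable specification.

```python
import random

# Hypothetical sketch of "specification vs. optimized implementation". In the
# approach described above the match would be formally proved; random testing
# here is only a cheap stand-in to show the shape of the workflow.

def index_of_spec(haystack: list, needle) -> int:
    """Executable specification: index of the first occurrence, or -1."""
    for i, item in enumerate(haystack):
        if item == needle:
            return i
    return -1

def index_of_fast(haystack: list, needle) -> int:
    """Candidate optimized version (here it just delegates to the built-in)."""
    try:
        return haystack.index(needle)
    except ValueError:
        return -1

def agrees_with_spec(trials: int = 10_000) -> bool:
    """Adopt the optimized version only if it matches the spec on every trial."""
    for _ in range(trials):
        xs = [random.randrange(10) for _ in range(random.randrange(8))]
        needle = random.randrange(10)
        if index_of_fast(xs, needle) != index_of_spec(xs, needle):
            return False
    return True

print(agrees_with_spec())  # True
```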
But this can only be part of the story. Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI. Nor, obviously, is a formal specification of human psychology the kind of thing that would ever be approached accidentally by experimentation with Eurisko-style programs as was the original topic. And if the science of some future century—that would need to be to today’s science as the latter is to witchcraft—ever does manage to accomplish such, why then, that would be the key to enabling the development of provably Friendly AI.
Where it came up with important improvements that Lenat wouldn’t have thought of by himself was in the Traveler game—a simple formal puzzle, fully captured by a set of rules that were coded into the program, making the result a success in machine learning but not self-improvement.
...And applied these improvements to the subsequent modified set of rules. “That was machine learning, not self-improvement” sounds like a fully general counter-argument, especially considering your skepticism toward the very idea of self-improvement. Perhaps you can clarify the distinction?
Consider the requirement that a program have an intuitive user interface. We have nothing remotely approaching the ability to formally specify this, nor could an AI ever come up with such by pure introspection because it depends on entities that are not part of the AI.
An AI is allowed to learn from its environment, no one’s claiming it will simply meditate on the nature of being and then take over the universe. That said, this example has nothing to do with UFAI. A paperclip maximizer has no need for an intuitive user interface.
And if [science] ever does manage to accomplish [a formal specification of human psychology], why then, that would be the key to enabling the development of provably Friendly AI.
Indeed! Sadly, such a specification is not required to interact with and modify one’s environment. Humans were killing chimpanzees with stone tools long before they even possessed the concept of “psychology”.
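“Perhaps you can clarify the distinction?”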
I’ll call it self-improvement when a substantial, nontrivial body of code is automatically developed that is applicable to domains other than game playing, as opposed to playing a slightly different version of the same game. (Note that it was substantially the same strategy that won the second time as the first time, despite the rules changes.)
“A paperclip maximizer has no need for an intuitive user interface.”
True if you’re talking about something like Galactus that begins the story already possessing the ability to eat planets. However, UFAI believers often talk about paperclip maximizers being able to get past firewalls etc. by verbal persuasion of human operators. That certainly isn’t going to happen without a comprehensive theory of human psychology.
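rhollerith: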
“Strongly-recursive self-improvement” is a figment of the imagination; among the logical errors involved is confusion between properties of a program and properties of the world.
As for the rest: do you believe humanity can survive permanently as we are now, confined to this planet? If you do, then I will point you to the geological evidence to the contrary. If not, then it follows that without more advanced technology, we are dead. Neither I nor anybody else can know what will be the proximate cause of death for the last individual, or in what century, but certain extinction is certain extinction nonetheless.
Let us briefly review the discussion up to now, since many readers use the comments page, which does not provide much context. rwallace has been arguing that AI researchers are too concerned (or will become too concerned) about the existential risk from reimplementing EURISKO and things like that.
You have mentioned two or three times, rwallace, that without more advanced technology, humans will eventually go extinct. (I quote one of those two or three mentions below.) You mention that to create and to manage that future advanced technology, civilization will need better tools to manage complexity. Well, I see one possible objection to your argument right there: better science and better technology might well decrease the complexity of the cultural information humans are required to keep on top of. Consider that once Newton gave our civilization a correct theory of dynamics, almost all of the books written before Newton on dynamics could safely be thrown away (the exceptions being books by Descartes and Galileo that help people understand Newton and put Newton in historical context), which of course constitutes a net reduction in the complexity of the cultural information that our civilization has to keep on top of. (If it does not seem like a reduction, that is because the possession of Newtonian dynamical theory made our civilization more ambitious about what goals to try for.)
do you believe humanity can survive permanently as we are now, confined to this planet? If you do, then I will point you to the geological evidence to the contrary. If not, then it follows that without more advanced technology, we are dead.
But please explain to me what your argument has to do with EURISKO and things like that: is it your position that the complexity of future human culture can be managed only with better AGI software?
And do you maintain that that software cannot be developed fast enough by AGI researchers such as Eliezer who are being very careful about existential risks?
In general, the things you argue are dangerous are slow dangers. You yourself refer to “geological evidence” which suggests that they are dangerous on geological timescales.
In contrast, research into certain areas of AI seems to me a genuinely fast danger: something with a high probability of wiping out our civilization in the next 30, 50 or 100 years. It seems unwise to increase fast dangers in order to decrease slow dangers. But I suppose you disagree that AGI research, if not done very carefully, is a fast danger. (I’m still studying your arguments on that.)
Reduction in complexity is at least conceivable, I’ll grant. For example, if someone invented a zero-point energy generator with the cost and form factor of an AA battery, much of the complexity associated with the energy industry could disappear.
But this seems highly unlikely. All the current evidence suggests the contrary: the breakthroughs that will be necessary and possible in the coming century will be precisely those of complex systems (in both senses of the term).
Human level AGI in the near future is indeed neither necessary nor possible. But there is a vast gap between that and what we have today, and we will, yes, need to fill some of that gap. Perhaps a key breakthrough would have come from a young researcher who would have re-implemented Eurisko and from the experiment acquired a critical jump in understanding—and who has now quietly left, thinking Eurisko might blow up the world, to reconsider that job offer from Electronic Arts.
I do disagree that AGI research is a fast danger. I will grant you that there is a sense in which the dangers I am worried about are slow ones - barring unlikely events like a large asteroid impact (which is likely only over longer time scales), I’m confident humanity will still exist 100 years from now.
But our window of opportunity may not. Consider that civilizations are mortal, for reasons unrelated to this conversation. Consider that environments conducive to scientific progress are considerably rarer and more transient than civilization itself. Consider also that the environment in which our civilization arose is gone, and is not coming back. (For the simplest example, while fossil fuels still exist, the easily accessible deposits thereof, so important for bootstrapping, are largely gone.) I think it quite possible that the 21st century may be the last hard step in the Great Filter, and that by the year 2100 the ultimate fate of humanity may in fact have been decided, even if nobody on that date yet knows it. I cannot of course be certain of this, but I think it likely enough that we cannot afford to risk wasting this window of opportunity.
One problem with this argument is how conjunctive it is: “(A) Progress crucially depends on breakthroughs in complexity management and (B) strong recursive self-improvement is impossible and (C) near-future human level AGI is neither dangerous nor possible but (D) someone working on it is crucial for said complexity management breakthroughs and (E) they’re dissuaded by friendliness concerns and (F) our scientific window of opportunity is small.”
My back-of-the-envelope, generous probabilities:
A. 0.5, this is a pretty strong requirement.
B. 0.9, for simplicity, giving your speculation the benefit of the doubt.
C. 0.9, same.
D. 0.1, a genuine problem of this magnitude is going to attract a lot of diverse talent.
E. 0.01, this is the most demanding element of the scenario, that the UFAI meme itself will crucially disrupt progress.
F. 0.05, this would represent a large break from our current form of steady scientific progress, and I haven’t yet seen much evidence that it’s terribly likely.
That product comes out to roughly 1:50,000. I’m guessing you think the actual figure is higher, and expect you’ll contest those specific numbers, but would you agree that I’ve fairly characterized the structure of your objection to FAI?
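For the record, the figure is just the product of the estimates above:

```python
# Multiplying the back-of-the-envelope estimates A through F.
estimates = {"A": 0.5, "B": 0.9, "C": 0.9, "D": 0.1, "E": 0.01, "F": 0.05}

product = 1.0
for p in estimates.values():
    product *= p

print(product)      # ~2.03e-05
print(1 / product)  # ~49,000, i.e. roughly 1:50,000
```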