it’s clear that useful intelligence that is incapable of recursive self-improvement is possible… I’m an existence proof, for example.
I wouldn’t be so sure about that. Imagine that you are given unlimited time, perfect health, and as much data storage (paper, computers, etc.) as you need. Do you think your self-improvement would stop at some point?
The problem with humans is that they have limited time, much of which is wasted on gathering resources to survive or on climbing the social ladder… and then they die, and the next generation starts almost from zero. At least we have culture, education, books, and other tools that allow the next generation to use part of the achievements of the previous ones; unfortunately, that learning also takes too much time. We are so limited by our hardware.
Imagine a child growing up. Imagine studying at elementary school, high school, university. Is this an improvement? Yes. Why does it stop? Because we run out of time and resources, our own health and abilities also being limited resources. However, as a species, humans are self-improving. We are just not fast enough to FOOM as individuals (yet).
Supposing all this is true, it nevertheless suggests a way to define a safe route for research.
As you say, we are limited by our hardware, by our available resources, by the various rate-limiting steps in our self-improvement. There’s nothing magical about these limits; they are subject to study and to analysis. Sufficiently competent analysis could quantify those qualitative limits and could support a claim like “to achieve X level of self-improvement given Y resources would take a mind like mine Z years”. The same kind of analysis could justify similar claims about optimizing systems other than my mind.
If I have written the source code for an optimizing system, and such an analysis of the source code concludes that for it to exceed TheOtherDave-2012's capabilities on some particular reference platform would take no less than 35 minutes, then it seems to follow that I can safely execute that source code on that reference platform for half an hour.
Edit: Or, well, I suppose “safely” is relative; my own existence represents some risk, as I’m probably smart enough to (for example) kill a random AI researcher given the element of surprise should I choose to do so. But the problem of constraining the inimical behavior of human-level intelligences is one we have to solve whether we work on AGI or not.
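A minimal sketch of the budgeted run described above, assuming the source-code analysis has already produced a 35-minute lower bound; the names (ANALYZED_LOWER_BOUND_MINUTES, run_optimizer_step, budgeted_run) are illustrative placeholders, not part of any real framework:

```python
import time

# Hypothetical output of the source-code analysis described above: a lower
# bound, in minutes, on how long this optimizer would need on this reference
# platform to exceed the reference capability named above.
ANALYZED_LOWER_BOUND_MINUTES = 35

# Stop well short of the analyzed bound (30 of 35 minutes), leaving a margin
# in case the analysis is slightly optimistic.
RUN_BUDGET_MINUTES = 30


def run_optimizer_step(state):
    """Placeholder for one bounded unit of work by the optimizing system."""
    # ... perform one self-contained optimization step on `state` ...
    return state


def budgeted_run(initial_state):
    """Run the optimizer, but halt before the analyzed lower bound is reached."""
    deadline = time.monotonic() + RUN_BUDGET_MINUTES * 60
    state = initial_state
    while time.monotonic() < deadline:
        state = run_optimizer_step(state)
    return state
```

Enforcing the budget is the easy part; as the rest of the exchange argues, the hard part is justifying the bound in the first place.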
Different humans can have the same lifespan, yet achieve different results. Think about Michelangelo or da Vinci… and then think about the billions of unknown humans who lived to the same age (at least thousands of them had the same lifespan and the same resources and opportunities for education). Analogously, half an hour may not be enough for an average self-improving AI, but may be plenty of time for a “Michelangelo of recursive self-improvement”. And before the experiments, we don’t even know the distribution of such “Michelangelos” in the AI population.
Speaking about a learning curve is just a metaphor that can’t be taken too literally. Learning happens in jumps, some smaller, some larger; only on average, over the long run, can it be approximated by a curve. If Moore’s law says that computers will be twice as fast in 18 months, it does not mean they get exactly 1.04 times faster every month (a quick check of that arithmetic is sketched below). The improvement comes as a series of discrete steps. Learning to read gives humans the ability to learn faster, but we cannot divide the effect and say what the acceleration is per letter of the alphabet. And we also cannot look at the history of medical research and conclude that, at its current speed, the cure for cancer will be discovered exactly on the 31st of December 2017, at 12:30 PM.
So if you think that to achieve some dangerous ability a system would have to run for 35 minutes, I would recommend running it for no longer than 1 minute, preferably less, and then carefully analyzing the results. The problem is, we can do this only as long as the AI is less intelligent than us. After that point we get to the “AI does something, and we are too stupid to understand what it means, but it seems to work somehow” stage.
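A quick check of the Moore’s law arithmetic above, plus the far more conservative budget suggested here (1 minute against a 35-minute estimate); the figures are the hypothetical ones from this exchange, not real measurements:

```python
# Doubling every 18 months implies a compound monthly factor of
# 2 ** (1/18) ~= 1.0393, i.e. roughly "1.04 times faster every month";
# but only as a smoothed average, not as an actual monthly step.
monthly_factor = 2 ** (1 / 18)
print(f"implied monthly factor: {monthly_factor:.4f}")            # ~1.0393
print(f"compounded over 18 months: {monthly_factor ** 18:.4f}")   # ~2.0000

# The conservative budget suggested above: if analysis estimates 35 minutes
# to reach a dangerous ability, run for at most 1 minute before stopping
# to inspect the results.
estimated_danger_minutes = 35   # hypothetical estimate from the discussion
max_run_minutes = 1             # conservative cap suggested in the comment
fraction = max_run_minutes / estimated_danger_minutes
print(f"run budget: {max_run_minutes}/{estimated_danger_minutes} minutes "
      f"= {fraction:.1%} of the estimate")
```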
Agreed that once a system gets “smarter than us” all bets are off, which suggests that the important threshold for safety considerations is “how long will it take for algorithm X running on reference platform Y to get smarter than me.”
Agreed that if my understanding of the engineering constraints of the system is so unreliable that all I can say is “Well, the average algorithm will take about half an hour to get there on an average platform, but who knows how fast outliers can go?” then I can’t safely run an algorithm for any length of time… I just don’t know enough yet.
We don’t understand the engineering constraints that affect semiconductor development well enough to set a confident limit on how quickly semiconductors can improve, so all we have are unreliable generalizations like Moore’s Law. We don’t understand the engineering constraints that affect learning in humans even that well. We understand the engineering constraints that affect the development of cancer cures even less well than that.
You’re absolutely right that, in that state of ignorance, we can’t say what’s safe and what’s not.
You seem to be assuming that that state of ignorance is something we can’t do anything about, an inherent limitation of the universe and the human condition… that the engineering constraints affecting the maximum rates of self-optimization of a particular algorithm on a particular platform are and will always be a mystery.
If that’s true, then sure, there’s never a safe threshold we can rely on.
I don’t really see why that would be true, though. It’s a hard problem, certainly, but it’s an engineering problem. If understanding the engineering constraints that govern rates of algorithm self-optimization is possible (that is, if it’s not some kind of ineluctable Mystery) and if that would let us predict reliably the maximum safe running time of a potentially self-optimizing algorithm, it seems like that would be a useful direction for further research.
You seem to be assuming that that state of ignorance is something we can’t do anything about
No, no, no. We probably can do something about it. I just assume that it will be more complicated than “make an estimate that complexity C will take time T, and then run a simulation for time S<T”; especially if we have no clue at all what the word ‘complexity’ means, despite pretending that it is a value we can somehow measure on a linear scale.
First step, we must somehow understand what “self-improvement” means and how to measure it. Even this idea can be confused, so we need to get a better understanding. Only then does it make sense to plan the second step. Or maybe I’m even confused about all of this.
The only part I feel sure about is that we should first understand what self-improvement is; only then can we try to measure it, and only then can we attempt to use some self-improvement threshold as a safety mechanism in an AI simulator.
This is a bit different from other situations, where you can first measure something, and then there is enough time to collect data and develop some understanding. Here, by the time you have something to measure (a self-improving process), it is already an existential risk. If you have to make a map of a minefield, you don’t start by walking onto the field and stomping heavily, even if in other situations an analogous procedure would work very well.
First step, we must somehow understand what “self-improvement” means and how to measure it. Even this idea can be confused, so we need to get a better understanding.
Yes, absolutely agreed. That’s the place to start. I’m suggesting that doing this would be valuable, because if done properly it might ultimately lead to a point where our understanding is quantified enough that we can make reliable claims about how long we expect a given amount of self-improvement to take for a given algorithm given certain resources.
This is a bit different from other situations, where you can first measure something
Sure, situations where you can safely first measure something are very different from the situations we’re discussing.