Different humans can have the same lifespan, yet achieve different results. Think about Michelangelo or da Vinci… and then think about the billions of unknown humans who lived to the same age (at least thousands of them had the same lifespan and the same resources and opportunities for education). Analogously, half an hour may not be enough for an average self-improving AI, but may be plenty of time for a “Michelangelo of recursive self-improvement”. And before the experiments we don’t even know what the distribution of such “Michelangelos” in the AI population is.
A learning curve is just a metaphor and can’t be taken too literally. Learning happens in jumps, some of them smaller, some of them larger; only on average, over the long run, can it be approximated by a curve. If Moore’s law says that computers will be twice as fast in 18 months, it does not mean they get exactly 1.04 times faster every month. The improvement comes as a series of discrete steps. Learning to read gives humans the ability to learn faster, but we cannot divide the effect and say what the acceleration per letter of the alphabet is. And we also cannot look at the history of medical research and conclude that, according to its speed, the cure for cancer will be discovered exactly on the 31st of December 2017 at 12:30 PM.
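For concreteness, here is the arithmetic behind that 1.04 figure, as a minimal sketch (the numbers are illustrative only; the point above is precisely that real progress does not arrive in such smooth monthly increments):

```python
# Illustrative arithmetic only: if computers double in speed every 18 months,
# the *implied* smooth monthly factor is 2 ** (1/18) ~= 1.04 -- but actual
# improvement arrives in discrete jumps, not neat monthly increments.
monthly_factor = 2 ** (1 / 18)
print(f"implied monthly speedup: {monthly_factor:.4f}")  # ~1.0393
print(f"after 18 months: {monthly_factor ** 18:.2f}x")   # 2.00x
```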
So if you think that to achieve some dangerous ability a system would have to run for 35 minutes, I would recommend running it for no longer than 1 minute, preferably less, and then carefully analyzing the results. The problem is, we can do this only as long as the AI is less intelligent than us. After that point we reach the “AI does something, and we are too stupid to understand what it means, but it seems to work somehow” stage.
Agreed that once a system gets “smarter than us” all bets are off, which suggests that the important threshold for safety considerations is “how long will it take for algorithm X running on reference platform Y to get smarter than me.”
Agreed that if my understanding of the engineering constraints of the system is so unreliable that all I can say is “Well, the average algorithm will take about half an hour to get there on an average platform, but who knows how fast outliers can go?” then I can’t safely run an algorithm for any length of time… I just don’t know enough yet.
We don’t understand the engineering constraints that affect semiconductor development well enough to set a confident limit on how quickly chips can improve, so all we have is unreliable generalizations like Moore’s Law. We don’t understand the engineering constraints that affect learning in humans even that well. We understand the engineering constraints that affect the development of cancer cures even less well than that.
You’re absolutely right that, in that state of ignorance, we can’t say what’s safe and what’s not.
You seem to be assuming that that state of ignorance is something we can’t do anything about, an inherent limitation of the universe and the human condition… that the engineering constraints affecting the maximum rates of self-optimization of a particular algorithm on a particular platform are and will always be a mystery.
If that’s true, then sure, there’s never a safe threshold we can rely on.
I don’t really see why that would be true, though. It’s a hard problem, certainly, but it’s an engineering problem. If understanding the engineering constraints that govern rates of algorithm self-optimization is possible (that is, if it’s not some kind of ineluctable Mystery) and if that would let us predict reliably the maximum safe running time of a potentially self-optimizing algorithm, it seems like that would be a useful direction for further research.
You seem to be assuming that that state of ignorance is something we can’t do anything about
No, no, no. We probably can do something about it. I just assume that it will be more complicated than “make an estimate that complexity C will take time T, and then run a simulation for time S<T”; especially if we have no clue at all what the word ‘complexity’ means, despite pretending that it is a value we can somehow measure on a linear scale.
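Just to spell out what that naive scheme would look like, here is a minimal sketch (estimate_time_to_danger and all the constants are hypothetical placeholders, not anything we actually know how to compute, which is exactly the problem):

```python
# Hypothetical sketch of the naive "estimate T, then run for S < T" scheme.
# None of these quantities are things we currently know how to estimate;
# the surrounding discussion argues this is NOT a sufficient safety mechanism.

SAFETY_MARGIN = 0.1  # arbitrary fraction of the estimated dangerous runtime


def estimate_time_to_danger(complexity: float) -> float:
    """Hypothetical: map some 'complexity' score to a time T in seconds
    after which the system might become dangerously capable."""
    return 60.0 * complexity  # made-up linear model


def safe_run_budget(complexity: float) -> float:
    """Run the simulation for S = SAFETY_MARGIN * T, well below T."""
    return SAFETY_MARGIN * estimate_time_to_danger(complexity)


# A system with a made-up 'complexity' of 35 (estimated ~35 minutes to danger)
# would get a budget of ~3.5 minutes under this scheme.
print(safe_run_budget(35.0) / 60.0, "minutes")
```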
First step: we must somehow understand what “self-improvement” means and how to measure it. Even this idea may be confused, so we need to get a better understanding. Only then does it make sense to plan the second step. Or maybe I’m confused about all of this.
The only part I feel sure about is that we should first understand what self-improvement is; only then can we try to measure it, and only then can we attempt to use some self-improvement threshold as a safety mechanism in an AI simulator.
This is a bit different from other situations, where you can first measure something and then there is enough time to collect data and develop some understanding. Here, by the time you have something to measure (a self-improving process already exists), it is already an existential risk. If you have to make a map of a minefield, you don’t start by walking across the field and stomping heavily, even if in other situations an analogous procedure would work very well.
First step: we must somehow understand what “self-improvement” means and how to measure it. Even this idea may be confused, so we need to get a better understanding.
Yes, absolutely agreed. That’s the place to start. I’m suggesting that doing this would be valuable, because if done properly it might ultimately lead to a point where our understanding is quantified well enough that we can make reliable claims about how long a given amount of self-improvement should take for a given algorithm with given resources.
This is a bit different from other situations, where you can first measure something
Sure, situations where you can safely first measure something are very different from the situations we’re discussing.