Jürgen Schmidhuber: High for the next few decades, mostly because some of our own work seems to be almost there:
Heh. I sometimes use the word “Schmidhubristic” in conversation with other AI people. I do think he’s a smart guy, but he would probably be taken more seriously if he didn’t make comments like the above.
Although one should presumably be glad that he is giving the info to let you appropriately weigh his claims. I am also reminded of the AGI timelines survey at a past AGI conference, which peaked sharply in the next few decades (the careers of the AI researchers being surveyed) and then fell rapidly. Other conversations with the folk in question make it look like that survey in part reflects people saying “obviously, my approach has a good chance of success, but if I can’t do it then no one can.” Or, alternatively:
It takes some decades to develop a technique to fruition.
I assume that only techniques I am currently aware of will ever exist.
Therefore, in a few decades when current techniques have been developed and shown to succeed or fail either we will have AI or we will not get it for a very long time if ever.
I suspect that these factors lead folk specifically working on AGI to overweight near-term AGI probability and underweight longer-term AGI prospects.
In my experience there’s a positive correlation where the more someone looks into the trends of the AGI literature, the sooner they think it will be, even in cases where they hope it’s a long ways away. Naively, I don’t get the impression that the bias you pointed out is strongly affecting e.g. Legg or Schmidhuber. I got the impression that your distribution has a median later than most AGI folk including those at SIAI (as far as I can tell; I may be wrong about the views of some SIAI people). Are you very familiar with the AGI literature, or do you believe your naive outside view beats their inside view plus outside view corrections (insofar as anyone knows how to do such corrections)? You’ve put way more thought into Singularity scenarios than most anyone else. To what extent do you think folk like me should update on your beliefs?
P(human-level AI by ? (year) | no wars ∧ no natural disasters ∧ beneficial political and economic development) =
10% − 2050
50% − 2150
80% − 2300
My analysis involves units of “fundamental innovation”. A unit of fundamental innovation (FI) is a discovery/advance comparable to information theory, Pearlian causality, or VC theory. Using this concept, we can estimate the time until AI by 1) estimating the required number of FI units and 2) estimating the rate at which they arrive. I think FIs arrive at a rate of about one per 25 years, and if 3-7 FIs are required, this produces an estimate of 2050-2150. Also, I think that after 2150 the rate of FI appearance will be slower, maybe one per 50 years, so 2300 corresponds to 10 FIs.
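For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate as a piecewise-constant arrival rate. The 1975 baseline is not stated in the comment; it is an assumption, chosen only because it is the starting year that makes 3, 7, and 10 FIs land on 2050, 2150, and 2300. The function name and parameters are made up for illustration.

```python
def fi_arrival_year(n_fi, start_year=1975, fast_interval=25,
                    slow_interval=50, slowdown_year=2150):
    """Year by which n_fi fundamental innovations have arrived.

    Assumes one FI every `fast_interval` years before `slowdown_year`
    and one every `slow_interval` years from then on.
    `start_year` is an inferred baseline, not a figure from the thread.
    """
    year = start_year
    for _ in range(n_fi):
        # Pick the interval that applies at the current point in time.
        interval = fast_interval if year < slowdown_year else slow_interval
        year += interval
    return year


for n in (3, 7, 10):
    print(n, "FIs ->", fi_arrival_year(n))
```

Changing `start_year` shifts all three dates by the same amount, so the estimate is quite sensitive to when you start counting.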
P(human extinction | badly done AI) = 40%
I don’t understand the other question well enough to answer it meaningfully. I think it is highly unlikely that an uFAI will be actively malicious.
P(superhuman intelligence within hours | human-level AI on supercomputer with Internet connection) = 0.01%
P(… within days | …) = 0.1%
P(… within years | …) = 3%
I have low estimates for these contingencies because I don’t believe in the equation: capability=intelligence*computing power. Human capability rests on many other components, such as culture, vision, dextrous hands, etc. I’m also not sure the concept “human-level intelligence” is well-defined.
How much money does the SIAI currently (this year) require (to be instrumental in maximizing your personal long-term goals, e.g. survive the Singularity by solving friendly AI), less/no more/little more/much more/vastly more?
I think the phrasing of the question is odd. I have donated a small amount to SIAI, and will probably donate more in the future, especially if they come up with a more concrete action plan. I buy the basic SIAI argument (even if probability of success is low, there is enough at stake to make the question worthwhile), but more importantly, I think there is a good chance that SIAI will come up with something cool, even if it’s not an FAI design. I doubt SIAI could effectively use vastly more money than it currently has.
What existential risk is currently most likely to have the greatest negative impact on your personal long-term goals, under the condition that nothing is done to mitigate the risk?
My personal goals are much more vulnerable to catastrophic risks such as nuclear war or economic collapse. I am perhaps idiosyncratic among LWers in that it is hard for me to worry much more about existential risk than catastrophic risk—that is to say, if N is the population of the world, I am only about 20x more concerned about a risk that might kill N than I am about a risk that might kill N/10.
Can you think of any milestone such that if it were ever reached you would expect human‐level machine intelligence to be developed within five years thereafter?
A computer program that is not explicitly designed to play chess defeats a human chess master.
I just want to register appreciation for this post with more than an upvote. Your “fundamental innovation” units are a very productive concept, and the milestone you offered was vivid, simple, and yet obviously connected to the bigger picture in a very direct way. This gives me the impression of someone who has spent enough time contemplating the issues to have developed a deep network of novel and reasonably well-calibrated technical intuitions, and I always like hearing such people’s thoughts :-)
I suspect that I share your concerns about “mere” catastrophic risks that arrive before AGI has been developed and starts to seriously influence the world.
Your post makes me wonder if you’ve thought about the material/causal conditions that give rise to the production of FI units, and whether the rate at which they are being produced has changed over historical periods and may be changing even now?
For myself, I don’t think I even know how many units have been produced already, because I’m still discovering things like VC Theory, which I didn’t know about until you just mentioned it. It seems to me that if Shannon, Pearl, and Vapnik count then so should (for example) Kolmogorov and Hutter and probably a number of others… which implies to me that a longer and more careful essay on the subject of FI units would be worth writing.
The more text you produce on the subject of technical expectations for the future where I can read it, the happier I will be :-)
Your post makes me wonder if you’ve thought about the material/causal conditions that give rise to the production of FI units,
One thing to notice is that in many cases it takes a long period of incubation, conceptual reorganization, and sociological diffusion for the full implications of an FI unit to be recognized. For example, Vapnik and Chervonenkis published the first VC-theory work in 1968, but the Support Vector Machine was not discovered until the 90s. Pearl’s book on causality was published in 2000, but the graphical model framework it depends on dates back at least to the 80s and maybe even as far back as the Chow-Liu algorithm published in 1968. The implication is that the roots of the next set of FIs are probably out there right now—it’s just an issue of figuring out which concepts are truly significant.
On the question of milestones, here is one of particular interest to me. A data compressor implicitly contains a statistical model. One can sample from that model by feeding a random sequence of bits to the decoder component. Let’s say we built a specialized compressor for images of the Manhattan streetscape. Now if the compressor is very good, samples from it will be indistinguishable from real images of Manhattan. I think it will be a huge milestone if someone can build a compressor that generates images realistic enough to fool humans, a kind of visual Turing Test. That goal now seems impossibly distant, but it can be approached by a direct procedure: build a large database of streetscape images, and conduct a systematic search for the compressor that reduces the database to the shortest possible size. I think the methods required to achieve that would constitute an FI, and if the Schmidhuber/Hutter/Legg group can pull that off, I’ll hail them as truly great scientists.
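The "search for the shortest encoding" procedure can be sketched in a few lines, under heavy simplifying assumptions: the candidates here are three general-purpose codecs from the Python standard library, standing in for the specialized image compressors the comment has in mind, and the "database" is a repeated byte string rather than real images. The names `CANDIDATES` and `shortest_compressor` are hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in candidate compressors; a real search would range over
# image-specific models, not general-purpose codecs.
CANDIDATES = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def shortest_compressor(database, candidates=CANDIDATES):
    """Return the name of the candidate giving the smallest output,
    together with the compressed size for every candidate."""
    sizes = {name: len(fn(database)) for name, fn in candidates.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes


database = b"manhattan streetscape " * 500  # placeholder data
best, sizes = shortest_compressor(database)
print(best, sizes)
```

The point of the sketch is only the selection criterion: compressed size is an objective score, so candidate models can be compared without any human judgment in the loop.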
the Support Vector Machine was not discovered until the 90s.
Why not? I’m not familiar with VC-theory, but the basic idea of separating two sets of points with a hyperplane with the maximum margin doesn’t seem that complex. What made this difficult?
Don’t quote me on this, but I believe the key insight is that the complexity of the max margin hyperplane model depends not on the number of dimensions of the feature space (which may be very large) but on the number of data points used to define the hyperplane (the support vectors), and the latter quantity is usually small. Though that realization is intuitively plausible, it required the VC-theory to actually prove.
The second part of this confuses me: standard compression schemes are good by this measure, since images compressed by them are still quite accurate. Did you mean that random data uncompressed by the algorithm is indistinguishable from real images of Manhattan?
To sample from a compressor, you generate a sequence of random bits and feed it into the decompressor component. If the compressor is very well-suited to Manhattan images, the output of this process will be synthetic images that resemble the real city images. If you try to sample from a standard image compressor, you will just get a greyish haze.
I call this the veridical simulation principle. It is useful because it allows a researcher to detect the ways in which a model is deficient. If the model doesn’t handle shadows correctly, the researcher will realize this when the sampling process produces an image of a tree that casts no shade.
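Sampling from a decoder can be shown on a toy scale. The sketch below uses a made-up four-symbol prefix code instead of an image compressor (the symbols and codewords are purely illustrative). Feeding uniformly random bits to the decoder emits each symbol with probability 2^(-codeword length), which is exactly the statistical model implied by the code.

```python
import random
from collections import Counter

# Hypothetical toy prefix code: a shorter codeword means the implied
# model treats that symbol as more probable.
CODE = {"sky": "0", "building": "10", "tree": "110", "taxi": "111"}


def sample_from_decoder(code, n_symbols, rng):
    """Decode a stream of random bits until n_symbols have been emitted."""
    decode = {bits: symbol for symbol, bits in code.items()}
    out = []
    buffer = ""
    while len(out) < n_symbols:
        buffer += rng.choice("01")  # one uniformly random bit
        if buffer in decode:        # a complete codeword has been read
            out.append(decode[buffer])
            buffer = ""
    return out


rng = random.Random(0)
samples = sample_from_decoder(CODE, 20000, rng)
freqs = {s: c / len(samples) for s, c in Counter(samples).items()}
for symbol, bits in CODE.items():
    print(symbol, "implied:", 2 ** -len(bits), "sampled:", round(freqs[symbol], 3))
```

If the sampled frequencies look nothing like the real data (too many taxis, say), that is the model's deficiency showing up in the samples, which is the idea behind the principle above.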
Why should innovation proceed at a constant rate? As far as I can tell, the number of people thinking seriously about difficult technical problems is increasing exponentially. Accordingly, it looks to me like most important theoretical milestones occurred recently in human history, and I would expect them to be more and more tightly packed.
I don’t know how fast machine learning / AI research output actually increases, but my first guess would be doubling every 15 years or so, since this seems to be the generic rate at which human output has doubled post-industrial revolution. If this is the case, the difficulty of finding a fundamental innovation would also have to double every 15 years to keep the rate constant (or the quality of the average researcher would have to drop exponentially, which is maybe less coincidental seeming).
The only reason I’d suspect such a coincidence is if I had observed many fundamental innovations equally spaced in time; but I would wager that the reason they look evenly spread in time (in recent history) is that an intuitive estimate for the magnitude of an advance depends on the background quality of research at the time.
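To put rough numbers on the doubling argument, here is a small sketch. The 15-year doubling time is the guess from the comment above; the 25-year and 150-year horizons are borrowed from the FI estimate earlier in the thread, and the function name is made up.

```python
def output_growth(years, doubling_time=15.0):
    """Factor by which research output grows over `years`,
    assuming it doubles every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)


# Growth over one 25-year gap between FIs, and over 150 years.
print(round(output_growth(25), 2))
print(output_growth(150))  # 2**10 = 1024.0
```

So for the arrival rate to stay at one FI per 25 years, each successive FI would have to absorb roughly three times the research output of the previous one, and about a thousand times as much after 150 years.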
Hi, I see that you belong to the group of people I am currently writing to. Would you be willing to answer these questions?
Sure:
OK, that makes sense. It’s isomorphic to doing model checking by looking at data generated by your model.