The AISafety.com Reading Group discussed this article when it was published. My slides are here: https://www.dropbox.com/s/t0k6wn4q90emwf2/Takeoff_Speeds.pptx?dl=0
There is a recording of my presentation here: https://youtu.be/7ogJuXNmAIw
My notes from the discussion are reproduced below:
We liked the article quite a lot. There were a surprising number of new insights for an article purporting to just collect standard arguments.
The definition of fast takeoff seemed somewhat non-standard, conflating three things: speed as measured in clock-time, continuity/smoothness around the threshold where AGI reaches the human baseline, and locality. These three questions are closely related, but not identical, and some precision would be appreciated. In fairness, the article was posted on Paul Christiano’s “popular” blog, not his “formal” blog.
The degree to which we can build universal / general AIs right now was a point of contention. Our (limited) understanding is that most AI researchers would disagree with Paul Christiano about whether we can build a universal or general AI right now. Paul Christiano’s argument seems to rest on our ability to trade off universality against other factors, but if (as we believe) universality is still mysterious, this tradeoff is not possible.
There was some confusion about the relationship between “Universality” and “Generality”. Possibly, a “village idiot” is above the level of generality (passes the Turing test, can make coffee) but below the level of universality (unable to self-improve to superintelligence, even given infinite time). It is unclear if Paul Christiano would agree with this.
The comparison between humans and chimpanzees was discussed and related to the argument from Human Variation, which seems to be stronger: the difference between a village idiot and Einstein is also large, and the counter-argument about what evolution cares about seems not to hold here.
Paul Christiano asked for a canonical example of a key insight enabling an unsolvable problem to be solved. An example would be my Matrix Multiplication example (https://youtu.be/5DDdBHsDI-Y), where a series of four key insights turns the problem from requiring a decade to requiring a year, then a day, then a second. While the example is neither canonical nor precisely what Paul Christiano asks for, it does point to a way to get intuition about the “key insight”: grab pen and paper and try to do matrix multiplication faster than O(n^3). It is possible, but far from trivial.
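For anyone who wants to try that exercise, here is a minimal Python sketch of where the sub-O(n^3) trick comes from: Strassen’s identity multiplies two 2x2 matrices with seven scalar multiplications instead of eight, and applying it recursively to blocks gives O(n^log2(7)) ≈ O(n^2.81).

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications (not 8)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

# Sanity check against the textbook 8-multiplication formula.
A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
naive = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
assert strassen_2x2(A, B) == naive  # [[19, 22], [43, 50]]
```

The saved multiplication looks trivial for a single 2x2 matrix, but because it applies at every level of the block recursion, it is what breaks the O(n^3) barrier.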
For the deployment lag (“Sonic Boom”) argument, a factor that can complicate the tradeoff is “secrecy”. If deployment causes you to lose the advantages of secrecy, the tradeoffs described could look much worse.
A number of the arguments for a fast takeoff did seem to aggregate, in one specific way: If our prior is for a “quite fast” takeoff, the arguments push us towards expecting a “very fast” takeoff. This is my personal interpretation, and I have not really formalized it. I should get around to that some day.
In fact Strassen’s algorithm is worse than textbook matrix multiplication for most reasonably sized matrices, including all matrices that could be multiplied in the 70s. Even many decades later the gains are still pretty small (and it’s only worth doing for unusually giant matrix multiplies). As far as I am aware nothing more complicated than Strassen’s algorithm is ever used in practice. So it doesn’t seem like an example of a key insight enabling a problem to be solved.
We could imagine an alternate reality in which large matrix multiplications became possible only after we discovered Strassen’s algorithm. But I think there is a reason that reality is alternate.
Overall I think difficult theory and clever insights are sometimes critical, perhaps often enough to more than justify our society’s tiny investment in them, but it’s worth having a sense of how exceptional these cases are.
Wikipedia claims that “it is faster in cases where n > 100 or so” (https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm).
The introduction of this Wikipedia article seems to describe these improvements as practically useful.
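To get a feel for how the two positions can both be right, here is a rough operation-count sketch (my own back-of-the-envelope, not taken from the video or the Wikipedia article). It counts scalar multiplies and adds only, assumes Strassen recurses all the way down to 1x1 blocks, and ignores memory traffic, which dominates real performance.

```python
def naive_ops(n):
    # Textbook method: n^3 multiplications and n^2 * (n - 1) additions.
    return n**3 + n**2 * (n - 1)

def strassen_ops(n):
    # 7 recursive half-size products plus 18 additions of (n/2) x (n/2) blocks.
    if n == 1:
        return 1
    half = n // 2
    return 7 * strassen_ops(half) + 18 * half * half

for k in range(6, 13):
    n = 2**k
    print(f"n={n:5d}  strassen/naive = {strassen_ops(n) / naive_ops(n):.2f}")
```

Under this pure count Strassen only pulls ahead around n ≈ 1000; switching to the textbook method below a cutoff moves the crossover down towards the n > 100 that Wikipedia cites. Either way the ratio shrinks slowly with n, which is consistent with both the Wikipedia claim and the observation that the practical gains are modest.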
In my video, I describe one of the breakthroughs in matrix multiplication after Strassen as “Efficient parallelization, like MapReduce, in the nineties”. This insight is used in practice, though some of the other improvements I mention are not practical.
In the section “Finding the secret sauce”, you asked for a canonical historical example of an insight having immediate dramatic effects. The canonical example is “nuclear weapons”, but this does not seem to precisely satisfy your requirements. While this example is commonly used, I’m not too fond of it, which is why I substituted my own.
My video “If AGI was Matrix Multiplication” does not claim that fast matrix multiplication is a particularly impressive intellectual breakthrough. It is a moderate improvement, but I show that such moderate improvements are sufficient to trigger an intelligence explosion.
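As a toy illustration of that claim (my own sketch here, hypothetical and much cruder than the model in the video): suppose each insight multiplies the system’s speed by a modest factor f, and the time to find the next insight scales inversely with current speed. The waiting times then form a geometric series with a finite sum, i.e. a finite-time “explosion”.

```python
f = 2.0        # modest speedup per insight (an assumption, for illustration)
t0 = 1.0       # time to find the first insight, in arbitrary units
speed, wait, total = 1.0, t0, 0.0
for _ in range(20):
    total += wait          # wait for the next insight at the current speed
    speed *= f             # the insight makes the system faster
    wait = t0 / speed      # so the next insight arrives sooner
print(total)   # converges to t0 * f / (f - 1) = 2.0
```

Nothing here requires any single insight to be impressive; a stream of moderate, compounding improvements is enough.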
If we wish to predict the trajectory of improvements to the first AGI algorithm (hypothetically), we might choose as reference class “Trajectories of improvements to all problems”. With this reference class, it looks like most improvement happens slowly and continuously, with a greater emphasis on experience than on insights.
We might instead choose the reference class “Trajectories of improvement to algorithms”, which is far narrower, but still rich in examples. Here, a book on the history of algorithms will provide many examples of improvements due to difficult theory and clever insights, with matrix multiplication not standing out as particularly impressive. Presumably, most of these trajectories would be sufficient for an intelligence explosion if followed by the first AGI algorithm. However, a history book is a highly biased view of the past, as it will tend to focus on the most impressive trajectories. I am unsure how to overcome this problem.
An even narrower reference class would be “Trajectories of improvement to AI algorithms”, where training artificial neural networks is an example of a trajectory that would surely be explosive. I intuitively feel that this reference class is too narrow, as the first AGI algorithm could be substantially different from previous AI algorithms.