(1) As Paul noted, the question of the exponent alpha is just the question of diminishing returns vs returns-to-scale.
Especially if you believe that the rate $f = f(R)$ is a product of multiple terms (e.g. Paul’s suggestion $f = R^{\alpha_t} \cdot R^{\alpha_a}$, with one exponent for computer-tech advances and another for algorithmic advances), then you get returns-to-scale-type dynamics (over certain regimes, i.e. until all the fruit has been picked) with finite-time blow-up.
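For concreteness, here is the standard one-line computation behind the finite-time blow-up (a sketch, writing $\alpha = \alpha_t + \alpha_a$ for the combined exponent and ignoring the regime where the fruit runs out): separating variables in $\partial_t R = R^{\alpha}$ gives

$$ R(t) = \left( R_0^{\,1-\alpha} - (\alpha - 1)\,t \right)^{-\frac{1}{\alpha - 1}}, $$

which, for $\alpha > 1$, diverges at the finite time $t^{*} = R_0^{\,1-\alpha} / (\alpha - 1)$; for $\alpha \le 1$ you only get polynomial or exponential growth, i.e. no singularity.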
(2) Also, a crucial aspect (imho) is the separation of time-scales between human-driven research and computation done by machines: transistors are faster than neurons, and buying more hardware scales better than training a new person up to the bleeding edge of research (especially considering Scott’s amusing parable of the alchemists).
Let’s add a little flourish to your model: you had the rate of research $I$ and the cumulative research $R$; let’s give the name $C$ to the capability of the AI system. Then we can model $\partial_t R = I = f(R) = g(C) = g(h(R))$, i.e. $C = h(R)$. This is your model, just with $f$ split into $h$, which tells us how hard AI progress is, and $g$, which tells us how good we are at producing research.
Now denote by $q = q(C)$ the fraction of work that absolutely has to be done by humans, and by $\varepsilon$ the speed-up factor of silicon over biology. Amdahl’s law gives you $g(C) = \frac{1}{q(C) + \frac{1 - q(C)}{\varepsilon C}}$, or somewhat simplified $g(C) \ge \frac{1}{q + \frac{1}{\varepsilon C}}$. This predicts a rate of progress that first looks like $1/q$, as long as human researcher input is the limiting factor, then becomes $\varepsilon C$ once we have AIs designing AIs (recursive self-improvement, aka explosion), and then presumably saturates at something (when the AI approaches optimality).
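A minimal numerical sketch of this rate function (the specific shape of $q(C)$, the cross-over at $C = 1$, and the value of $\varepsilon$ are my own illustrative assumptions, not part of the model above):

```python
import numpy as np

EPS = 1e6  # assumed speed-up of silicon over biology (illustrative)

def q(C):
    """Illustrative human-required fraction: falls linearly to zero at C* = 1."""
    return np.maximum(0.0, 1.0 - C)

def g(C):
    """Amdahl-style rate: humans at speed 1 on the fraction q(C), AI at speed EPS*C on the rest."""
    return 1.0 / (q(C) + (1.0 - q(C)) / (EPS * C))

for C in [0.1, 0.5, 0.9, 0.99, 1.0, 2.0]:
    print(f"C = {C:5.2f}   q = {q(C):4.2f}   g(C) = {g(C):12.2f}")
```

The printed rates sit near $1/q$ up to the cross-over and then jump to $\varepsilon C$.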
The crucial argument for fast take-off (as far as I understood it) is that we can expect $q(C)$ to hit $q = 0$ at some cross-over $C^*$, and we can expect this to happen with a nonzero derivative, $\partial_C q(C^*) \neq 0$. This is just the claim that human-level AI is possible, and that the intelligence of the human parts of the AI research project is not sitting at a magical point (aka: this is generic; you would need to fine-tune your model to get something else).
The change of the rate of research output from the $1/q(C)$ regime to the $\varepsilon C$ regime sure looks like a hard-take-off singularity to me! And I would like to note that the function $h$, i.e. the hardness of AI research, and the diminishing-returns vs returns-to-scale debate do not enter this discussion at any point.
In other words: if you model AI research as done by a team of humans and proto-AIs assisting the humans; and if you assert non-fungibility of humans vs proto-AI assistants (even if you buy a thousand times more hardware, you still need the generally intelligent human researchers for some parts); and if you assert that better proto-AI assistants can do a larger proportion of the work (at all); and if you assert that computers are faster than humans; then you get a possibly quite wild change at q=0.
I’d like to note that the cross-over is not “human-level AI”, but rather “q ≈ 0”, i.e. an AI that needs (almost) no human assistance to advance the field of AI research.
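To see how wild that change is in the toy model, here is a quick time integration of $\partial_t R = g(h(R))$ (everything below is an illustrative assumption of mine: the choice $h(R) = R$, the linear $q$, the constants, and the crude Euler integrator; it is chosen only to exhibit the kink at $q = 0$):

```python
# Toy integration of dR/dt = g(h(R)) to show the regime change at q = 0.
# All functional forms and constants are illustrative assumptions.

EPS = 1e3       # assumed silicon-over-biology speed-up
C_STAR = 1.0    # capability at which q hits zero (assumption)

def h(R):
    """Capability as a function of cumulative research; simply C = R here,
    so no stance is taken on how hard AI research is."""
    return R

def q(C):
    """Human-required fraction; crosses zero at C* with nonzero slope."""
    return max(0.0, 1.0 - C / C_STAR)

def g(C):
    """Amdahl-style research rate, as above."""
    return 1.0 / (q(C) + (1.0 - q(C)) / (EPS * C))

R, t, dt = 0.01, 0.0, 1e-4
checkpoints = [0.5, 0.9, 0.99, 1.0, 1.5]   # report when capability passes these
i = 0
while i < len(checkpoints) and t < 50.0:
    R += g(h(R)) * dt                      # explicit Euler step
    t += dt
    if h(R) >= checkpoints[i]:
        print(f"t = {t:8.4f}   C = {h(R):6.3f}   rate = {g(h(R)):10.1f}")
        i += 1
```

Up to $C \approx C^*$ the rate crawls along at roughly $1/q$; the moment $q$ hits zero it jumps to $\varepsilon C$, and the remaining growth is essentially instantaneous on the previous time-scale.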
On the opposing side (that’s what Robin Hanson would probably say) you have the empirical argument that q should decay like a power law long before we reach q=0 (“the last 10% take 90% of the work” is a folk formulation of “percentiles 90-99 take nine times as much work as percentiles 0-89”, aka a power law, and it is borne out quite well, empirically).
This has no bearing on whether we cross q=0 with non-vanishing derivative, but it would support Paul’s view that the world will already be unrecognizably crazy long before q=0.
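To illustrate that view with the same toy rate function (the power-law exponent and constants below are made up, purely for illustration): a power-law $q(C)$ gives a large and steadily growing speed-up, with no sharp kink, because the rate simply tracks $1/q$.

```python
EPS = 1e3
BETA = 1.0   # assumed power-law exponent for the decay of q (illustrative)

def q_powerlaw(C):
    """Human-required fraction decaying like a power law; never exactly zero."""
    return min(1.0, C ** (-BETA))

def g(C, q):
    """Same Amdahl-style rate as before."""
    return 1.0 / (q(C) + (1.0 - q(C)) / (EPS * C))

for C in [1, 3, 10, 30, 100, 300]:
    print(f"C = {C:4d}   q = {q_powerlaw(C):7.4f}   rate = {g(C, q_powerlaw):8.1f}")
```

With these made-up numbers the rate climbs like $C^{\beta}$: the world gets dramatically faster while $q$ is still nowhere near zero, which is exactly the “crazy long before q=0” picture.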
PS. I am currently agnostic about the hard vs soft take-off debate. Yeah, I know, cowardly cop-out.
edit: In the above, C roughly encodes how fast/good our AI is, and q encodes how general it is compared to humans. All AI-singularity stuff tacitly assumes that human intelligence (assisted by stupid proto-AI) is sufficiently general to design an AI that matches or exceeds the generality of human intelligence, i.e. sufficiently general for a take-off; I consider this likely. The counterfactual world would have our AI capabilities saturate at some subhuman level for a long time, using terribly bad randomized/evolutionary algorithms, until we either stumble onto an AI design with better generality or suffer unrelated extinction/heat-death. Heat-death is not an exaggeration: algorithms with exponentially bad run-time are effectively useless.
Conversely, I consider it entirely possible that human intelligence is insufficiently general to understand how human intelligence works! (We are really, really bad at understanding anything optimized by evolution/gradient descent, and that’s what we are.)