Some thoughts:
The development of transformative AI may involve a feedback loop in which we train ML models that help us train better ML models, and so on (e.g. via approaches like neural architecture search, which seem to have become increasingly popular in recent years). There is nothing equivalent to such a feedback loop in biological evolution (animals don’t use their problem-solving capabilities to make evolution more efficient). Does your analysis assume there won’t be such a feedback loop (or at least not one that has a large influence on timelines)? Consider adding a discussion of this topic to the report (sorry if it’s already there and I missed it).
Part of the Neural Network hypothesis is the proposition that “a transformative model would perform roughly as many FLOP / subj sec as the human brain”. It seems worthwhile to investigate this proposition further. Human evolution corresponds to a search over a tiny subset of all possible computing machines. Why should we expect a different search algorithm, over an entirely different subset of computing machines, to yield systems with comparable capabilities that use a similar amount of compute? One might investigate this empirically, e.g. by comparing two algorithms that search over a space of models, where one is a common supervised learning algorithm and the other is an evolutionary computation algorithm.
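(To make the inference-compute side of such a comparison concrete, one needs some way of counting FLOP per forward pass. Below is a minimal sketch of the kind of accounting I have in mind, for a dense feedforward network; the function name and the 2-FLOP-per-multiply-accumulate convention are my own assumptions, not anything from the report.)

```python
def mlp_inference_flop(layer_widths):
    """Rough FLOP count for one forward pass of a dense feedforward net.

    Assumes 2 FLOP (multiply + add) per weight; biases and activation
    functions are ignored as lower-order terms.
    """
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_widths[:-1], layer_widths[1:]))

# e.g. a 784-256-10 classifier uses roughly 4e5 FLOP per input:
print(mlp_inference_flop([784, 256, 10]))  # 406528
```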
In a separate comment (under this one) I attempt to describe a more thorough and formal way of thinking about this topic.
Regarding methods that involve adjusting variables according to properties of 2020 algorithms (or the models trained by them): It would be interesting to try to apply the same methods with respect to earlier points in time (e.g. as if you were writing the report back in 1998/2012/2015 when LeNet-5/AlexNet/DQN were introduced, respectively). To what extent would the results be consistent with the 2020 analysis?
Let O1 and O2 be two optimization algorithms, each searching over some set of programs. Let V be some evaluation metric over programs such that V(p) is our evaluation of program p, for the purpose of comparing a program found by O1 to a program found by O2. For example, V can be defined as a subjective impressiveness metric as judged by a human.
Intuitive definition: Suppose we plot a curve for each optimization algorithm, where the x-axis is the inference compute of the program the algorithm yields and the y-axis is our evaluation of that program. If the curves of O1 and O2 are similar up to scaling along the x-axis, we say that O1 and O2 are similarly-scaling w.r.t. inference compute, or SSIC for short.
Formal definition: Let O1 and O2 be optimization algorithms and let V be an evaluation function over programs. Denote by Oi(n) the program that Oi finds when it uses n FLOP (which corresponds to the training compute if Oi is an ML algorithm), and denote by C(p) the amount of compute that program p uses. We say that O1 and O2 are SSIC with respect to V if for any n1, n1′, n2, n2′ such that C(O1(n1)) / C(O2(n2)) ≈ C(O1(n1′)) / C(O2(n2′)), it holds that V(O1(n1)) ≈ V(O2(n2)) implies V(O1(n1′)) ≈ V(O2(n2′)).
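(To illustrate how this property could be checked on data: a constant rescaling of the compute axis is a constant shift in log-compute, so one can search over shifts and measure how well the two value curves line up. The sketch below is my own illustration, with made-up names, and only handles the case where each curve is given as finitely many (inference compute, V) samples.)

```python
import numpy as np

def ssic_discrepancy(curve1, curve2, n_shifts=201):
    """Smallest worst-case mismatch between two value-vs-compute curves,
    minimized over a constant rescaling of curve2's compute axis.

    Each curve is a list of (inference_compute, value) pairs produced by one
    optimization algorithm at increasing training budgets. A small result
    means the data is consistent with the two algorithms being SSIC.
    """
    x1, v1 = map(np.asarray, zip(*sorted(curve1)))
    x2, v2 = map(np.asarray, zip(*sorted(curve2)))
    lx1, lx2 = np.log10(x1), np.log10(x2)
    # A constant scale factor on compute is a constant shift in log-compute.
    shifts = np.linspace(lx1.min() - lx2.max(), lx1.max() - lx2.min(), n_shifts)
    best = np.inf
    for s in shifts:
        shifted = lx2 + s
        inside = (shifted >= lx1.min()) & (shifted <= lx1.max())
        if inside.sum() < 2:  # not enough overlap to compare at this shift
            continue
        interp = np.interp(shifted[inside], lx1, v1)
        best = min(best, float(np.max(np.abs(interp - v2[inside]))))
    return best

# Sanity check: two identical curves, one shifted by 100x in compute.
c1 = [(10 ** k, np.tanh(k)) for k in range(1, 8)]
c2 = [(10 ** (k + 2), np.tanh(k)) for k in range(1, 8)]
print(ssic_discrepancy(c1, c2))  # ~0.0
```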
I think the report draft implicitly uses the assumption that human evolution and the first ML algorithm that will result in TAI are SSIC (with respect to a relevant V). It may be beneficial to discuss this assumption in the report. Clearly, not all pairs of optimization algorithms are SSIC (e.g. consider the pair consisting of pure random search and any reasonable optimization algorithm). Under what conditions should we expect a pair of optimization algorithms to be SSIC with respect to a given V?
Maybe that question should be investigated empirically, by looking at pairs of optimization algorithms, where one is a popular ML algorithm and the other is some evolutionary computation algorithm (searching over a very different model space), and checking to what extent the two algorithms are SSIC.
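(Here is a toy version of such an experiment, entirely my own construction and far too small to support real conclusions: O1 is full-batch gradient descent on a small tanh MLP, standing in for a common supervised learning algorithm, and O2 is a simple (1+λ) evolutionary search over an MLP's weights, with model size crudely tied to the training budget in both cases. For a range of training budgets we record each yielded model's inference FLOP and negated test error; the resulting curves are exactly what an SSIC check like the sketch above would compare.)

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 256)[:, None]
Y = np.sin(3 * X)

def init_params(width):
    return [rng.normal(0, 1, (1, width)), np.zeros(width),
            rng.normal(0, 1 / np.sqrt(width), (width, 1)), np.zeros(1)]

def forward(params, x):
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def mse(params):
    return float(np.mean((forward(params, X) - Y) ** 2))

def n_params(params):
    return sum(p.size for p in params)

def width_for_budget(train_flop):
    # Crude stand-in for the search itself settling on a model size:
    # let the hidden width grow slowly with the training budget.
    return int(8 * (train_flop / 1e6) ** 0.25)

def sgd_search(train_flop):
    """O1: full-batch gradient descent on a tanh MLP."""
    width = width_for_budget(train_flop)
    params = init_params(width)
    flop_per_step = 6 * n_params(params) * len(X)   # rough fwd+bwd cost
    steps = max(1, int(train_flop / flop_per_step))
    lr = 0.5 / width
    for _ in range(steps):
        W1, b1, W2, b2 = params
        h = np.tanh(X @ W1 + b1)
        g = 2 * (h @ W2 + b2 - Y) / len(X)          # dLoss/dprediction
        gh = (g @ W2.T) * (1 - h ** 2)
        params = [W1 - lr * (X.T @ gh), b1 - lr * gh.sum(0),
                  W2 - lr * (h.T @ g), b2 - lr * g.sum(0)]
    return params

def evo_search(train_flop, lam=8, sigma=0.05):
    """O2: a (1+lambda) evolutionary search over the weights of a tanh MLP.
    (A more faithful version would also mutate the architecture, so that the
    two searches really cover different model spaces.)"""
    width = width_for_budget(train_flop)
    best, spent = init_params(width), 0.0
    while spent < train_flop:
        children = [[p + rng.normal(0, sigma, p.shape) for p in best]
                    for _ in range(lam)]
        spent += lam * 2 * n_params(best) * len(X)  # one evaluation per child
        best = min(children + [best], key=mse)
    return best

curve_sgd, curve_evo = [], []
for budget in [1e6 * 4 ** k for k in range(5)]:
    for search, curve in [(sgd_search, curve_sgd), (evo_search, curve_evo)]:
        p = search(budget)
        # x: inference FLOP per input (~2 FLOP per weight); y: "value" = -error
        curve.append((2 * n_params(p), -mse(p)))

print(curve_sgd)
print(curve_evo)
# These two curves are what an SSIC check (like the sketch above) would compare.
```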