In the case of ‘scaffolded’ LLMs, where much of the high-level ‘system 2’ behaviour is actually hardcoded in the outer loop that humans (and the AI) can program, we may get faster RSI than by iterating on the underlying DL model, because iterating on software is quick, while large neural network systems are much more like hardware than software. On the other hand, if the AGI is some end-to-end RL-trained network, then RSI will require training up successors to itself, which will come at a substantial compute and time cost. This will slow down the RSI iteration time and ultimately make for a slow takeoff.
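To make the software-vs-hardware asymmetry concrete, here is a minimal sketch of what a hardcoded ‘system 2’ outer loop looks like (Python; `call_llm` is a hypothetical stand-in for whatever frozen base model is used, not any particular API):

```python
# A minimal sketch of the "scaffolded LLM" idea: the system-2 control flow is
# ordinary software wrapped around a frozen model, so it can be revised far
# faster than retraining the network itself. `call_llm` is a hypothetical
# stand-in for an LLM API.

def call_llm(prompt: str) -> str:
    # Placeholder; in practice, an API call to a frozen base model.
    return f"(model output for: {prompt[:40]}...)"

def outer_loop(task: str, max_steps: int = 10) -> str:
    """Hardcoded 'system 2': a plan/act loop around a 'system 1' model."""
    scratchpad: list[str] = []
    for _ in range(max_steps):
        step = call_llm(f"Task: {task}\nNotes so far: {scratchpad}\nNext step?")
        scratchpad.append(step)
        if "DONE" in step:  # the model signals completion in-band
            break
    return call_llm(f"Task: {task}\nNotes: {scratchpad}\nFinal answer:")

print(outer_loop("prove a toy lemma"))
```

On this layer, RSI is an ordinary code edit: a revised `outer_loop` can be proposed, tested, and swapped in within minutes, whereas improving the underlying network requires a new training run.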
It seems that you tend to overlook the pillar of intelligence that lies in culture (theories, curriculum). RSI may happen through rapid, synergistic scientific progress that gets fleshed out in papers and textbooks, which in turn are fed into the context of LLMs to generate further science, models, and progress. The “exemplary actor” architecture highlights this aspect of intelligence, relying strongly on textbooks while its “scaffolded” algorithms are simple.
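To illustrate with a toy Python sketch (again with a hypothetical `call_llm` placeholder, and a naive take on corpus handling): in this loop the load-bearing, self-improving state is the text corpus, not the model weights.

```python
# A toy sketch of the culture-mediated loop described above: the durable
# artifact is a growing corpus ("papers and textbooks"), not model weights.
# `call_llm` is again a hypothetical stand-in for an LLM API.

def call_llm(prompt: str) -> str:
    return f"(new result derived from: {prompt[:40]}...)"  # placeholder

corpus: list[str] = ["<seed textbook chapter>"]

def research_step() -> None:
    # Feed the accumulated culture back into the model's context window...
    context = "\n\n".join(corpus[-5:])  # naive retrieval: five latest entries
    finding = call_llm(f"Known results:\n{context}\n\nDerive a new result:")
    # ...and write the output back into the shared corpus, so later calls
    # (or later models) start from a richer curriculum.
    corpus.append(finding)

for _ in range(100):
    research_step()
print(len(corpus))  # 101: the culture has grown without touching any weights
```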
On the question of how quick cultural evolution could be, given the (presently existing) sample-inefficiency of LLM pre-training: LLMs could circumvent this time bottleneck in various ways, from Toolformer-style fine-tuning to generating large amounts of simulations, reasoning with new models and theories, pruning the theories that don’t stand the test of simulation, and then pre-training subsequent LLMs on these swaths of simulated data. Even if there is insufficient feedback with the real world, the ensuing RSI dynamic just in the digital space of simulations could be bizarre. People are increasingly drawn into this space, too, with VR and the metaverse.
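A schematic of that simulate-prune-pretrain pipeline, with every component (the theory generator, the simulator check, the downstream pre-training step) reduced to a hypothetical placeholder:

```python
# A schematic of the simulate-prune-pretrain loop sketched above. All the
# components here are hypothetical placeholders, not real tooling.
import random

def propose_theory(i: int) -> str:
    return f"theory-{i}"  # stand-in for an LLM generating a falsifiable theory

def survives_simulation(theory: str) -> bool:
    # Stand-in for running the theory's predictions in a simulator;
    # here a coin flip marks whether its predictions held up.
    return random.random() > 0.5

def build_synthetic_corpus(n_candidates: int) -> list[str]:
    candidates = [propose_theory(i) for i in range(n_candidates)]
    # Prune the theories that don't stand the test of simulation...
    survivors = [t for t in candidates if survives_simulation(t)]
    # ...and keep the survivors as synthetic data for pre-training the next
    # LLM, sidestepping slow feedback loops with the real world.
    return survivors

print(len(build_synthetic_corpus(1000)))  # roughly half survive the pruning
```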
We have also started to see more “automated DL science” work this year (https://arxiv.org/abs/2302.00615, https://arxiv.org/abs/2305.19525, https://arxiv.org/abs/2307.05432), independent of scaffolded LLMs, which could further help the “curriculum/theoretic/cultural” path towards RSI.