> I think this only holds if fine tunes are composable [...] you probably can’t take a million independently-fine-tuned models and merge them [...]
The purpose of a fine-tune is to “internalize” some knowledge—either because it is important to have implicit knowledge of it, or because you want to develop a skill.
Although you may have a million instances executing tasks, the knowledge you want to internalize is likely much sparser. For example, if an instance is tasked with exploring a portion of a search space and finds no solution there, it can summarize its findings in a few words. There might not even be a reason to internalize this summary; it might instead be merged with other summaries into a more global view of the search landscape.
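To make the compression concrete, here is a minimal sketch of that pattern: workers explore disjoint partitions of a search space and return a few-word summary rather than their full traces, and the summaries are merged into one global view. All names and the toy "search" predicate are illustrative, not from any real system.

```python
# Hypothetical sketch: per-partition summaries merged into a global view.
from concurrent.futures import ThreadPoolExecutor

def explore(partition):
    """Search one partition; return a few-word summary, not the full trace."""
    solutions = [x for x in partition if x % 23 == 0]  # stand-in "search"
    if solutions:
        return f"found {len(solutions)} candidate(s)"
    return "no solution in partition"

def merge_summaries(summaries):
    """Collapse per-partition summaries into a global view of the landscape."""
    hits = sum(1 for s in summaries if s.startswith("found"))
    return f"{hits}/{len(summaries)} partitions contained candidates"

partitions = [range(i, i + 10) for i in range(0, 100, 10)]
with ThreadPoolExecutor() as pool:
    summaries = list(pool.map(explore, partitions))

print(merge_summaries(summaries))
```

The point is the bandwidth asymmetry: a million instances can do a great deal of work while producing only a handful of short summaries worth internalizing.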
So I don’t see the need for millions of fine-tunes. It seems more likely that you’d have periodic fine-tunes to internalize recent progress—maybe once an hour.
The main advantage of instances being identical clones is that a single periodic fine-tune can be copied to every instance immediately. This is much higher bandwidth than our own low-bandwidth, high-effort methods of knowledge dissemination.
The monolithic aspect could be a disadvantage, but there are a couple of mitigations:
- AGIs are, by definition, generalists
- you can segment the population into specialists (see also this comment about mixture-of-experts, MoE)