Assuming we require a performance of 40 tokens/s, the training cluster can run 2000/30 × 24000 = 1,600,000 concurrent instances of the resulting 70B model
Nit: you mixed up 30 and 40 here (should both be 30 or both be 40).
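The arithmetic behind the quoted figure can be checked directly. A minimal sketch follows; note that reading 24,000 as the GPU count and 2,000 as per-GPU tokens/s is my interpretation of the quoted formula, not something stated explicitly:

```python
# Check of the quoted capacity estimate (numbers from the calculation above;
# treating 24000 as GPU count and 2000 as per-GPU tokens/s is an assumption).
cluster_gpus = 24_000
tokens_per_s_per_gpu = 2_000
total_tokens_per_s = cluster_gpus * tokens_per_s_per_gpu  # 48,000,000 tokens/s

# The quoted formula divides by 30 tokens/s per instance...
instances_at_30 = total_tokens_per_s / 30  # 1,600,000 -- matches the quoted result
# ...but the stated requirement was 40 tokens/s, hence the nit:
instances_at_40 = total_tokens_per_s / 40  # 1,200,000

print(int(instances_at_30), int(instances_at_40))
```

Either way the answer stays in the same ballpark (1.2–1.6 million concurrent instances), so the nit doesn't change the conclusion.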
I will assume that the above ratios hold for an AGI level model.
If you train a model with 10x as many parameters, but use the same training data, then it will cost 10x as much to train and 10x as much to operate, so the ratios will hold.
In practice, I believe it is universal to use more training data when training larger models? If so, the ratio would actually increase (which further supports your thesis).
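This scaling argument can be made concrete with the standard approximations of ~6ND FLOPs to train a model with N parameters on D tokens, and ~2N FLOPs per generated token at inference: the train/inference ratio is then 3D, independent of N. A sketch under those assumptions, with illustrative numbers:

```python
# Train/inference FLOP ratio under the common approximations:
#   training  ~ 6 * N * D   (N = parameters, D = training tokens)
#   inference ~ 2 * N       (FLOPs per generated token)
def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

def flops_per_token(n_params):
    return 2 * n_params

N, D = 70e9, 15e12  # illustrative: a 70B-parameter model, 15T training tokens

ratio_small = train_flops(N, D) / flops_per_token(N)            # = 3 * D
ratio_large = train_flops(10 * N, D) / flops_per_token(10 * N)  # also 3 * D
assert ratio_small == ratio_large == 3 * D  # 10x the parameters, same ratio

# If larger models are also trained on more data (as is standard practice),
# the ratio 3*D grows, i.e. even more concurrent instances per training cluster.
```

The ratio depending only on D, not N, is exactly why the parent's assumption holds when data is fixed, and why more data per parameter strengthens it.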
On the other hand, the world already contains over 8 billion human intelligences. So I think you are assuming that a few million AGIs, possibly running at several times human speed (and able to work 24⁄7, exchange information electronically, etc.), will be able to significantly “outcompete” (in some fashion) 8 billion humans? This seems worth further exploration / justification.
Having 1.6 million identical twins seems like a pretty huge advantage though.
Can you elaborate? This might be true but I don’t think it’s self-evidently obvious.
In fact it could in some ways be a disadvantage; as Cole Wyeth notes in a separate top-level comment, “There are probably substantial gains from diversity among humans”. 1.6 million identical twins might all share certain weaknesses or blind spots.
The main advantage is that you can immediately distribute fine-tunes to all of the copies. This is much higher bandwidth compared to our own low-bandwidth/high-effort knowledge dissemination methods.
The monolithic aspect may potentially be a disadvantage, but there are a couple of mitigations:
- AGI are by definition generalists
- you can segment the population into specialists (see also this comment about MoE)
On the other hand, the world already contains over 8 billion human intelligences. So I think you are assuming that a few million AGIs, possibly running at several times human speed (and able to work 24⁄7, exchange information electronically, etc.), will be able to significantly “outcompete” (in some fashion) 8 billion humans? This seems worth further exploration / justification.
Good point, but a couple of thoughts:
- the operational definition of AGI referred to in the article is significantly stronger than the average human
- the humans are poorly organized
- the 8 billion humans are supporting a civilization, while the AGIs can focus on AI research and self-improvement
All of this is plausible, but I’d encourage you to go through the exercise of working out these ideas in more detail. It’d be interesting reading and you might encounter some surprises / discover some things along the way.
Note, for example, that the AGIs would be unlikely to focus on AI research and self-improvement if there were more economically valuable things for them to be doing. And if (very plausibly!) there were not more economically valuable things for them to be doing, why wouldn’t a big chunk of the 8 billion humans have been working on AI research already (such that an additional 1.6 million agents working on this might not be an immediate game changer)? There might be good arguments to be made that the AGIs would make an important difference, but I think it’s worth spelling them out.