(I haven’t read the whole post yet.)
PaLM used 2.6e24 training FLOP and seemed far below human-level capabilities to me; do you disagree, or is this consistent with your model, or is this evidence against your model?
Gato seemed overall less capable than a typical lizard and much less capable than a raven to me; do you disagree, or is this consistent with your model, or is this evidence against your model?
The model predicts that the (hypothetical) intelligence ranking committee would place PaLM above GPT-3, and perhaps on par with the typical abilities of the human linguistic cortex. PaLM seems clearly superior to GPT-3 to me, but evaluating it against the human linguistic cortex is more complex: as noted in the article here, LLMs and humans only partially overlap in their training data. Without even looking into PaLM in detail, I predict it surpasses typical humans on some linguistic tasks.
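As a rough sanity check on that ordering, here is my own back-of-the-envelope sketch in Python, using the common 6 * params * tokens approximation and the publicly reported parameter and token counts (none of these numbers come from the post):

```python
# Back-of-the-envelope training-FLOP comparison using the standard ~6 * params * tokens
# heuristic. Parameter/token counts are the publicly reported ones; treat the results
# as order-of-magnitude estimates only.
def approx_train_flop(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

gpt3_flop = approx_train_flop(175e9, 300e9)   # GPT-3: 175B params, ~300B tokens
palm_flop = approx_train_flop(540e9, 780e9)   # PaLM:  540B params, ~780B tokens

print(f"GPT-3 ~ {gpt3_flop:.1e} FLOP")        # ~3.2e23
print(f"PaLM  ~ {palm_flop:.1e} FLOP")        # ~2.5e24, roughly 8x GPT-3
```

On pure training compute PaLM sits roughly 8x above GPT-3, which is what drives the "above GPT-3" placement, and the quoted 2.6e24 figure is consistent with this estimate.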
I also pretty much flat-out disagree about Gato: without looking up its training budget at all, I'd rank it closer to a raven. But these comparisons are complex and extremely noisy unless the systems are trained on similar environments and objectives.
I assume by 'evidence against your model' you are talking about the optimization power model, not the later forecast. I'm not yet aware of any other simple model that could compete with the P model for explaining capabilities, and its theoretical justifications are so sound and well understood that it would take enormous piles of evidence to convince me there was some better model. Do you have something in mind?
I suspect you may be misunderstanding how the model works: it predicts only a correlation between the variables, but predicting even a weak correlation is sufficient for a massive posterior probability, because the model is so simple and the dataset is so large (large, and also very noisy).
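To make that concrete, here is a toy illustration of my own (made-up numbers, not anything from the post): a model that predicts only a weak correlation still accumulates an overwhelming log Bayes factor over an independence model once the dataset is large, even though each individual data point is noisy and barely informative.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
rho = 0.2            # weak correlation, chosen arbitrarily for illustration
n = 10_000           # a large (and noisy) dataset

cov_weak = [[1.0, rho], [rho, 1.0]]
data = rng.multivariate_normal([0.0, 0.0], cov_weak, size=n)

# Model A predicts a weak correlation; model B predicts independence.
model_a = multivariate_normal([0.0, 0.0], cov_weak)
model_b = multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

log_bayes_factor = model_a.logpdf(data).sum() - model_b.logpdf(data).sum()
print(f"log Bayes factor for A over B after {n} samples: {log_bayes_factor:.0f} nats")
# Expected value is about -0.5 * n * log(1 - rho**2) ~ 200 nats: decisive,
# even though the per-sample evidence is tiny.
```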
Also, we will have many foundation models trained with compute budgets far beyond that of the human brain, and most people will agree they are not AGI, as general intelligence also requires sufficiently general architectures, training environments, and objectives. As explained in this footnote, each huge model trained with human-level compute still has only a small probability of becoming AGI.
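A minimal toy sketch of that "necessary but not sufficient" point (the probabilities below are entirely made up for illustration; only the factorization idea comes from the paragraph above):

```python
# Hypothetical factorization: human-brain-scale training compute is treated as necessary
# but far from sufficient. The remaining factors (all made-up numbers) stand in for how
# general the architecture, training environment, and objectives are.
factors = {
    "human-scale training compute": 1.0,            # assumed satisfied for this model
    "sufficiently general architecture": 0.5,
    "sufficiently general training environment": 0.3,
    "sufficiently general objectives": 0.4,
}
p_agi = 1.0
for name, p in factors.items():
    p_agi *= p
print(f"P(this particular model is AGI) ~ {p_agi:.2f}")  # ~0.06: small, despite the compute
```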