From a short read, capabilities seem equal to GPT-4. AlphaCode 2 is also not penalized for its first 9 submissions, so I struggle to see how it can be compared to humans.
What led you to the “equal” conclusion over the “modest advance” hypothesis? What I read from the report is that it “beat GPT-4 by a small numerical margin on all tasks but one, and is natively multimodal.”
That leads me to “modest advance”. How did you interpret the report? Are you thinking the margins between the two models are too narrow and easily gamed?
Yes, those margins are narrow and probably gamed. The GPT-4 paper reports results for the base version, and it has probably received modest capability upgrades since. Gemini also uses more advanced prompting tactics.
What do you think the compute investment was? They state they used multimodal inputs (more available tokens in the world) and 4096-processor TPUv5 nodes, but not how many nodes or for how long.