Vladimir_Nesov comments on AI #49: Bioweapon Testing Begins

Vladimir_Nesov 1 Feb 2024 17:44 UTC
4 points
Bard Gemini Pro (as it’s called in lmsys arena) has access to the web and an unusual finetuning with a hyper-analytical character: it often explicitly formulates multiple subtopics in a reply and looks into each of them separately. In contrast, the earlier Gemini Pro entries that are not Bard have a finetuning or prompt not suitable for the arena, often giving a single sentence or even a single word as a first response. Thus, like Claude 2 (with its unlikable character), they operate at a handicap relative to base model capabilities. GPT-4 on lmsys arena doesn’t have access to the web, and GPT-4 Turbo’s newer knowledge from 2022-2023 seems shallower than its earlier knowledge; OpenAI probably didn’t fully retrain the base model just for this release.

So both kinds of Gemini Pro are bad proxies for the placement of their base model on the leaderboard. In particular, if the Bard entry in the arena is in fact Gemini Pro and not Gemini Ultra, then Gemini Ultra with Bard Gemini Pro’s advantages will probably beat the current GPT-4 Turbo (which doesn’t have these advantages) even if Ultra is not smarter than GPT-4.