That’s true, though I do think there are various proxies that make it relatively easy to rule out at least the extreme end of this kind of thing for currently deployed models (like the compute purchase and allocation decisions of the major cloud providers that host some of these models, staff allocation, and various other things).
I do think organizations that claim parity with GPT-4 or Sonnet are almost always overstating things. My experience with 405b suggests it is also not at the level of Claude 3.5 Sonnet, but it does seem to be at the level of the original GPT-4, though I am not confident since I haven’t played around that much with GPT-4 recently.