As one further data point, I have also heard people close to or working at Anthropic make “We won’t advance the state of the art”-type statements, though I never asked about specifics.
My sense is also that Claude 3 Opus is only slightly better than the best published GPT-4. To add one data point: I happen to work on a benchmark right now and on that benchmark, Opus is only very slightly better than gpt-4-1106. (See my X/Twitter post for detailed results.) So, I agree with LawrenceC’s comment that they’re arguably not significantly advancing the state of the art.
I suppose even if Opus is only slightly better (or even just perceived to be better) and even if we all expect OpenAI to release a better GPT-4.5 soon, Anthropic could still take a bunch of OpenAI’s GPT-4 business with this. (I’ll probably switch from ChatGPT-4 to Claude, for instance.) So it’s not that hard to imagine an internal OpenAI email saying, “Okay, folks, let’s move a bit faster with these top-tier models from now on, lest too many people switch to Claude.” I suppose that would already be quite worrying to people here. (By contrast, people would probably worry less if Anthropic took some of OpenAI’s business by having models that are slightly worse but cheaper, or more aligned in the sense of being less likely to say things you wouldn’t want models to say in production.)
After having spent a few hours playing with Opus, I think “slightly better than best public gpt-4” seems qualitatively correct: both models tend to get tripped up on the same kinds of tasks, but Opus can solve, albeit inconsistently, some tasks in my workflow that gpt-4 cannot.
And yeah, it seems likely that I will also swap to Claude over ChatGPT.