Huh, o1 and the latest Claude were quite huge advances to me. Basically, within the last year LLMs for coding went from “occasionally helpful, maybe like a 5-10% productivity improvement” to “my job now is basically to instruct LLMs to do things, depending on the task a 30% to 2x productivity improvement”.
I’m in Canada and can’t access the latest Claude, so my experience with these things does tend to be a couple of months out of date. But I’m not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine.
Use Chatbot Arena: both versions of Claude 3.5 Sonnet are accessible in Direct Chat (third tab). There’s even o1-preview in Battle Mode (first tab); you just need to keep asking the question until you get o1-preview. In general, Battle Mode (where you keep asking a fixed question for multiple rounds) is a great tool for developing intuition about model capabilities, since it also hides the model name from you while you are evaluating the response.
Just an FYI unrelated to the discussion: all versions of Claude are available in Canada through Anthropic, so you don’t even need third-party services like Poe anymore.
Source: https://www.anthropic.com/news/introducing-claude-to-canada