habryka comments on Cole Wyeth’s Shortform

habryka 9 Nov 2024 18:36 UTC
12 points
7
Huh, o1 and the latest Claude were quite huge advances to me. Basically, within the last year LLMs for coding went from “occasionally helpful, maybe like a 5-10% productivity improvement” to “my job now is basically to instruct LLMs to do things, depending on the task a 30% to 2x productivity improvement”.
- Cole Wyeth 9 Nov 2024 19:19 UTC
  1 point
  0
  I’m in Canada so can’t access the latest Claude, so my experience with these things does tend to be a couple months out of date. But I’m not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine.
  - Vladimir_Nesov 9 Nov 2024 19:38 UTC
    5 points
    3
    
    I’m in Canada so can’t access the latest Claude
    
    Use Chatbot Arena: both versions of Claude 3.5 Sonnet are accessible in Direct Chat (third tab). There’s even o1-preview in Battle Mode (first tab); you just need to keep asking the question until you get o1-preview. In general, Battle Mode (where you keep asking a fixed question for multiple rounds) is a great tool for developing intuition about model capabilities, since it also hides the model name from you while you are evaluating the response.
  - core_admiral 9 Nov 2024 22:04 UTC
    3 points
    0
    Just an FYI unrelated to the discussion: all versions of Claude are available in Canada through Anthropic, so you don’t even need third-party services like Poe anymore.
    Source: https://www.anthropic.com/news/introducing-claude-to-canada