Having an expensive 3.5 Opus would be cool, but it’s not my top wish. I’d prefer to have a variety of “flavors” of Sonnet. Different specializations for different use cases.
For example:
Science fiction writer / General Creative writer
Poet
Actor
Philosopher / Humanities professor
Chem/Bio professor
Math/Physics professor
Literary Editor
Coder
Lawyer/Political Science professor
Clerical worker for mundane repetitive tasks (probably should be a Haiku, actually)
The main things Sonnet 3.5 lacks that Opus 3 has: creativity, open-mindedness, the ability to analyze complex multi-sided philosophical questions, and the ability to roleplay convincingly.
Why try to cram all abilities into one single model? Distilling down to smaller models seems like a perfect place to allow for specialization.
[Edit: less than a month later, my wish came true. Anthropic has added “communication styles” to Claude, and I really like it. The built-in ones (concise, formal) work great. The roll-your-own-from-examples option is still rough around the edges.]
I suspect fine-tuning specialized models is just squeezing a bit more performance in a particular direction, and not nearly as useful as developing the next-gen model. Complex reasoning takes more steps and tighter coherence among them (the o1 models are a step in this direction). You can try to devote a toddler to studying philosophy, but it won’t really work until their brain matures more.
For raw IQ, sure. I just mean “conversational flavor”.
If system prompts aren’t enough but fine-tuning is, this should be doable with different adapters loaded at inference time, with no need to distill into separate models.
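To make the adapter idea concrete, here is a toy numpy sketch of the mechanism (LoRA-style low-rank adapters swapped on top of frozen base weights). This is purely illustrative, not any vendor’s actual serving code; all names (`AdapterModel`, `set_adapter`, etc.) are made up for this example:

```python
import numpy as np

class AdapterModel:
    """Toy illustration: one frozen base weight matrix plus
    swappable low-rank (LoRA-style) adapters chosen at inference time."""

    def __init__(self, dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.base = rng.standard_normal((dim, dim))  # frozen base weights
        self.adapters = {}   # name -> (A, B) low-rank factor pair
        self.active = None

    def add_adapter(self, name, A, B):
        self.adapters[name] = (A, B)

    def set_adapter(self, name):
        # Swapping "flavors" is cheap: the base weights never change,
        # only which small low-rank update is applied on top.
        self.active = name

    def forward(self, x):
        y = self.base @ x
        if self.active is not None:
            A, B = self.adapters[self.active]
            y = y + A @ (B @ x)  # low-rank correction to the base output
        return y
```

The point of the sketch is the cost structure: each “flavor” is only the small `(A, B)` pair, so many specializations can share one deployed base model.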
Yes, I agree that’s an alternative. Then you’d need the primary model to be less heavily RLHF’d and less narrowly focused. A more raw model should be capable, with an adapter, of expressing a wider variety of behaviors.
I still think that distilling down from specialized large teacher models would likely give the best result, but that’s just a hunch.
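For readers unfamiliar with the mechanics, teacher-to-student distillation usually means training the student against the teacher’s softened output distribution. A minimal numpy sketch of that standard soft-label loss (the Hinton-style formulation, nothing specific to any particular lab’s setup):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's and student's softened
    output distributions, scaled by T**2 (the usual convention so
    gradients stay comparable across temperatures)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -float(T * T * np.sum(p_teacher * np.log(p_student + 1e-12)))
```

The loss is minimized when the student reproduces the teacher’s distribution, which is why a specialized teacher pushes the student toward that same specialization.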