Thane Ruthenis comments on eggsyntax’s Shortform

Thane Ruthenis 13 Sep 2024 2:43 UTC
7 points
1
> it’s easier for the model to learn one consistent set of guidelines for what to say or not say than it is to learn two
The hidden CoT and the user-visible summary and output might be produced by different models, different late-layer heads, or different mixtures of experts.
- eggsyntax 13 Sep 2024 15:51 UTC
  5 points
  0
  Oh, that’s an interesting thought; I hadn’t considered that. Using different models seems like it would complicate the training process considerably. But different heads/MoE seems like it might be a good strategy that would naturally emerge during training. Great point, thanks.