eggsyntax comments on eggsyntax’s Shortform

eggsyntax 13 Sep 2024 15:51 UTC
5 points
Oh, that’s an interesting thought; I hadn’t considered that. Using different models seems like it would complicate the training process considerably. But different heads or MoE seems like it might be a good strategy that would naturally emerge during training. Great point, thanks.