Yes, I agree that’s an alternative. Then you’d need the primary model to be less RLHF’d and less narrowly focused. A more raw model should be capable, with an adapter, of expressing a wider variety of behaviors.
I still think that distilling down from specialized large teacher models would likely give the best result, but that’s just a hunch.