That’s what I meant by “base model”: one that is trained only on next-token prediction. Do I have the wrong terminology?
Nope, you’re right, I was reading quickly & didn’t parse that :)