Thanks. Is this because of posttraining? Ignoring posttraining, I'd rather evaluators get the 90%-through-training model version unrushed than the final version rushed. Takes?
Two versions with the same posttraining, one with only 90% of pretraining, are indeed very similar, so there's no need to evaluate both. It's more likely a comparison between the final model and one with 80% of its pretraining and 70% of its posttraining, and that last 30% of posttraining might be significant.