Thanks. Is this because of posttraining? Ignoring posttraining, I'd rather evaluators get the 90%-through-training model version unrushed than the final version rushed. Takes?
Two versions with the same posttraining, one with only 90% of pretraining, are indeed very similar, so there's no need to evaluate both. It's more likely a comparison between the final model and one with 80% of its pretraining and 70% of its posttraining, and that last 30% of posttraining might be significant.