The concerns about data filtering raised in that post’s comments[1] suggest that aligned-CoT-seeding of the pretraining data may be the better thing to try instead.
e.g., Jozdien citing gwern