Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 31 Jan 2025 4:38 UTC
4 points
If pretraining from human preferences works, why hasn’t there been follow-up work?
Also, why can’t this be combined with the DeepSeek paradigm?
https://arxiv.org/abs/2302.08582