If pretraining from human preferences works, why hasn’t there been follow up work?
Also, why can’t this be combined with the deepseek paradigm?
https://arxiv.org/abs/2302.08582
If pretraining from human preferences works, why hasn’t there been follow up work?
Also, why can’t this be combined with the deepseek paradigm?
https://arxiv.org/abs/2302.08582