Could someone clarify how HFDT, as the term is used here and in the original “Takeover” post, differs from RLHF? My understanding is that HFDT doesn’t assume an RL model.