My point is that RLHF incentivizes all sorts of things, and which things it incentivizes depends on the content of the trained model, not on what RLHF is.
It depends on both.