Good point for RLHF:
Before RLHF, it was difficult to obtain a system able to do a backflip or to summarize texts correctly. Being able to do those things at all is already an improvement.
Bad points:
In the summarization paper, RLHF does better than supervised fine-tuning, but the results are still not perfect.
RL makes the system more agentic, and thus more dangerous.
We have no guarantee that RLHF will work.
Using Mechanical Turk workers won’t scale to superintelligence. It works for backflips, but it’s not clear whether it will work for “human values” in general.
We do not have a principled way to describe “human values”; a formal specification or a set of general rules would have been a much better approach, but there are theoretical reasons to think that such formal rules do not exist.
RLHF requires high-quality feedback.
Human feedback on its own is not enough; you have to use modern RL algorithms such as PPO to make it work (a minimal sketch of the PPO update appears at the end of this section).
But from an engineering viewpoint, if we had to solve the outer alignment problem tomorrow, I think RLHF would be one of the techniques to use.
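To make the PPO point above concrete, here is a minimal sketch of the clipped PPO policy loss typically used in RLHF. The tensor names (`logprobs_new`, `logprobs_old`, `advantages`) are assumptions for illustration, not the exact variables of any particular paper; in the summarization setup, the advantage would be derived from the reward-model score penalized by a KL term toward the initial policy.

```python
# Minimal sketch of a PPO-style clipped policy loss for RLHF (illustrative only).
# Assumed inputs: per-token log probabilities of sampled responses under the
# current policy (logprobs_new) and the policy that generated them (logprobs_old),
# plus advantages (e.g. reward-model score minus a KL penalty toward the
# pretrained model).
import torch

def ppo_clip_loss(logprobs_new: torch.Tensor,
                  logprobs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio between the updated policy and the sampling policy.
    ratio = torch.exp(logprobs_new - logprobs_old)
    # Unclipped objective: ratio-weighted advantages.
    unclipped = ratio * advantages
    # Clipped objective: keep a single update from moving the policy too far.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic minimum and negate it to obtain a loss to minimize.
    return -torch.min(unclipped, clipped).mean()
```

The clipping is what makes the update conservative: even if the reward model strongly favors some responses, the policy can only move a bounded amount away from the policy that collected the feedback on each step.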