By “refining pure human feedback”, do you mean refining RLHF ML techniques?
I assume you still view enhancing human feedback as valuable? And also more straightforwardly just increasing the quality of the best human feedback?
I mean things like tricks to improve the sample efficiency of human feedback, running more projects that use un-enhanced RLHF to learn how un-enhanced RLHF works, etc.