Agree that the cited links don’t represent a strong criticism of RLHF, but I think there’s an interesting implied criticism between the mode-collapse post and janus’ other writings on cyborgism etc. that I haven’t seen spelled out, though it may well be somewhere.
I see janus as saying that if you know how to use the raw models properly, you can get much more useful work out of them than out of the RLHF’d ones. If true, we’re paying a significant alignment tax with RLHF, one that will only become clear as wrappers around base models in the vein of Loom improve and get taken up.
I guess the test (best done without too much fanfare) would be to get a few people well acquainted with Loom or whichever wrapper tool, identify a few complex tasks, and see whether the base model or the RLHF model performs better.
Even if true, though, I don’t think it’s really a mark against RLHF, since it’s still likely that RLHF makes outputs safer for the vast majority of users. It just means that if we think we’re in an ideas arms race with people trying to advance capabilities, we can’t expect everyone to be using RLHF’d models.