Oh dear, RL for everything, because surely nobody’s been complaining about the safety profile of doing RL directly on instrumental tasks rather than on goals that benefit humanity.
My rather hot take is that a lot of the arguments for safety of LLMs also transfer over to practical RL efforts, with some caveats.
I agree, after all, RLHF was originally for RL agents. As long as the models aren’t all that smart, and the tasks they have to do aren’t all that long-term, the transfer should work great, and the occasional failure won’t be a problem because, again, the models aren’t all that smart.
To be clear, I don’t expect a ‘sharp left turn’ so much as ‘we always implicitly incentivized exploitation of human foibles, we just always caught it when it mattered, until we didn’t.’