I agree, after all RLHF was originally designed for RL agents. As long as the models aren’t all that smart, and the tasks they have to do aren’t all that long-term, the transfer should work great, and the occasional failure won’t be a problem because, again, the models aren’t all that smart.
To be clear, I don’t expect a ‘sharp left turn’ so much as ‘we always implicitly incentivized exploitation of human foibles, we just always caught it when it mattered, until we didn’t.’