You may be right. Perhaps the way to view this idea is as “yet another fuzzy-boundary RL helper technique” that works in a very different way from stuff like RLHF, and so will have different strengths and weaknesses. So if one is doing the “serially apply all cheap tricks that somewhat reduce risk” approach, then this can be yet another thing in your chain.