Couldn’t CoT be what makes RL safe, though? (E.g., if you force the reasoner to self-limit below some recursion depth whenever it senses that the RL agent is asking for so much that it becomes unsafe.)