Elsewhere, @Wei Dai points out the apparent conflict between ‘we cannot train any policy compliance or user preferences onto the chain of thought’ (above) and the following from the Safety section (emphasis mine):
> We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context...