When you explicitly optimize against a detector of unaligned thoughts, you’re partially optimizing for more aligned thoughts, and partially optimizing for unaligned thoughts that are harder to detect.
This is correct, and I believe the answer is to optimize for detecting aligned thoughts.
This is correct, and I believe the answer is to optimize for detecting aligned thoughts.