You can think of chain-of-thought interpretability as the combination of process-based methods with adversarial training.