What do you mean by “chain-of-thought interpretability tools”?
I’m guessing it’s something along these lines.