It turns out that “train the model st CoT is predictable by a different model” is exactly the idea in prover-verifier games. That’s very exciting! Do PVGs reduce introspection or steganography?
It turns out that “train the model st CoT is predictable by a different model” is exactly the idea in prover-verifier games. That’s very exciting! Do PVGs reduce introspection or steganography?