Oliver Daniels comments on How might we safely pass the buck to AI?

Oliver Daniels 23 Feb 2025 14:53 UTC
1 point
Instead, I would describe the problem as arising from a generator-verifier mismatch: when the generator is much stronger than the verifier, the generator is incentivized to fool the verifier without completing the task successfully.
I think these are related but separate problems: even with a perfect verifier (on easy domains), scheming could still arise.
That said, the two interact: imperfect verifiers increase P(scheming), and better verifiers expand the domain of “easy” tasks, etc.