Instead, I would describe the problem as arising from a generator-verifier mismatch: when the generator is much stronger than the verifier, the generator is incentivized to fool the verifier rather than complete the task successfully.
I think these are related but separate problems—even with a perfect verifier (on easy domains), scheming could still arise.
That said, imperfect verifiers do increase P(scheming), and better verifiers expand the domain of “easy” tasks, etc.