I won’t comment on debate more generally, but can provide a counterexample for your specific proposal:
The predicter learns to deceive in a way that is undetectable to the observations the reporter has access to. As a result, the reporter gives an unconvincing argument, and the deception is missed. If it can already do this for the observations the humans have access to, it should also be able to do so for those the reporter has access to.
I won’t comment on debate more generally, but can provide a counterexample for your specific proposal:
The predicter learns to deceive in a way that is undetectable to the observations the reporter has access to. As a result, the reporter gives an unconvincing argument, and the deception is missed. If it can already do this for the observations the humans have access to, it should also be able to do so for those the reporter has access to.