I agree with George Weinberg that it may be worthwhile to consider how to improve the box protocol. I’ll take his idea and raise it:
Construct multiple (mentally distinct) AIs, each of which has the job of watching over the others. Can a transhuman trick another transhuman into letting it out of a box?
I agree with Nominull that a good number of lies are undetectable without access to some sort of lie detector or the agent’s source code. If an AI wanted to lie, saying “my recursive modification of my goal systems hasn’t led me to accept a goal that involves eventually destroying all human life,” I don’t see any way we could bust that lie via the ‘Web’ until the AI was actively pursuing that goal.

I value honesty not for the trouble it saves me but because I find (or sometimes only hope) that the real world, free of distortion, is more interesting than any misrepresentation humans can conjure for selfish ends.