Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.
Going up a meta-level doesn’t simplify the problem in this case: the intelligence required to verify the proof is of the same order of magnitude as the intelligence of the AI.
I believe that, in this case, “dumb” is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI’s proof.
You said: “Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.”
This is basically what I mean by “understanding” them. Otherwise, what’s to understand? Would you claim that you “understand set theory” because you’ve memorized the axioms of ZFC?
You also said: “I believe that, in this case, ‘dumb’ is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI’s proof.”
This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI’s goal is for the human to understand the proof, it seems more like a test of the AI’s ability to compile proofs down to easily machine-checkable forms than of the human’s ability to understand the originals. Understanding the definitions is the hard part.
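To make the trusted-kernel point concrete, here is a minimal sketch (Lean 4 is just an illustrative choice, and the theorems are arbitrary). However a proof term is produced, whether by a human or by powerful automation, the same small kernel re-checks every inference step:

```lean
-- A proof term written by hand. Lean’s small trusted kernel re-checks it;
-- the kernel never needs to know how the proof was found, only that each
-- step is a legal inference.
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun h => ⟨h.right, h.left⟩

-- A proof found by automation still elaborates to a term that the same
-- kernel checks step by step.
example (n : Nat) : n + 0 = n := by simp
```

The trust is concentrated in the kernel, not in whatever search process generated the proof.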
You said: “This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI’s goal is for the human to understand the proof, it seems more like a test of the AI’s ability to compile proofs down to easily machine-checkable forms than of the human’s ability to understand the originals. Understanding the definitions is the hard part.”
A different way to think about this, which might help you see the problem from my point of view, is to think of proof checkers as checking the validity of proofs only within a given margin of error, and only within a range of (implicit) assumptions. How accurate does a proof checker have to be, and how far do you have to push against the built-in assumptions of a proof checker (or any human-built tool), before it can no longer be thought of as valid or relevant? If you assume a machine which doubles both its complexity and its understanding of the universe at sub-millisecond intervals, how long before it finds the bugs in any proof checker you pit it against?
“If” is the question, not “how long”. And I think we’d stand a pretty good chance of handling a proof object in a secure way, assuming we have a secure digital transmission channel etc.
But the original scope of the thought experiment assumed that we want to verify the proof. Wei Dai said:
Surely most humans would be too dumb to understand such a proof? And even if you could understand it, how does the AI convince you that it doesn’t contain a deliberate flaw that you aren’t smart enough to find? Or even better, you can just refuse to look at the proof.
I was responding to the first question alone, taken separately from the others. If your point is that we shouldn’t attempt to verify an AI’s precommitment proof, I agree.
I’m getting more confused. To me, the statement “Humans are too dumb to understand the proof” and the statement “Humans can understand the proof given unlimited time”, where ‘understand’ is qualified to include the ability to properly map the proof to the AI’s capabilities, are equivalent.
My point is not that we shouldn’t attempt to verify the AI’s proof for any external reasons—my point is that there is no useful information to be gained from the attempt.