“Only the result of the verification process will leave the box, and thus we have the same safety properties as before: The system will output a predetermined bit of information or fail to output anything at all. This setup I will call an IP-Oracle AI.”
My understanding of the AI architecture you are proposing is as follows:
1. Humans come up with a mathematical theorem we want to solve. 2. The theorem is submitted to the AI. 3. The AI tries to prove the theorem and submit a correct proof to the theorem checker. 4. The theorem checker outputs 1 if the proof is correct and 0 if it is not.
It’s not clear to me that such a system would be very useful for carrying out a pivotal act and it seems like it would only be useful for checking whether a problem is solvable.
For example, we could create a theorem containing specifications for some nanotechnology system and we might get 1 from the box if the AI has found a correct proof and 0 otherwise. But that doesn’t seem very useful to me because it merely proves that the problem can be solved. The problem could still be extremely hard to solve.
Then in the “Applications to AI alignment section” it says:
“In this case, we could ask the IP-Oracle for designs that satisfy these specifications.”
The system described in this quote seems very different from the one-bit system described earlier because it can output designs and seems more like a classic oracle.
No, that’s not quite right. What you are describing is the NP-Oracle.
On the other hand, with the IP-Oracle we can (in principle, limited by the power of the prover/AI) solve all problems in the PSPACE complexity class.
Of course, PSPACE is again a class of decision problems, but using binary search it’s straightforward to extract complete answers like the designs mentioned later in the article.
My understanding of the AI architecture you are proposing is as follows:
1. Humans come up with a mathematical theorem we want to solve.
2. The theorem is submitted to the AI.
3. The AI tries to prove the theorem and submit a correct proof to the theorem checker.
4. The theorem checker outputs 1 if the proof is correct and 0 if it is not.
It’s not clear to me that such a system would be very useful for carrying out a pivotal act and it seems like it would only be useful for checking whether a problem is solvable.
For example, we could create a theorem containing specifications for some nanotechnology system and we might get 1 from the box if the AI has found a correct proof and 0 otherwise. But that doesn’t seem very useful to me because it merely proves that the problem can be solved. The problem could still be extremely hard to solve.
Then in the “Applications to AI alignment section” it says:
The system described in this quote seems very different from the one-bit system described earlier because it can output designs and seems more like a classic oracle.
No, that’s not quite right. What you are describing is the NP-Oracle.
On the other hand, with the IP-Oracle we can (in principle, limited by the power of the prover/AI) solve all problems in the PSPACE complexity class.
Of course, PSPACE is again a class of decision problems, but using binary search it’s straightforward to extract complete answers like the designs mentioned later in the article.