What if its domain is restricted to math and self-modification? Then, if it fooms, it will be a safe math Oracle, possibly even provably safe. It would be a huge help on the road to FAI, both directly and as a case study.
It may very well be possible to build such an AI. However, there are several issues with it:
The AI can be adapted for other, less restricted, domains if knowledge of how it works spreads. There would be a large incentive to do so, since such an oracle would be of only limited utility.
The AI adds code that will evolve into another AI into its output. This is remotely possible, depending on what kind of problems you have it working on. If you were using it to design more efficient algorithms, in some cases an AI of some form might be the optimal solution.
Even if you 100% trust the AI to provide the optimal output, you can't trust that the optimal output for the problem you've specified is what you actually want.
The AI could self-modify incorrectly and result in unfriendly AI. In order to be provably friendly/restricted, it would have to be 100% certain of any modification. That's a very tall order, especially in AI, where everything has to be approximate or probabilistic.
It might not be as safe as you think it is. The AI runs some code and gets an unexpected result, possibly because of a bug in the environment itself. Look up how difficult it is to sandbox untrusted code and you will get some appreciation for how a superintelligence could figure a way out of its box.
But it can't do anything with any exploits it finds because it is restricted to hard-coded axioms? Well, maybe. If it's using probabilities and some form of machine learning, it might be able to learn that "executing this code gives me this result" and then learn to take advantage of that. I don't believe that a system can work only in formal proofs. However, I might be completely wrong about this one; it's just a thought.
The AI can be adapted for other, less restricted, domains
That the ideas from a safe AI can be used to build an unsafe AI is a general argument against working on (or even talking about) any kind of AI whatsoever.
The AI adds code that will evolve into another AI into its output
The output is to contain only proofs of theorems: specifically, a proof (or refutation) of the theorem given in the input. The state of the system is to be reset after each run so as not to accumulate information.
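To make that interface concrete, here is a minimal, self-contained sketch of the query discipline, with the "prover" reduced to a toy truth-table check over propositional formulas (the function names and the toy prover are purely illustrative, not part of the proposal): a theorem goes in, a proof or refutation comes out, and nothing is retained between calls.

    from itertools import product

    def prove(formula, variables):
        """Return ('proved', None) if the formula is a tautology, otherwise
        ('refuted', counterexample). No global state is read or written,
        so every query starts from the same blank slate."""
        for values in product([False, True], repeat=len(variables)):
            env = dict(zip(variables, values))
            if not formula(env):
                return ('refuted', env)   # refutation: a falsifying assignment
        return ('proved', None)           # proved by exhaustive case analysis

    # Example query: (p -> q) or (q -> p) is a tautology.
    verdict, witness = prove(
        lambda e: (not e['p'] or e['q']) or (not e['q'] or e['p']),
        ['p', 'q'])
    print(verdict, witness)               # prints: proved None

A real system would emit a checkable proof object rather than a bare verdict, but the shape of the interface is the point.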
The AI could self-modify incorrectly and result in unfriendly AI
Any correct or incorrect self-modification is still restricted to the math domain, and so cannot result in an unsafe AI.
bug in the environment itself
Guarding against software bugs is easy in this case. You design an abstract virtual machine environment for the AI, then design the software that implements this environment, then formally prove that the software is correct.
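As a toy instance of that "implement, then prove it matches the spec" pattern (this example is far simpler than a whole virtual machine and is only meant to show the shape of such a proof), here is a Lean 4 sketch proving that an accumulator-based implementation agrees with a straightforward reference definition on every input:

    -- Reference definition: the "abstract" specification.
    def sumSpec : List Nat → Nat
      | []      => 0
      | x :: xs => x + sumSpec xs

    -- Implementation: tail-recursive with an accumulator.
    def sumImpl (acc : Nat) : List Nat → Nat
      | []      => acc
      | x :: xs => sumImpl (acc + x) xs

    -- Correctness: the implementation agrees with the spec for all inputs.
    theorem sumImpl_correct (xs : List Nat) (acc : Nat) :
        sumImpl acc xs = acc + sumSpec xs := by
      induction xs generalizing acc with
      | nil => simp [sumImpl, sumSpec]
      | cons x xs ih => simp [sumImpl, sumSpec, ih, Nat.add_assoc]

Scaling this up to a realistic virtual machine is a large engineering effort (projects like CompCert and seL4 show it can be done), but the logical structure is the same.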
Guarding against errors caused by cosmic rays is also easy. You estimate the probability of such an error, and then add redundancy until the probability is so low that it won’t happen until the end of the universe.
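As a sketch of that estimate (the numbers are illustrative, not taken from any real hardware specification): if each of k independent redundant copies gives a corrupted answer with probability p per run and the final answer is taken by majority vote, then a wrong result requires at least ⌈k/2⌉ simultaneous corruptions, so

    P(\text{wrong majority}) \le \binom{k}{\lceil k/2 \rceil} p^{\lceil k/2 \rceil},
    \qquad \text{e.g. } k = 15,\ p = 10^{-6}: \quad \binom{15}{8}\,(10^{-6})^{8} \approx 6 \times 10^{-45}.

At that level you would not expect a single voting failure even if queries ran continuously for the age of the universe.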
Look up how difficult it is to sandbox untrusted code
Sandboxing untrusted code is easy. The difficult thing is sandboxing it while making it think that it is running normally, and that difficulty is irrelevant here.
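For what the easy part looks like in practice, here is a minimal POSIX-only sketch (the specific limits and the script path are arbitrary placeholders): run the untrusted code in a separate process with hard CPU, memory, wall-clock, and output caps, making no attempt to hide the sandbox from it. A real deployment would add OS-level isolation such as namespaces, seccomp filters, and a blocked network, but containment without concealment is the comparatively easy half.

    import resource
    import subprocess

    def _limits():
        # Applied in the child process just before exec: cap CPU seconds
        # and the address space it may allocate.
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))

    def run_untrusted(path_to_script):
        proc = subprocess.run(
            ["python3", "-I", path_to_script],  # -I: isolated mode, ignores env vars and user site dir
            capture_output=True,
            text=True,
            timeout=10,            # wall-clock cap, independent of the CPU limit
            preexec_fn=_limits,    # POSIX only
        )
        return proc.stdout[:10000] # truncate the output as well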
I don’t believe that a system can work only in formal proofs
It is quite probable that a “pure math Oracle” system cannot work. The point was, it can be made safe to try.