At first glance, I was also skeptical of tailcalled’s idea, but now I find I’m starting to warm up to it. Since you didn’t ask for a practical proposal, just a concrete one, I give you this:
1. Implement an AI in Conway’s Game of Life.
2. Don’t interact with it in any way.
3. Limit the computational power the box has, so that if the AI begins engaging in recursive self-improvement, it’ll run more and more slowly from our perspective, and we’ll have ample time to shut it off. (Of course, from the AI’s perspective, time will run as quickly as it always does, since the whole world will slow down with it.) A toy sketch of this throttling appears after the list.
4. (Optional) Create multiple human-level intelligences in the world (ignoring ethical constraints here), and see how the AI interacts with them. Run the simulation until you are reasonably certain (for a very stringent definition of “reasonably”) from the AI’s behavior that it is Friendly.
5. Profit.
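To make step 3 a bit more concrete, here is a rough Python sketch. It is purely illustrative: the function names (`life_step`, `run_throttled`), the one-update-per-live-cell cost model, and the demo parameters are assumptions of the sketch, not part of any real boxing design. The idea is just that the world only ever gets a fixed budget of cell updates per second of our time.

```python
import time
from collections import Counter


def life_step(live_cells):
    """Advance a Game of Life world (a set of (x, y) live cells) by one generation."""
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is live next generation if it has exactly 3 live neighbours,
    # or 2 live neighbours and is already live.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }


def run_throttled(live_cells, updates_per_second=1_000_000, max_generations=None):
    """Run the world under a fixed external compute budget.

    Each generation "costs" roughly one update per live cell (an assumption of
    this sketch), and we sleep long enough that the update rate never exceeds
    updates_per_second of our wall-clock time.
    """
    generation = 0
    while max_generations is None or generation < max_generations:
        cost = max(len(live_cells), 1)         # crude proxy for compute spent this tick
        time.sleep(cost / updates_per_second)  # enforce the outside-world budget
        live_cells = life_step(live_cells)
        generation += 1
    return live_cells


if __name__ == "__main__":
    # A glider, run for eight throttled generations with a deliberately tiny budget.
    glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
    print(sorted(run_throttled(glider, updates_per_second=50, max_generations=8)))
```

The point is only the throttling: a glider costs a constant five updates per generation, so it ticks along at a steady pace, while any pattern that keeps growing (say, because the AI inside is building itself more machinery) takes longer and longer per generation from our side of the box, even though from inside the world one generation is still just one tick.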
The problem with this is that even if you can determine with certainty that an AI is Friendly, there is no certainty that it will stay that way. There could be a series of errors as it goes about daily life, each acting as a mutation, gradually evolving the “Friendly” AI into a less friendly one.
Hm. That does sound more workable than I had thought.
I would probably only include it as part of a batch of tests and proofs. It would be pretty foolish to rely on a single method to check whether something that will destroy the world if it fails is actually working correctly.
Yes, I agree with you on that. (Step 5 was intended as a joke/reference.)