Your second proposal, trying to restrict what the AI can do after it’s made a decision, is a lost cause. Our ability to specify what is and is not allowed is simply too limited to resist any determined effort to find loopholes. This problem afflicts every field from contract law to computer security, so it seems unlikely that we’re going to find a solution anytime soon.
Your first proposal, making an AI that isn’t a complete AGI, is more interesting. Whether or not it’s feasible depends partly on your model of how an AI will work in the first place, and partly on how extreme the AI’s performance is expected to be.
For instance, I could easily envision a specialized software engineering AI that does nothing but turn English-language program descriptions into working software. Such a system could easily devote vast computing resources to heuristic searches of design space, and you could use it to design improved versions of itself as easily as anything else. It should be obvious that there’s little risk of unexpected behavior with such a system, because it doesn’t contain any parts that would motivate it to do anything but blindly run design searches on demand.
However, this assumes that such an AI can actually produce useful results without knowing about human psychology and senses, the business domains its apps are supposed to address, the world they’re going to interact with, etc. Many people argue that good design requires a great deal of knowledge in these seemingly unrelated fields, and some go so far as too say you need full-blown humanlike intelligence. The more of these secondary functions you add to the AI the more complex it becomes, and the greater the risk that some unexpected interaction will cause it to start doing things you didn’t intend for it to do.
So ultimately the specialization angle seems worthy of investigation, but may or may not work depending on which theory of AI turns out to be correct. Also, even a working version is only a temporary stopgap. The more computing power the AI has the more damage it can do in a short time if it goes haywire, and the easier it becomes for it to inadvertently create an unFriendly AGI as a side effect of some other activity.
Your second proposal, trying to restrict what the AI can do after it’s made a decision, is a lost cause. Our ability to specify what is and is not allowed is simply too limited to resist any determined effort to find loopholes. This problem afflicts every field from contract law to computer security, so it seems unlikely that we’re going to find a solution anytime soon.
Your first proposal, making an AI that isn’t a complete AGI, is more interesting. Whether or not it’s feasible depends partly on your model of how an AI will work in the first place, and partly on how extreme the AI’s performance is expected to be.
For instance, I could easily envision a specialized software engineering AI that does nothing but turn English-language program descriptions into working software. Such a system could easily devote vast computing resources to heuristic searches of design space, and you could use it to design improved versions of itself as easily as anything else. It should be obvious that there’s little risk of unexpected behavior with such a system, because it doesn’t contain any parts that would motivate it to do anything but blindly run design searches on demand.
However, this assumes that such an AI can actually produce useful results without knowing about human psychology and senses, the business domains its apps are supposed to address, the world they’re going to interact with, etc. Many people argue that good design requires a great deal of knowledge in these seemingly unrelated fields, and some go so far as too say you need full-blown humanlike intelligence. The more of these secondary functions you add to the AI the more complex it becomes, and the greater the risk that some unexpected interaction will cause it to start doing things you didn’t intend for it to do.
So ultimately the specialization angle seems worthy of investigation, but may or may not work depending on which theory of AI turns out to be correct. Also, even a working version is only a temporary stopgap. The more computing power the AI has the more damage it can do in a short time if it goes haywire, and the easier it becomes for it to inadvertently create an unFriendly AGI as a side effect of some other activity.