I’m no programmer, so I have no comment on the “how to develop” part. The “safe” part seems extremely unsafe to me, though.
1) Your strategy relies on a human supervisor’s ability to recognize a threat disguised by a superintelligence, which is doomed to failure almost by definition.
2) The supervisor himself is not protected from the threat; he is also one of the main targets the AI would want to affect.
3) >Moreover, the artificial agent won’t be able to change the operating system of the computer, its own code or any offline task that could fundamentally change the system.
I don’t see what kind of manual supervision could possibly guarantee that, even if none of the other problems existed.
4) Human experts don’t have a “complete understanding” of any subject worth mentioning, certainly nothing involving biology. So your AI will simply produce text that convinces them the proposed solution is safe; being superintelligent, it will be able to do so even when the solution is not in fact safe. Or it might produce other dangerous texts, such as texts that convince them to lie to you that the solution is safe.