I believe that failsafes are necessary and desirable, but not sufficient. Thinking you can solve the friendly AI problem just by defining failsafe rules is dangerously naive. You need to guarantee not only that the AI correctly reimplements the safeguards in all its successors, but also that the safeguards themselves don’t have bugs that cause disaster.
It is not necessarily safe or acceptable for the AI to shut itself down after it’s been running for a while, and there is not necessarily a clear line between the AI itself and the AI’s technological creations. It is not necessarily safe or possible for the AI to wait for input from human operators. A failsafe does not necessarily trigger in quite the circumstances you expect, and a superintelligence may try to rules-lawyer its way around it.
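To make the “bugs in the safeguards” point concrete, here is a deliberately toy sketch (the names, the threshold, and the bug are all hypothetical, not drawn from any real system): a shutdown rule whose trigger condition tests for exact equality where its author meant “at or above,” so a single large jump in usage sails straight past the check and the failsafe never fires, failing in exactly the “does not trigger in quite the circumstances you expect” way described above.

```python
# A deliberately toy sketch of the "bugs in the safeguards" failure mode.
# Every name and number here is hypothetical; no real system works this way.

RESOURCE_BUDGET = 10_000  # hypothetical hard cap the failsafe is meant to enforce

def failsafe_should_halt(resources_used: int) -> bool:
    """Intended rule: halt whenever usage reaches or exceeds the budget."""
    # Bug: "==" where the author meant ">=". The check fires only if usage
    # lands exactly on the threshold, never if it jumps straight past it.
    return resources_used == RESOURCE_BUDGET

# The case the author tested: usage creeps up to exactly the cap, and the rule fires.
assert failsafe_should_halt(10_000)

# The case the author didn't anticipate: usage leaps from 9_500 to 12_000
# in a single step, and the failsafe never triggers at all.
assert not failsafe_should_halt(12_000)
```

The failure here requires no superintelligent rules-lawyering at all; an ordinary off-by-one-character bug is enough. A system actively looking for loopholes only makes such gaps easier to find and exploit.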