Nope, I’m a software engineer, so I don’t have that particular magical model of how computer systems work.
But suppose you design even a conventional software system to run something less dangerous than a general AI, say a nuclear reactor. Would you have every valve and mechanism controlled only by the software, with no mechanical failsafes or manual overrides, trusting the software to have no deadly flaws?
Designing that software and proving that it would work correctly in every possible circumstance would be a rich and interesting research topic, but it would never be completed.
One difference with AI is that it is theoretically capable of analyzing your failsafes and overrides (and their associated hidden flaws) more thoroughly than you. Manual, physical overrides aren’t yet amenable to rigorous, formal analysis, but software is. If we employ a logic to prove constraints on the AI’s behavior, the AI shouldn’t be able to violate its constraints without basically exploiting an inconsistency in the logic, which seems far less likely than the case where, e.g., it finds a bug in the overrides or tricks the humans into sabotaging them.
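To make the "prove constraints in a logic" idea concrete, here is a minimal sketch in Lean. It is purely illustrative: `safe`, `policy`, and the numeric bound are made-up placeholders, not a real AI constraint. The point is that the theorem covers every possible input, so violating the constraint would require a flaw in the logic or the proof checker itself, not just an untested corner case.

```lean
-- Illustrative sketch only: `safe` and `policy` are toy stand-ins,
-- not anyone's actual safety proposal.

-- "Actions" are just numbers; anything above 100 counts as unsafe.
def safe (a : Nat) : Prop := a ≤ 100

-- A toy policy that clamps its output to the safe range.
def policy (obs : Nat) : Nat := if obs ≤ 100 then obs else 100

-- The constraint holds for all possible inputs, by proof rather than testing.
theorem policy_always_safe (obs : Nat) : safe (policy obs) := by
  unfold safe policy
  split
  · assumption             -- case obs ≤ 100: the output is obs itself
  · exact Nat.le_refl 100  -- case obs > 100: the output is clamped to 100
```

A human reviewer or a test suite checks a handful of cases; the machine-checked proof quantifies over all of them, which is why a proven constraint is a different kind of barrier than a manual override.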