Fascinating discussion and blog. Surely one obvious safeguard against a super-smart AI agent going morally astray or running amok would be to inseparably associate with it a dumber “confessor” AI agent which, while lacking its prowess and flexibility, would have at least the run-of-the-mill intelligence to detect when a proposal might conflict with acceptable human moral standards.
I call it a confessor, by analogy with priests privy to the sins and wicked thoughts of even powerful people. But plenty of other analogies come to mind: an eight-stone jockey controlling a half-ton racehorse far faster and stronger than its rider, or a high-resistance loop off a million-volt power line to which a small instrument can be rigged to indicate the current flowing in the main line.
You could even have a cascade of agents, each somewhat dumber and less flexible than the last, with the planner required to justify its decisions all the way down the line prior to action; the first agent that fails to agree with a plan (either by not understanding it or by concluding it was immoral) would flag up a warning to human observers.
One thing I rather like about this idea is that the “confessor” could be faster than the “horse” simply by being dumber (and needing less code to run).
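For what it's worth, here is a minimal sketch of how such a cascade might be wired up, assuming each “confessor” exposes a simple review function returning approve / reject / confused. All the names and interfaces here are hypothetical, just to make the veto-chain idea concrete:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Sequence


class Verdict(Enum):
    APPROVE = auto()
    REJECT = auto()    # the checker judges the plan unacceptable
    CONFUSED = auto()  # the checker cannot follow the justification


@dataclass
class Checker:
    """One 'confessor' in the cascade: dumber than the planner, but able to vet plans."""
    name: str
    review: Callable[[str], Verdict]  # takes the planner's justification, returns a verdict


def alert_humans(checker_name: str, plan: str, verdict: Verdict) -> None:
    # Placeholder: in practice this would page operators, log the event, and freeze the planner.
    print(f"[WARNING] {checker_name} returned {verdict.name} on plan: {plan!r}")


def run_cascade(plan: str, justification: str, cascade: Sequence[Checker]) -> bool:
    """Ask each successively dumber checker to approve the plan before acting.

    The first checker that rejects the plan, or fails to understand it,
    halts execution and raises a warning to human observers.
    """
    for checker in cascade:
        verdict = checker.review(justification)
        if verdict is not Verdict.APPROVE:
            alert_humans(checker.name, plan, verdict)
            return False  # do not act
    return True  # every confessor signed off
```

The point of the sketch is just that the veto logic itself is trivially simple and cheap, which is exactly why the confessors can keep pace with a much smarter planner.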