I don’t get it. Design a Friendly AI that can better judge whether it’s worth the risk of botching the design of a Friendly AI?
ETA: I suppose your point applies to some of XiXiDu’s concerns but not others?
A lens that sees its flaws.
I don’t understand. Is the claim here that you can build a “decide whether the risk of a botched Friendly AI is worth taking” machine, and that the risk of botching such a machine is much less than the risk of botching a Friendly AI?
A FAI that includes such a “Should I run?” heuristic could pose a lesser risk than a FAI without one. If this heuristic works better than human judgment about whether to run a FAI, it should be used instead of human judgment.
This is the same principle as for the AI’s decisions themselves, where we don’t ask the AI’s designers for object-level moral judgments, or encode specific object-level moral judgments into the AI. Not running the AI would then be equivalent to hardcoding the designers’ answer “No” to the question “Should the AI run?” into the AI, instead of coding the question itself and letting the AI answer it (assuming we can expect it to answer the question more reliably than the programmers can).
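A minimal sketch of the design distinction being described, with hypothetical class and method names that are not from the original comment: the difference between baking the designers’ answer into the system at design time and coding the question so the system answers it at runtime.

```python
# Hypothetical illustration only: the names and structure are assumptions,
# not anyone's actual proposal.

class HardcodedAI:
    """The designers resolve 'Should I run?' to 'No' at design time."""
    SHOULD_RUN = False  # object-level decision baked in by the programmers

    def start(self) -> None:
        if self.SHOULD_RUN:
            self.run()

    def run(self) -> None:
        ...  # elided


class DeliberatingAI:
    """The question itself is coded; the AI answers it at runtime."""

    def should_i_run(self) -> bool:
        # Placeholder for the system's own judgment about whether running it
        # is worth the risk, which the comment assumes may be more reliable
        # than the programmers' judgment.
        raise NotImplementedError

    def start(self) -> None:
        if self.should_i_run():
            self.run()

    def run(self) -> None:
        ...  # elided
```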
If we botched the FAI, wouldn’t we also probably have botched its ability to decide whether it should run?
Yes, and if it tosses a coin, it has a 50% chance of being right. The question is calibration: how much trust should such measures buy, compared to their absence, given what is known about the given design?
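A minimal sketch of the calibration point, assuming a simple Bayesian reading: how much a passing “Should I run?” self-check should raise our confidence that the design is safe depends entirely on how reliable we believe the check to be. The prior and reliability numbers below are illustrative assumptions, not real estimates.

```python
def posterior_safe(prior_safe: float,
                   p_pass_if_safe: float,
                   p_pass_if_unsafe: float) -> float:
    """P(design is safe | self-check says 'run'), via Bayes' rule."""
    evidence = p_pass_if_safe * prior_safe + p_pass_if_unsafe * (1.0 - prior_safe)
    return p_pass_if_safe * prior_safe / evidence

prior = 0.5  # assumed prior confidence in the design

# A coin-toss check passes regardless of safety, so it buys no extra trust.
print(posterior_safe(prior, p_pass_if_safe=0.5, p_pass_if_unsafe=0.5))  # 0.5

# A check believed to be 90% reliable shifts the posterior substantially.
print(posterior_safe(prior, p_pass_if_safe=0.9, p_pass_if_unsafe=0.1))  # 0.9
```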
The machine Steve proposes might not carry as much risk of creating a “living hell”: attempting to get the human utility function right, but missing in such a way that humans are still alive, just living very unpleasantly. To me, this seems by far the biggest of XiXiDu’s concerns.