The AI wishes to make ten thousand tiny changes to the world, individually innocuous, but some combination of which add up to catastrophe. To submit its plan to a human, it would need to distill the list of predicted consequences down to its human-comprehensible essentials. The AI that understands which details are morally salient is one that doesn’t need the oversight.
The AI that understands which details are morally salient is one that doesn’t need the oversight.
That’s quite non-obvious to me; it seems like a rather arbitrary claim.
You’re basically saying that if an intelligent mind (A, for Alice) knows that a person (B, for Bob) will care about a certain consequence C, then A will definitely know how much B will care about it.
This isn’t the case for real human minds. If Alice is a human mechanic and tells Bob “I can fix your car, but it’ll cost $200”, then Alice knows that Bob will care about the cost, but doesn’t know how much Bob will care, or whether Bob would rather have a fixed car or keep the $200.
So if your claim doesn’t even hold for human minds, why do you think it applies to non-human minds?
And even if it does hold, what about the case where Alice doesn’t know whether a detail is morally salient, but errs on the side of caution? For example, Alice the waitress asks Bob the customer, “The chocolate ice cream you asked for also has some crushed peanuts in it. Is that okay?”, and Bob can respond “Of course, why should I care about that?” or alternatively “It’s not okay, I’m allergic to peanuts!”
In this case Alice the waitress doesn’t know if the detail is salient to Bob, but asks just to make sure.
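To make the “err on the side of caution” idea concrete, here’s a minimal Python sketch. Everything in it (the Detail class, the salience_estimate field, the thresholds) is hypothetical and just for illustration: the point is only that the planner doesn’t try to judge how much a detail matters, just whether it might matter, and surfaces anything it’s unsure about.

```python
from dataclasses import dataclass

@dataclass
class Detail:
    description: str          # e.g. "dessert contains crushed peanuts"
    salience_estimate: float  # planner's guess that the overseer cares, in [0, 1]

def details_to_surface(details, lower=0.05, upper=0.95):
    """Return the details the planner should check with a human before acting."""
    # The policy is deliberately asymmetric: a detail is silently dropped only
    # when the planner is confident it does NOT matter, and silently treated as
    # important only when it is confident it DOES. Everything in between gets
    # surfaced, like the waitress checking about the peanuts.
    return [d for d in details if lower < d.salience_estimate < upper]

# Usage: the planner is unsure whether the peanuts matter, so it asks.
plan_details = [
    Detail("ice cream is served in a glass bowl", 0.01),  # confidently irrelevant
    Detail("ice cream contains crushed peanuts", 0.40),   # uncertain, so ask
    Detail("the bill goes on Bob's card", 0.99),          # confidently relevant
]
for d in details_to_surface(plan_details):
    print("Check with the human:", d.description)
```

The design choice this sketch gestures at is that Alice doesn’t need to know Bob’s full preference ordering to avoid the catastrophe; she only needs a cheap way to flag details whose salience she can’t rule in or out.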
This is good, and I have no valid response at this time. Will try to think more about it later.