This seems incorrect—if we don’t have “the one true theory” (assuming it exists), then how do we know it can’t be reliably communicated?
To be fair to the paper, I’m not sure that specific point is as strong an argument as it might look. E.g., I don’t have a proof for [some as-yet-unproven mathematical conjecture], but I feel pretty confident that if I did come up with such a proof, I wouldn’t be able to reliably communicate it to just any given random person.
But note that there I’m saying “I feel pretty confident” and “I wouldn’t be able to”. So I think the issue is more in the “can’t”, and in the implication that we couldn’t fix that “can’t” even if we tried, rather than in the fact that these arguments are being applied to something we haven’t discovered yet.
That said, I do think it’s an interesting and valid point that the fact we haven’t found that theory yet (again, assuming it exists) adds at least a small extra reason to believe it’s possible we could communicate it reliably. For example, my second-hand impression is that some philosophers think “the true moral theory” would be self-evidently true, once discovered, and would be intrinsically motivating, or something like that. That seems quite unlikely to me, and I wouldn’t want to rely on it at all, but I guess it is yet another reason why it’s possible the theory could be reliably communicated.
And I guess even if the theory was not quite “self-evidently true” or “intrinsically motivating”, it might still be shockingly simple, intuitive, and appealing, making it easier to reliably communicate than we’d otherwise expect.
Perhaps given that we don’t know that it can be reliably communicated, we shouldn’t rely on that.
Yes, I’d strongly agree with that. I sort of want us to make as few assumptions about philosophical matters as possible, though I’m not really sure precisely what that means or what it would look like in practice.
“Designing AI in accordance with a single moral doctrine would therefore involve imposing a set of values and judgments on other people who did not agree with them”
Unless the correct moral theory doesn’t involve doing that?
To again be fair to the paper, I believe the argument is that, given the assumption (which I contest) that we definitely couldn’t reliably convince everyone of the “correct moral theory”, if we wanted to align an AI with that theory, we’d effectively end up imposing that theory on people who didn’t sign up for it.
You might have been suggesting that such an imposition would be explicitly prohibited by the correct moral theory, or something like that. But in that case, I think the problem is instead that we wouldn’t be able to align the AI with that theory without at least some contradictions, if people couldn’t be convinced of the theory (which, again, I don’t see as certain).