Short:
I agree with MichaelA’s questions about the paper.
Long:
Responses to quotes from the paper:
Furthermore, even if this were not the case and we came to have great confidence in the truth of a single moral theory, the proposed approach immediately encounters a second problem, namely that there would still be no way of reliably communicating this truth to others.
This seems incorrect—if we don’t have “the one true theory” (assuming it exists), then how do we know it can’t be reliably communicated? Though this may be close to hitting the nail on the head:
how do we reliably communicate “the one true theory” to “AI”?
Perhaps given that we don’t know that it can be reliably communicated, we shouldn’t rely on that.
Designing AI in accordance with a single moral doctrine would therefore involve imposing a set of values and judgments on other people who did not agree with them
Unless the correct moral theory doesn’t involve doing that? It seems like this is just changing the name of the search.
In the absence of moral agreement, is there a fair way to decide what principles AI should align with?
It’s not clear what “fair” means here—but the paper might be about looking for “morality”/achieving one of its properties under a different name, as noted above.
This seems incorrect—if we don’t have “the one true theory” (assuming it exists), then how do we know it can’t be reliably communicated?
To be fair to the paper, I’m not sure that that specifically is as strong an argument as it might look. E.g., I don’t have a proof for [some as-yet-unproven mathematical conjecture], but I feel pretty confident that if I did come up with such a proof, I wouldn’t be able to reliably communicate it to just any given random person.
But note that there I’m saying “I feel pretty confident”, and “I wouldn’t be able to”. So I think the issue lies more in the “can’t”, and in the implication that we couldn’t fix that “can’t” even if we tried, rather than in the fact that these arguments are being applied to something we haven’t discovered yet.
That said, I do think it’s an interesting and valid point that the fact we haven’t found that theory yet (again, assuming it exists) adds at least a small extra reason to believe it’s possible we could communicate it reliably. For example, my second-hand impression is that some philosophers think “the true moral theory” would be self-evidently true, once discovered, and would be intrinsically motivating, or something like that. That seems quite unlikely to me, and I wouldn’t want to rely on it at all, but I guess it is yet another reason why it’s possible the theory could be reliably communicated.
And I guess even if the theory was not quite “self-evidently true” or “intrinsically motivating”, it might still be shockingly simple, intuitive, and appealing, making it easier to reliably communicate than we’d otherwise expect.
Perhaps given that we don’t know that it can be reliably communicated, we shouldn’t rely on that.
Yes, I’d strongly agree with that. I sort of want us to make as few assumptions on philosophical matters as possible, though I’m not really sure precisely what that means or what that looks like.
“Designing AI in accordance with a single moral doctrine would therefore involve imposing a set of values and judgments on other people who did not agree with them”
Unless the correct moral theory doesn’t involve doing that?
To again be fair to the paper, I believe the argument is that, given the assumption (which I contest) that we definitely couldn’t reliably convince everyone of the “correct moral theory”, if we wanted to align an AI with that theory we’d effectively end up imposing that theory on people who didn’t sign up for it.
You might have been suggesting that such an imposition might be explicitly prohibited by the correct moral theory, or something like that. But in that case, I think the problem is instead that we wouldn’t be able to align the AI with that theory, without at least some contradictions, if people couldn’t be convinced of the theory (which, again, I don’t see as certain).