The idea of AI alignment rests on the assumption that there is a finite, stable set of data about a person which could be used to predict their choices, and which is actually morally good. The reasoning behind this assumption is that if it is not true, then learning is impossible, useless, or will not converge.
Is it true that these assumptions are required for AI alignment?
I don’t think it would be impossible to build an AI that is sufficiently aligned to know that, at pretty much any given moment, I don’t want to be spontaneously injured, or be accused of doing something that will reliably cause all my peers to hate me, or for a loved one to die. There’s quite a broad list of “easy” specific “alignment questions” that virtually 100% of humans will agree on in virtually 100% of circumstances. We could do worse than just building the partially-aligned AI that makes sure we avoid fates worse than death, individually and collectively.
On the other hand, I agree completely that coupling the concepts of “AI alignment” and “optimization” seems pretty fraught. I’ve wondered if the “optimal” environment for the human animal might be a re-creation of the Pleistocene, except with, y’know, immortality, and carefully managed, exciting-but-not-harrowing levels of resource scarcity.
There are some troubles in creating a full and safe list of such human preferences, and there was an idea that an AI would be capable of learning actual human preferences by observing human behaviour, or by other means such as inverse reinforcement learning.
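For what it’s worth, here is a minimal sketch of the kind of value learning this refers to: inferring preference weights from observed choices under an assumed Boltzmann-rational choice model. Everything in it (the feature names, the numbers, the rationality parameter) is an illustrative assumption, not anything from the post or from any particular IRL paper.

```python
# Toy preference learning from observed choices (illustrative sketch only).
# Assumed model: P(choose option i) ∝ exp(beta * w · features_i).
# Given observed choices, we fit the hidden reward weights w by maximum likelihood.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors for options a person repeatedly chooses among,
# e.g. columns = [safety, social standing, novelty].
options = np.array([
    [1.0, 0.2, 0.0],   # safe but dull
    [0.2, 1.0, 0.5],   # socially rewarding, some risk
    [0.0, 0.1, 1.0],   # novel, risky
])

true_w = np.array([2.0, 1.0, 0.5])   # "true" hidden preferences (assumed for the demo)
beta = 3.0                            # assumed degree of rationality (inverse temperature)

def choice_probs(w, opts, beta):
    logits = beta * opts @ w
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Simulate observed choices made by the "human".
observed = rng.choice(len(options), size=500, p=choice_probs(true_w, options, beta))
counts = np.bincount(observed, minlength=len(options))

# Fit w by gradient ascent on the log-likelihood of the observed choices.
w = np.zeros(3)
for _ in range(2000):
    p = choice_probs(w, options, beta)
    grad = beta * (options.T @ counts - len(observed) * options.T @ p)
    w += 1e-3 * grad / len(observed)

print("recovered preference weights (approximate):", np.round(w, 2))
```

Even in this toy setting, the recovered weights depend on the assumed choice model and rationality parameter, which is part of why learning “actual” human preferences from behaviour is not straightforward.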
This post of mine basically shows that value learning will also have troubles, as there are no real human values, so some other way to create such a list of preferences is needed.
How to align the AI with existing preferences, presented in human language, is another question. Yudkowsky wrote that without taking into account the complexity of value, we can’t make a safe AI, as it would wrongly interpret short commands without knowing the context.