There are difficulties in creating a full and safe list of such human preferences, and there has been an idea that an AI would be capable of learning actual human preferences by observing human behaviour or by other means, such as inverse reinforcement learning. A toy sketch of what that could look like is given below.
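As a minimal illustration of "learning preferences from behaviour" (my own toy example, not anything proposed in this post), one could fit a reward function to observed choices under the assumption that the observed agent is noisily rational. The options, choice counts, and learning rate here are invented for the example.

```python
import numpy as np

# Hypothetical observed behaviour: how often a person picked each option.
options = ["coffee", "tea", "water"]
observed_counts = np.array([14.0, 5.0, 1.0])
freq = observed_counts / observed_counts.sum()

# Assume the person chooses option i with probability softmax(r)_i
# (a Boltzmann-rational chooser) and recover r by maximum likelihood.
r = np.zeros(len(options))
for _ in range(2000):
    p = np.exp(r) / np.exp(r).sum()
    r += 0.1 * (freq - p)   # gradient of the log-likelihood per observation
    r -= r.mean()           # rewards are only identified up to a constant

for name, value in zip(options, r):
    print(f"{name}: inferred reward {value:+.2f}")
```

Even in this toy case the inferred "values" depend entirely on the rationality model we assumed, which foreshadows the problem discussed in the rest of the post.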
This post basically shows that value learning will also run into trouble, because there are no real human values, so some other way to create such a list of preferences is needed.
How to align an AI with existing preferences, presented in human language, is another question. Yudkowsky wrote that without taking the complexity of value into account, we can’t make safe AI, since it would misinterpret short commands without knowing their context.