Can you describe what you think of when you say “humanity’s preferences”? The preferences of humans or human groups can and do conflict with each other, hence it is not just a question of complexity, right?
I’m sure there are multiple approaches to formalizing what we mean by “humanity’s preferences”, but an intuitive understanding is enough for the sake of this discussion. In my (speculative) opinion, the problem lies precisely in the complexity of the Utility Function capturing this intuitive understanding.
For simplicity, let’s say there is a single human being with transitive preferences, and we want to perfectly align an AI with that human’s preferences. The cyclomatic complexity of such a perfect Utility Function could easily exceed that of the human brain: it would need to perfectly predict the utility of every single thing “X”, including all the imaginable consequences of having “X”, even for Xes that may be unknown to humanity for now.
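To make the scale of the problem concrete, here is a toy sketch (all names and the scoring rule are hypothetical, not a real alignment proposal): even if we model the world with just a handful of binary features, a utility function that must assign a value to every possible outcome faces an exponentially growing space of cases.

```python
from itertools import product

def utility(state: tuple) -> float:
    """Stand-in for the human's true preferences.

    A real 'perfect' Utility Function would need a distinct judgment
    for each state and each chain of its consequences; this toy rule
    just sums the features to keep the sketch runnable.
    """
    return float(sum(state))

n = 4  # number of binary world-features in this toy model
states = list(product([0, 1], repeat=n))

# Even 4 features already yield 2**4 = 16 distinct world-states to score;
# the space doubles with every feature added.
print(len(states))

# The state the toy function prefers most.
print(max(states, key=utility))
```

The point of the sketch is only the growth rate: the branching the author calls cyclomatic complexity scales with the number of distinguishable world-states, which explodes long before the model approaches anything like real human preferences.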