You mean, the question is how exactly the Utility Function is calculated for humanity’s preferences? That’s part of the problem. We cannot easily fit the entirety of our preferences into a simple Utility Function (which doesn’t mean there’s no Utility Function that perfectly captures them; it simply means that formalizing this function is not achievable at the present moment). As Robert Miles once said, if we encode the 10 things we value the most into a SuperAI’s Utility Function, the 11th thing is as good as gone.
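A minimal toy sketch of that point in Python (my own illustration, not anything from Miles or this thread): a maximizer driven by a utility function that encodes only some of our values will happily trade away anything left out. The world states, value names, and weights below are all made up.

```python
# Toy illustration of the "11th value" problem: an optimizer maximizing a
# utility function built from an incomplete list of values sacrifices
# whatever was omitted.

# Hypothetical candidate world-states (names and numbers are made up).
candidate_worlds = [
    {"health": 0.9, "freedom": 0.8, "biodiversity": 0.9},
    {"health": 1.0, "freedom": 1.0, "biodiversity": 0.0},  # omitted value destroyed
]

# Only the values we remembered to encode get a weight.
encoded_weights = {"health": 1.0, "freedom": 1.0}  # "biodiversity" is the 11th thing

def utility(world):
    # The optimizer literally cannot see anything outside encoded_weights.
    return sum(w * world.get(name, 0.0) for name, w in encoded_weights.items())

best = max(candidate_worlds, key=utility)
print(best)  # picks the world where biodiversity = 0.0, because it scores higher
```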
The misalignment problem is universal, extending far beyond AI research. We deal daily with misaligned artificial non-intelligent and non-artificial intelligent systems. Politicians, laws, credit score systems, and many other things around us are misaligned to some degree with the goals of the agents who created them or gave them their power. The AI Safety concern is that if an AGI system is disproportionately powerful, a tiny misalignment is enough to create unthinkable risks for humanity. It doesn’t matter whether the powerful but slightly misaligned system is an AI or not, but we believe that AGI systems are going to be extraordinarily powerful and are doomed to be slightly misaligned.
You can do a different thought experiment. I’m sure that you, like any other standard agent, are slightly misaligned with the goals of the rest of humanity. Imagine you have near-infinite power. How bad would that be for the rest of humanity? If you were somebody else in this world, a random person, would you still want to find yourself in a situation where the current you has near-infinite power? I certainly wouldn’t.
I feel like the assessment just consists of slightly rephrased versions of the questions I had already answered. I didn’t find anything useful or insightful in it.
That’s right. We need superintelligence to solve problems that we don’t even understand. For such problems we might not even be able to understand their very definition, let alone find a good solution.
I tackle this question in detail in my NoUML essay.
You can think of an abstraction as an indivisible atom, a dot on some bigger diagram that on its own has no inner structure. So abstractions have no internal structure, but they are useful in terms of their relations to other abstractions. In your examples there are 4 outgoing composition arrows going from “brushTeeth” to 4 other abstractions: “prepareToothbrush”, “actuallyBrushTeeth”, “rinseMouth” and “cleanToothbrush”, and 3 incoming generalization arrows going from the “people”, “dogs” and “cats” abstractions to “animal”.
Now imagine you have this huge diagram with all the abstractions in your programming system and their relations to each other. Imagine also that the “brushTeeth” and “animal” abstractions on this diagram are both unlabeled for some reason. You would still be able to infer their names easily, just by looking at how these anonymous abstractions relate to the rest of the universe.
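To make that concrete, here is a small Python sketch (my own, not part of NoUML itself) that treats abstractions as structureless nodes and arrows as typed relations. The “signature” of an unlabeled node, i.e. the arrows touching it, is enough to tell which abstraction it must be.

```python
# Abstractions as structureless nodes whose meaning lives entirely in their
# labelled relations to other abstractions.

from dataclasses import dataclass

@dataclass(frozen=True)
class Arrow:
    kind: str      # "composition" or "generalization"
    source: str
    target: str

arrows = [
    Arrow("composition", "brushTeeth", "prepareToothbrush"),
    Arrow("composition", "brushTeeth", "actuallyBrushTeeth"),
    Arrow("composition", "brushTeeth", "rinseMouth"),
    Arrow("composition", "brushTeeth", "cleanToothbrush"),
    Arrow("generalization", "people", "animal"),
    Arrow("generalization", "dogs", "animal"),
    Arrow("generalization", "cats", "animal"),
]

def signature(name):
    """Everything we know about an abstraction: its arrows, with the name erased."""
    outgoing = sorted((a.kind, a.target) for a in arrows if a.source == name)
    incoming = sorted((a.kind, a.source) for a in arrows if a.target == name)
    return outgoing, incoming

# Even with the label hidden, the relation signature pins the node down: four
# outgoing composition arrows to toothbrushing steps can only be "brushTeeth",
# and three incoming generalization arrows from people/dogs/cats can only be "animal".
print(signature("brushTeeth"))
print(signature("animal"))
```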
And no, consciousness is not a fundamental property of the universe.
Do you have any objective evidence supporting this claim?
If that’s just your assumption it would be helpful to clarify it in your essay, as the rest of your arguments follow from it.
That’s a valid observation, but my comment is about both of them (should we call them speculations at this point?).
For the sake of the argument, it doesn’t make any difference that we can’t write a computer program to detect both (by the way, can’t we? Are you absolutely sure about the theoretical impossibility of a literature-detecting neural network?). For literature, we know that it is an emergent phenomenon, and we have a solid understanding of what it means for some text to be considered a piece of literature, even though the exact boundaries of literature might be vague.
For literature, not only can we point our finger at it, but we can also give a more precise definition. For consciousness, we have no clue what it is. We suspect that consciousness in particular, and personal identity in general, might be just one type of qualia among many others; but then we have no clue what qualia are.
Consciousness could be an emergent phenomenon (like literature) or a fundamental property of our universe. We don’t know, and there’s no good reason to prefer one speculation over the other.
Consciousness, like literature, is a high-level view that’s hard to pin down precisely, and is largely a matter of how we choose to define it.
It is worth noting that consciousness is a phenomenon that needs an explanation we don’t have just yet. But it is still going to be the same phenomenon no matter how we choose to define it, the same way it doesn’t matter how we choose to define wind or lightning, for instance: it is still the same feature of nature. The only reasonable definition we can give it right now is to point our finger at it and say: “this is consciousness”.
I’m sure there are multiple approaches to formalizing what we mean when we say “humanity’s preferences”, but an intuitive understanding is enough for the sake of this discussion. In my (speculative) opinion, the problem lies precisely in the complexity of the Utility Function capturing this intuitive understanding.
For simplicity, let’s say there’s a single human being with transitive preferences, and we want to perfectly align an AI with the preferences of this human. The cyclomatic complexity of such a perfect Utility Function can easily be higher than that of the human brain: it needs to perfectly predict the utility of every single thing “X”, including all the imaginable consequences of having “X”, with some of these Xes still unknown to humanity for now.
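As a rough illustration of why such a function explodes, here is a toy Python sketch (entirely my own framing, with a made-up consequence model and placeholder values): scoring a single outcome “X” means recursively scoring every downstream consequence, so even a small branching factor and a modest horizon already force the function to get thousands of outcomes right.

```python
# Toy consequence model: each outcome has two downstream consequences (made up).
def consequences_of(outcome):
    return [outcome + ".a", outcome + ".b"]

def immediate_value(outcome):
    return 1.0  # placeholder; a real utility would differ per outcome

def perfect_utility(outcome, depth):
    """Score an outcome plus everything that follows from it, to horizon `depth`."""
    total = immediate_value(outcome)
    if depth > 0:
        for nxt in consequences_of(outcome):
            total += perfect_utility(nxt, depth - 1)
    return total

# With branching factor 2 and horizon 10, this touches 2**11 - 1 = 2047 outcomes;
# a perfectly aligned Utility Function has to score every one of them correctly,
# including outcomes nobody has thought of yet.
print(perfect_utility("X", 10))  # 2047.0
```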