I think you proved that values can’t exist outside a human mind, and that is a big problem for the idea of value alignment.
The only solution I see is not to try to extract values from the human mind, but to upload a human mind into a computer. In that case we kill two birds with one stone: we get some form of AI which has human values (whatever they are) and which also has common sense.
An upload used as an AI safety solution would also have difficulty with foom-style self-improvement, since its internal structure is messy and incomprehensible to a normal human mind. So it is intrinsically safe and, as far as I can tell, the only known workable solution to AI safety.
However, there are (at least) two main problems with this approach to AI safety: it may give rise to neuromorphic non-human AIs, and it does not prevent the later appearance of a pure AI, which could foom and kill everybody.
The solution I see is to use the first human upload as an AI Nanny or AI police, which would prevent the appearance of any other, more sophisticated AIs elsewhere.
We can and do make judgements about rationality and values. Therefore I don’t see why AIs need to fail at it. I’m starting to get a vague idea of how to proceed… Let me work on it for a few more days/weeks, then I’ll post it.
How do you know this is true? Perhaps we make judgements about predicted behaviors and retrofit stories about rationality and values onto that.
By introspection?
In these matters, introspection is fairly suspect. And simply unavailable when talking about humans other than oneself (which I think Stuart is doing, maybe I misread).
We’re talking about “mak[ing] judgements about rationality and values”. That’s entirely SOP for humans, and introspection allows you to observe it in real time. This is not some kind of unconscious/hidden/masked activity.
Moreover, other humans certainly behave as if they make judgements about the rationality (usually expressed as “this makes {no} sense”) and values of others. They even openly verbalise these judgements.
May I suggest a test for any such future model? It should take into account that I have unconscious sub-personalities which affect my behaviour but which I don’t know about.
That is a key feature.
Also, the question was not whether I could judge others’ values, but whether it is possible to prove that an AI has the same values as a human being.
Or are you going to prove the equality of two value systems while at least one of them remains unknowable?
I’m more looking at “formalising human value-like things, into something acceptable”.