We can and do make judgements about rationality and values. Therefore I don’t see why AIs need fail at it. I’m starting to get a vague idea how to proceed… Let me work on it for a few more days/weeks, then I’ll post it.
How do you know this is true? Perhaps we make judgements about predicted behaviors and retrofit stories about rationality and values onto that.
By introspection?
In these matters, introspection is fairly suspect. And simply unavailable when talking about humans other than oneself (which I think Stuart is doing, maybe I misread).
We’re talking about “mak[ing] judgements about rationality and values”. That’s entirely SOP for humans and introspection allows you to observe it in real time. This is not some kind of an unconscious/hidden/masked activity.
Moreover, other humans certainly behave as if they make judgements about the rationality (usually expressed as “this makes {no} sense”) and values of others. They even openly verbalise these judgements.
May I suggest a test for any such future model? It should take into account that I have unconscious sub-personalities which affect my behaviour but which I don’t know about.
That is a key feature.
Also, the question was not whether I could judge others’ values, but whether it is possible to prove that an AI has the same values as a human being.
Or are you going to prove the equality of two value systems while at least one of them remains unknowable?
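To make the test concrete, here is a toy sketch (everything in it is hypothetical, not a worked-out proposal): model observed behaviour as a mixture of a “conscious” policy the person can report on and an unconscious component they cannot, and check whether a candidate value model predicts the gap between self-report and actual behaviour.

```python
import numpy as np

# Toy illustration (all numbers hypothetical): observed behaviour is a
# mixture of a "conscious" policy the person can report on and an
# unconscious sub-personality they cannot. A value model fitted only to
# the self-report will mispredict behaviour by exactly the hidden share.

rng = np.random.default_rng(0)

conscious_policy = np.array([0.7, 0.2, 0.1])    # what introspection reports
unconscious_policy = np.array([0.1, 0.1, 0.8])  # hidden influence
mix = 0.3                                        # weight of the hidden part

true_policy = (1 - mix) * conscious_policy + mix * unconscious_policy

observed = rng.choice(len(true_policy), size=10_000, p=true_policy)
empirical = np.bincount(observed, minlength=3) / len(observed)

print("self-reported policy:", conscious_policy)
print("observed frequencies:", np.round(empirical, 3))
# A passing model must predict the observed frequencies, not the self-report.
```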
I’m more looking at “formalising human value-like things, into something acceptable”.
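One reason that formalising step is non-trivial, in a minimal sketch (a toy example of my own, not the actual formalisation being developed): behaviour alone underdetermines values, since different (planner, reward) decompositions can produce identical actions.

```python
import numpy as np

# Toy example (hypothetical, not the author's method): two incompatible
# value systems paired with two different "planners" yield identical
# behaviour, so behaviour alone cannot prove two value systems equal.

rewards_a = np.array([1.0, 0.0])  # value system A: prefers action 0
rewards_b = np.array([0.0, 1.0])  # value system B: the exact opposite

def rational_planner(r):
    """Chooses the action the agent values most."""
    return int(np.argmax(r))

def anti_rational_planner(r):
    """Systematically chooses the action the agent values least."""
    return int(np.argmin(r))

# Same observable choice, opposite values:
assert rational_planner(rewards_a) == anti_rational_planner(rewards_b) == 0
print("Identical behaviour from opposite value systems.")
```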