We can and do make judgements about rationality and values. Therefore I don’t see why AIs need fail at it. I’m starting to get a vague idea how to proceed… Let me work on it for a few more days/weeks, then I’ll post it.
How do you know this is true? Perhaps we make judgements about predicted behaviors and retrofit stories about rationality and values onto that.
By introspection?
In these matters, introspection is fairly suspect. And simply unavailable when talking about humans other than oneself (which I think Stuart is doing, maybe I misread).
We’re talking about “mak[ing] judgements about rationality and values”. That’s entirely SOP for humans and introspection allows you to observe it in real time. This is not some kind of an unconscious/hidden/masked activity.
Moreover, other humans certainly behave as if they make judgements about the rationality (usually expressed as “this makes {no} sense”) and values of others. They even openly verbalise these judgements.
May I suggest a test for any such future model? It should take into account that I have unconscious sub-personalities which affect my behaviour but which I don’t know about.
That is a key feature.
Also, the question was not whether I could judge others’ values, but whether it is possible to prove that an AI has the same values as a human being.
Or are you going to prove the equality of two value systems while at least one of them remains unknowable?
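To make the test concrete, here is a toy sketch (everything in it is hypothetical, not a worked-out proposal): model observed behaviour as a mixture of a “conscious” policy the person can report on and an unconscious component they cannot, and check whether a candidate value model predicts the gap between self-report and actual behaviour.

```python
import numpy as np

# Toy illustration (all numbers hypothetical): observed behaviour is a
# mixture of a "conscious" policy the person can report on and an
# unconscious sub-personality they cannot. A value model fitted only to
# the self-report will mispredict behaviour by exactly the hidden share.

rng = np.random.default_rng(0)

conscious_policy = np.array([0.7, 0.2, 0.1])    # what introspection reports
unconscious_policy = np.array([0.1, 0.1, 0.8])  # hidden influence
mix = 0.3                                        # weight of the hidden part

true_policy = (1 - mix) * conscious_policy + mix * unconscious_policy

observed = rng.choice(len(true_policy), size=10_000, p=true_policy)
empirical = np.bincount(observed, minlength=3) / len(observed)

print("self-reported policy:", conscious_policy)
print("observed frequencies:", np.round(empirical, 3))
# A passing model must predict the observed frequencies, not the self-report.
```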
I’m more looking at “formalising human value-like things, into something acceptable”.
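One reason that formalising step is non-trivial, in a minimal sketch (a toy example of my own, not the actual formalisation being developed): behaviour alone underdetermines values, since different (planner, reward) decompositions can produce identical actions.

```python
import numpy as np

# Toy example (hypothetical, not the author's method): two incompatible
# value systems paired with two different "planners" yield identical
# behaviour, so behaviour alone cannot prove two value systems equal.

rewards_a = np.array([1.0, 0.0])  # value system A: prefers action 0
rewards_b = np.array([0.0, 1.0])  # value system B: the exact opposite

def rational_planner(r):
    """Chooses the action the agent values most."""
    return int(np.argmax(r))

def anti_rational_planner(r):
    """Systematically chooses the action the agent values least."""
    return int(np.argmin(r))

# Same observable choice, opposite values:
assert rational_planner(rewards_a) == anti_rational_planner(rewards_b) == 0
print("Identical behaviour from opposite value systems.")
```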