I think you proved that values can’t exist outside a human mind, and that is a big problem for the idea of value alignment.
The only solution I see is not to try to extract values from the human mind, but to upload a human mind into a computer. In that case we kill two birds with one stone: we get some form of AI which has human values (whatever they are) and which also has common sense.
An upload used as an AI safety solution would also have difficulty with foom-style self-improvement, since its internal structure is messy and incomprehensible to a normal human mind. So it is intrinsically safe and, as far as I can tell, the only known workable solution to AI safety.
However, there are (at least) two main problems with this approach to AI safety: it may give rise to neuromorphic non-human AIs, and it does not prevent the later appearance of a pure AI, which could foom and kill everybody.
The solution I see is to use the first human upload as an AI Nanny or AI police, which would prevent the appearance of any other, more sophisticated AIs elsewhere.
We can and do make judgements about rationality and values. Therefore I don’t see why AIs need to fail at it. I’m starting to get a vague idea of how to proceed… Let me work on it for a few more days/weeks, then I’ll post it.
How do you know this is true? Perhaps we make judgements about predicted behaviors and retrofit stories about rationality and values onto that.
By introspection?
In these matters, introspection is fairly suspect. And simply unavailable when talking about humans other than oneself (which I think Stuart is doing, maybe I misread).
We’re talking about “mak[ing] judgements about rationality and values”. That’s entirely SOP for humans, and introspection allows you to observe it in real time. This is not some kind of unconscious/hidden/masked activity.
Moreover, other humans certainly behave as if they make judgements about the rationality (usually expressed as “this makes {no} sense”) and values of others. They even openly verbalise these judgements.
May I suggest a test for any such future model? It should take into account that I have unconscious sub-personalities which affect my behaviour but which I don’t know about.
That is a key feature.
Also, the question was not whether I could judge others’ values, but whether it is possible to prove that an AI has the same values as a human being.
Or are you going to prove the equality of two value systems while at least one of them remains unknowable?
I’m more looking at “formalising human value-like things, into something acceptable”.