There is no way for human values to magically jump inside the AI, so if it’s not specifically created to reflect them, it won’t have them. Whatever the AI ends up with won’t come close to human values, because human values are too complex to be resembled by any structure that happens to form in the AI.
I’m not convinced by the claim that human values have high Kolmogorov complexity.
In particular, Eliezer’s article Not for the Sake of Happiness Alone is totally at odds with my own beliefs. In my mind, it’s incoherent to give anything other than subjective experiences ethical consideration. My own preference for real science over imagined science is entirely instrumental and not at all terminal.
Now, maybe Eliezer is confused about what his terminal values are, or maybe I’m confused about what my terminal values are, or maybe our terminal values are incompatible. In any case, it’s not obvious that an AI should care about anything other than the subjective experiences of sentient beings.
Suppose that it’s okay for an AI to exclude everything but subjective experience from ethical consideration. Is there then still reason to expect that human values have high Kolmogorov complexity?
I don’t have a low-complexity description to offer, but it seems to me that one can get a lot of mileage out of the principles “if an individual prefers state A to state B whenever he/she/it is in either state A or state B, then state A is superior for that individual to state B” and “when faced with two alternatives, the moral alternative is the one that you would prefer if you were going to live through the lives of all sentient beings involved.”
Of course “sentient being” is ill-defined and one would have to do a fair amount of work to frame the things that I just said in more formal terms, but anyway, it’s not clear to me that there’s a really serious problem here.
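A minimal sketch of how those two principles might be written down, assuming a hypothetical experienced_utility(being, state) function that stands in for the (undefined) quality of a being’s subjective experience in a given state; this is purely illustrative, not a claim about how an AI’s values would actually be specified:

```python
# Purely illustrative sketch of the two principles above.
# `experienced_utility(being, state)` is a hypothetical stand-in for the
# (undefined) quality of a being's subjective experience in a state.

def individually_better(being, state_a, state_b, experienced_utility):
    """Principle 1: state A is superior for `being` to state B if the being
    prefers A from inside either state (modelled here, crudely, as a higher
    experienced utility in A)."""
    return experienced_utility(being, state_a) > experienced_utility(being, state_b)

def moral_choice(alternatives, sentient_beings, experienced_utility):
    """Principle 2: the moral alternative is the one you would prefer if you
    were going to live through the lives of all sentient beings involved,
    modelled here as the alternative with the highest total experienced
    utility."""
    return max(
        alternatives,
        key=lambda state: sum(experienced_utility(b, state) for b in sentient_beings),
    )

# Toy usage with a made-up scoring table for two beings and two world-states.
toy_scores = {("alice", "state_a"): 3, ("alice", "state_b"): 1,
              ("bob", "state_a"): 0, ("bob", "state_b"): 5}
toy_utility = lambda being, state: toy_scores[(being, state)]
print(individually_better("alice", "state_a", "state_b", toy_utility))      # True
print(moral_choice(["state_a", "state_b"], ["alice", "bob"], toy_utility))  # state_b
```

Summing experienced utility across beings is only one way to cash out “living through all the lives involved”; averaging, or any other aggregation rule, would be a different (and contested) choice, which is part of where the hidden complexity may live.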
Re: “The more the AI’s preference diverges from ours, the more we lose, and this loss is on an astronomical scale (even if the preference diverges relatively little).”
I totally agree that if the creation of a superhuman AI is going to precede all other existential threats, then we should focus all of our resources on trying to get the superhuman AI to be as friendly as possible.
Have you read the Heaven post by denisbider and the two follow-ups constituting a mini-wireheading series? There have been other posts on the difference between wanting and liking, but that series illustrates a fairly strong problem with wireheading: Even if all we’re worried about is “subjective states,” many people won’t want to be put in that subjective state, even knowing they’ll like it. Forcing them into it, or changing their value system so they do want it, are ethically suboptimal solutions.
So, it seems to me that if anything other than maximized absolute wireheading for everyone is the AI’s goal, it’s gonna start to get complicated.
Thanks for the references to the posts which I had not seen before and which I find relevant. I’m sympathetic toward denisbider’s view, but will read the comments to see if I find diverging views compelling.
Maybe you should start with what’s linked from Fake Fake Utility Functions, then (the page on the wiki wasn’t organized quite as I expected).
But I would qualify the last sentence of my reply by saying that the best way to get a superhuman AI to be as friendly as possible may not be to work on friendly AI or advocate for friendly AI. For example, it may be best to work toward geopolitical stability to minimize the chances of some country rashly creating a potentially unsafe AI out of a sense of desperation during wartime.
(?) I never said that.
Yes, I was agreeing with what I inferred your attitude to be rather than agreeing with something that you said. (I apologize if I distorted your views—if you’d like I can edit my comment to remove the suggestion that you hold the position that I attributed to you.)
I don’t believe that we “should focus all of our resources” on FAI, as there are many other worthy activities to focus on. The argument is that this particular problem gets disproportionately little attention, and while with other risks we can in principle luck out even if they get no attention, it isn’t so for AI. Failing to take FAI seriously is fatal; failing to take nanotech seriously isn’t necessarily fatal.
Thus, although strictly speaking I agree with your implication, I don’t see its condition as plausible, and so I don’t see the implication as a whole as relevant.
Re: “Is there then still reason to expect that human values have high Kolmogorov complexity?”
Human values are mostly a product of people’s genes and their memes. There is an awful lot of information in those. However, it is true that you can fairly closely approximate human values—or those of any other creature—by the directive to make as many grandchildren as possible—which seems reasonably simple.
Most of the arguments for humans having complex values appear to list a whole bunch of proximate goals—as though that constitutes evidence.
I disagree. You need to know much more than just the drive for grandchildren, given the massively diverse ways we observe, even in our present world, for species to propagate, all of which correspond to different articulable values once those species reach human intelligence.
Human values should be expected to have a high K-complexity because you would need to specify both the genes/early environment, and the precise place in history/Everett branches where humans are now.
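One loose way to read that claim in description-length terms, assuming (as the claim itself does) that present-day values are computable from those two inputs; the bound below is informal and mine, not something stated in the thread:

```latex
% Informal: if human values V are computable from the genes/early environment G
% and an index H of our place in history/Everett branches, then, up to an
% additive constant (and logarithmic terms for combining the two inputs),
K(V) \;\lesssim\; K(G) + K(H)
```

The argument is then that both terms on the right are large, which is where the high-K-complexity conclusion comes from.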
The idea was to “approximate human values”—not to express them in precise detail: nobody cares much if Jim likes strawberry jam more than he likes raspberry jam.
The environment mostly drops out of the equation—because most of it is shared between the agents involved—and because of the phenomenon of Canalisation: http://en.wikipedia.org/wiki/Canalisation_%28genetics%29
Sure, but I take “approximation” to mean something like getting you within 10 or so bits of the true distribution, whereas the heuristic you gave still leaves you maybe 500 or so bits away, which is huge, and far more than you implied.
That would help you with message length if you had already stored one person’s values and were looking to store a second person’s. It does not help for describing the first person’s values, or some aggregate measure of humans’ values.
10 bits!!! That’s not much of a message!
The idea of a shared environment arises because the proposed machine—in which the human-like values are to be implemented—is to live in the same world as the human. So, one does not need to specify all the details of the environment—since these are shared naturally between the agents in question.
10 bits short of the needed message, not a 10-bit message. I mean that e.g. an approximation gives 100 bits when full accuracy would be 110 bits (and 10 bits is an upper bound).
That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.
Re: “That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.”
To specify the environment, choose the universe, galaxy, star, planet, latitude, longitude and time. I am not pretending that information is simple, just that it is already there, if your project is building an intelligent agent.
Re: “10 bits short of the needed message”.
Yes, I got that the first time. I don’t think you are appreciating the difficulty of coding even relatively simple utility functions. A couple of ASCII characters is practically nothing!
ASCII characters aren’t a relevant metric here. Getting within 10 bits of the correct answer means that you’ve narrowed it down to 2^10 = 1024 distinct equiprobable possibilities [1], one of which is correct. Sounds like an approximation to me! (if a bit on the lower end of the accuracy expected out of one)
[1] or a probability distribution with the same KL divergence from the true governing distribution
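For concreteness, here is a worked version of the arithmetic in this exchange, treating the 110-bit and 100-bit figures above as purely hypothetical numbers:

```python
import math

# If the full specification needs 110 bits and the approximation supplies 100,
# the remaining uncertainty is 2**10 = 1024 equally likely candidates
# (both figures are the hypothetical numbers used in the thread, not measurements).
full_description_bits = 110
approximation_bits = 100
gap_bits = full_description_bits - approximation_bits
remaining_candidates = 2 ** gap_bits
print(gap_bits, remaining_candidates)  # 10 1024

# The footnote's alternative reading: a gap of k bits corresponds to a
# KL divergence of k bits between the approximating distribution q and
# the true distribution p.
def kl_divergence_bits(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# E.g. a uniform guess over 1024 options, when the truth is concentrated
# on one of them, is exactly 10 bits away:
p = [1.0] + [0.0] * 1023
q = [1 / 1024] * 1024
print(kl_divergence_bits(p, q))  # 10.0
```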
Or you can implement a constant K-complexity learn-by-example algorithm and get all the rest from the environment.
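A toy sketch of what such a fixed-size learn-by-example rule could look like; the (chosen, rejected) data format and the naive counting rule are illustrative assumptions on my part, and the point is only that the code itself stays short while all of the value content arrives through the observations:

```python
from collections import defaultdict

def learn_preferences(observed_choices):
    """observed_choices: iterable of (chosen_option, rejected_option) pairs
    gathered from the environment. Options chosen more often than they are
    rejected end up with higher scores."""
    score = defaultdict(int)
    for chosen, rejected in observed_choices:
        score[chosen] += 1
        score[rejected] -= 1
    return score

def prefer(score, option_a, option_b):
    """Decide between two options using the learned scores."""
    return option_a if score[option_a] >= score[option_b] else option_b

# Example: three observed human decisions are enough to rank three options.
examples = [("help", "harm"), ("help", "ignore"), ("ignore", "harm")]
scores = learn_preferences(examples)
print(prefer(scores, "help", "harm"))  # help
```

Whether such a rule generalizes the way its creators would want it to is, of course, exactly the hard part, which is what the next comment is gesturing at.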
How about “Do as your creators do (generalize this as your creators generalize)”?