Vladimir_Nesov comments on The problem of pseudofriendliness

Vladimir_Nesov 16 Mar 2010 23:04 UTC
0 points
Not all autonomous agents are reflectively consistent. The autonomous agents that are not reflectively consistent want to become such (or to construct a singleton with their preference that is reflectively consistent). Preference is associated even with agents that are not autonomous (e.g. mice).

This is discussed in the post Friendly AI: a vector for human preference:

Intelligent agents have two thresholds in ability that are important in the long run: autonomy and reflective consistency. Autonomy is the point where an intelligent agent has a prospect of open-ended development, with a chance to significantly influence the whole world (by building/becoming a reflectively consistent agent). Humanity is autonomous in this sense, as probably are small groups of smart humans if given a much longer lifespan (although cultish attractors may stall progress indefinitely). Reflective consistency is the ability to preserve one’s preference, bringing the specific preference to the future without creating different-preference free-running agents. The principal defects of merely autonomous agents are uncontrollable preference drift and the inability to effectively prevent reflectively consistent agents of different preference from taking over the future; only when reflective consistency is achieved does the drift stop and the preference extinction risk get partially alleviated.

As with advanced AI, so with humanity: there is danger in a lack of reflective consistency. An autonomous agent, while not as dangerous as a reflectively consistent agent (though possibly still lethal), is a reflectively consistent agent with alien preference waiting to happen. Most autonomous agents would seek to construct a reflectively consistent agent with the same preference, their own kind of FAI. A given autonomous agent can (1) drift from its original preference before becoming reflectively consistent, so that the end result is different, (2) construct another different-preference autonomous non-reflective agent, which could eventually lead to a different-preference reflective agent, (3) fail at the construction of its FAI, creating a de novo reflectively consistent agent of the wrong preference; or, if all goes well, (4) succeed at building/becoming a reflectively consistent agent of the same preference. Humanity faces these risks, and any non-reflective autonomous AI that we may develop in the future would add to them, even if this non-reflective AI shares our preference exactly at the time of construction. A proper Friendly AI has to be reflectively consistent from the start.
- rwallace 16 Mar 2010 23:06 UTC
  1 point
  Parent
  Disproof by counterexample: I don’t want to become reflectively consistent in the sense you’re using the phrase.
  
  Edit in response to your edit: the terms autonomous and reflectively consistent are used in the passage you quote to mean different things than you have been using them to mean.
  - Vladimir_Nesov 16 Mar 2010 23:11 UTC
    0 points
    Parent
    But what do you want? Whatever you want, it is an implicit consistent statement about all time, so the most general wish granted to you consists in establishing a reflectively consistent singleton that implements this statement during all of the future.
    - rwallace 16 Mar 2010 23:15 UTC
      0 points
      Parent
      For example, I would prefer that people not die, but if some people choose to die, I would not forcibly prevent them, nor would I license any other entity to initiate the use of force for that purpose, so no, I would not wish for a genie that always prevents people from dying no matter what.
      - Vladimir_Nesov 16 Mar 2010 23:22 UTC
        0 points
        Parent
        
        I would not wish for a genie that always prevents people from dying no matter what.
        
        What about genies that prevent people from dying conditionally on something, as opposed to always? That’s an artificial limitation you’ve imposed; the FAI can compute its ifs.
        rwallace 16 Mar 2010 23:35 UTC
        1 point
        Parent
        Like other people, I care not only about the outcome, but also that it was not reached by unethical means; and I am prepared to accept that I don’t have a unique ranking order for all outcomes, that I may be mistaken in some of my preferences, and that I should be more tentative in areas where I am more likely to be mistaken.
        
        Could we aim, ultimately, to build an AGI with such properties? Yes indeed, and if we ever set out to build a self-willed AGI, that is how we should do it—precisely because it would have properties very different from those of the monomaniac utilitarian AGI postulated in most of what’s been written about friendly AI so far.
        Vladimir_Nesov 16 Mar 2010 23:38 UTC
        0 points
        Parent
        
        Yes indeed, and if we ever set out to build a self-willed AGI, that is how we should do it—precisely because it would have properties very different from those of the monomaniac utilitarian AGI postulated in most of what’s been written about friendly AI so far.
        
        Please pin it down: what are you talking about on both counts (“how we should do it” and “the monomaniac utilitarian AGI”), and where do you place your interpretation of my concept of preference?
        rwallace 16 Mar 2010 23:43 UTC
        0 points
        Parent
        I can have a go at that, but a comment box in a thread buried multiple hidden layers down is a pretty cramped place to do it. Figure it’s appropriate for a top-level post? Or we could take it to one of the AGI mailing lists.
        Vladimir_Nesov 16 Mar 2010 23:54 UTC
        0 points
        Parent
        I meant to ask for a short indication of what you meant; a long description would be a mistake, since you’ll misinterpret a lot of what I meant, given how few of the assumed ideas you agree with or understand the way they are intended.
        
        The signal-to-humbug ratio on AGI mailing lists is too low.
        rwallace 17 Mar 2010 2:45 UTC
        0 points
        Parent
        Well, I had been attempting to give short indications of what I meant already, but I’ll try. Basically, a pure utilitarian (if you could build such an entity of high intelligence, which you can’t) would be a monomaniac, willing to commit any crime in the service of its utility function. That means a ridiculous amount of weight goes onto writing the perfect utility function (which is impossible), and then in an attempt to get around that you end up with lunacy like CEV (which is, very fortunately, impossible), and the whole thing goes off the rails. What I’m proposing is that if anything like a self-willed AGI is ever built, it will have to be done in stages with what it does co-developed with how it does it, which means that by the time it’s being trusted with the capability to do something in the external world, it will already have all sorts of built-in constraints on what it does and how it does it, that will necessarily have been developed along with and be an integral part of the system. That’s the only way it can work (unless we stick to purely smart tool AI, which is also an option), and it means we don’t have to take an exponentially unlikely gamble on writing the perfect utility function.
        Nick_Tarleton 17 Mar 2010 15:45 UTC
        0 points
        Parent
        
        a pure utilitarian (if you could build such an entity of high intelligence, which you can’t)
        
        CEV (which is … impossible)
        
        Citations needed.
        Vladimir_Nesov 17 Mar 2010 10:04 UTC
        0 points
        Parent
        Well, I feel unable to communicate effectively with you on this topic (the fact that I persisted for so long is due to an unusual mood and isn’t typical; I’ve been answering all comments directed to me for the last day). Good luck, maybe you’ll see the light one day.