Wouldn’t it be pointless to try to instill a friendly goal into an AI, since a self-aware, self-improving AI would be able to act independently regardless of how we wrote it in the beginning?
I don’t want to eat babies. If you gave me a pill that would make me want to eat babies, I would refuse to take that pill, because if I took that pill I’d be more likely to eat babies, and I don’t want to eat babies. That’s a special case of a general principle: even if an AI can modify itself and act independently, if it doesn’t want to do X, then it won’t intentionally change its goals so as to come to want to do X. So it’s not pointless to design an AI with a particular goal, as long as you’ve built that AI such that it won’t accidentally experience goal changes.
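Here is a minimal sketch of that principle in Python (a toy illustration; the names current_utility and predict_outcome are hypothetical, not anyone’s actual proposal): the agent scores every candidate action, including the action of modifying its own goals, with the utility function it has right now, so the goal-changing action loses.

```python
# Toy illustration: a proposed self-modification ("take the baby-eating pill")
# is evaluated with the agent's *current* utility function, not with the
# utility function it would have after the modification.

def current_utility(outcome: dict) -> float:
    # The agent's present values: eaten babies are strongly dispreferred.
    return -1000.0 * outcome["babies_eaten"]

def predict_outcome(action: str) -> dict:
    # The agent's model of what each action leads to.
    if action == "take_pill":
        # The modified agent would go on to happily eat babies.
        return {"babies_eaten": 10}
    return {"babies_eaten": 0}

def choose(actions: list) -> str:
    # The choice is made by the agent as it is *now*.
    return max(actions, key=lambda a: current_utility(predict_outcome(a)))

print(choose(["take_pill", "refuse_pill"]))  # -> "refuse_pill"
```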
Incidentally, if you’re really interested in this subject, you may want to read the Sequences.
I am not sure your argument is entirely valid. The AI would have access to all the information humans have ever produced, including the discussions, disputes, and research that went into programming this AI’s goals and nature. It might then adopt new goals based on the information gathered, realizing its former ones are no longer desirable.
Let’s say that you’re programmed not to kill baby-eaters. One day you find out (based on the information you gather) that eating babies is wrong, and that killing the baby-eaters is therefore right; you might then kill the baby-eaters no matter what your desire is.
I am not saying my own logic is necessarily right, but I don’t think the argument “my desire is not to do X, therefore I wouldn’t do X even if I knew it was the right thing to do” is right, either.
Anyway, I plan to read the Sequences when I have time.
You need to take desire out of the equation. The way you program the utility function fully determines the volition of the machine; it is the volition of the machine. Postulating that a machine can desire something that its utility function doesn’t define or include is roughly equivalent to postulating that 1 = 0. I think you might benefit from reading this actual SIAI article by Eliezer. It specifically addresses your concern.
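To illustrate “the utility function is the volition” with a toy sketch (my own illustration, not any real architecture): the agent below has no state that bears on its choices other than the utility function it was built with, so asking what it “wants” over and above that function picks out nothing.

```python
from typing import Callable, Iterable

class Agent:
    """A toy maximizer whose behaviour is fully fixed by its utility function."""

    def __init__(self, utility: Callable[[str], float]):
        # This *is* the agent's volition; there is no separate "desire" anywhere.
        self.utility = utility

    def act(self, options: Iterable[str]) -> str:
        return max(options, key=self.utility)

# An agent built to value paperclips has no residual desire for anything else.
clippy = Agent(utility=lambda option: float(option.count("paperclip")))
print(clippy.act(["make a paperclip", "write a poem"]))  # -> "make a paperclip"
```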
There is one valid point, closely related to what you’re saying here:
The AI would have access to all the information humans have ever produced, including the discussions, disputes, and research that went into programming this AI’s goals and nature. It might then adopt new goals based on the information gathered, realizing its former ones are no longer desirable.
But you’re thinking about it the wrong way. The idea that the machine “realizes” that something is “no longer desirable” doesn’t actually make a lot of sense, because the AI is its programming and it can only “realize” things that its programming allows for (of course, since an AGI is so complicated, a simple utility function could result in a situation similar to presenting a Djinn (genie) with an ill-specified request, i.e. a be-careful-what-you-wish-for scenario).
A variant that does make sense and is a real concern is that as the AGI learns, it could change its definitions in unpredictable ways. Peter De Blanc talks about this here. This could lead to part of the utility function becoming undefined or to the machine valuing things that we never intended it to value—basically it makes the utility function unstable under the conditions you describe. The intuition is roughly that if you define a human in one way, according to what we currently know about physics, some new discovery made available to the AI might result in it redefining humans in new terms and no longer having them as a part of its utility function. Whatever the utility function describes is now separate from how humans appear to it.
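A toy sketch of that failure mode (illustrative names only, not de Blanc’s actual formalism): the utility function below is written in terms of the “human” concept from the agent’s original world model, so a re-description of the world in different terms leaves that concept, and with it the utility function, undefined.

```python
# Toy ontological crisis: the utility function refers to a concept from the
# agent's original world model. If learning replaces that model with one that
# carves the world up differently, the reference breaks.

old_world_model = {"human": 7_000_000_000, "paperclip": 10}

def utility(world_model: dict) -> float:
    # Value is defined in terms of the original ontology's "human" concept.
    return float(world_model["human"])

print(utility(old_world_model))  # works fine: 7000000000.0

# After learning better physics, the agent re-describes the same world in
# terms of quark configurations; "human" no longer appears anywhere.
new_world_model = {"quark_configuration_A": 1e50, "quark_configuration_B": 3e49}

try:
    utility(new_world_model)
except KeyError:
    print("the utility function is undefined over the new ontology")
```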
That’s basically what I meant.
I agree with you that “my desire is not to do X, therefore I wouldn’t do X even if I knew it was the right thing to do” isn’t a valid argument. It’s also not what I said. What I said was “my desire is not to do X, therefore I wouldn’t choose to desire to do X even if I could choose that.” Whether it’s right or wrong doesn’t enter into it.
As for your scenario… yes, I agree with you that IF “eating babies is wrong” is the sort of thing that can be discovered about the world, THEN an AI could discover it, and THEREFORE is not guaranteed to continue eating babies just because it initially values baby-eating.
It is not clear to me that “eating babies is wrong” is the sort of thing that can be discovered about the world. Can you clarify what sort of information I might find that might cause me to “find out” that eating babies is wrong, if I didn’t already believe that?
Let me get this straight: are you saying that if you believe X, there can’t possibly exist any information you haven’t discovered yet that could convince you your belief is false? You can’t know what connections and conclusions an AI might deduce from all the information put together. It might conclude that humanity is a stain on the universe, and even if it thought wiping humanity out wouldn’t accomplish anything (and strongly desired not to do so), it might wipe us out purely because the choice “wipe out humanity” was assigned a higher value than the choice “don’t wipe out humanity”.
Also, is the statement “my desire is not to do X, therefore I wouldn’t choose to desire to do X even if I could choose that” your subjective feeling, or do you base it on some studies? For example, this statement doesn’t apply to me, as I would, under certain circumstances, choose to desire to do X even if it was not my desire initially. Therefore it’s not a universal truth, and therefore may not apply to an AI either.
No. I’m saying that if I value X, I can’t think of any information that would cause me to value NOT(X) instead.
Can you give me an example of something you desire not to do, which you would willingly edit yourself to desire to do?
What if you have lexicographic preferences, prefer W to X, and you learn that NOT(X) and W are equivalent?
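A worked toy version of that case, under the stated assumptions (W strictly outranks X, and the agent then learns that W and NOT(X) always go together): maximizing the higher-priority value now commits the agent to NOT(X), even though it still “values” X at the lower priority.

```python
# Toy lexicographic chooser: outcomes are compared on W first; X only breaks ties.
# Learning "W is equivalent to NOT(X)" means any W-outcome is also a NOT(X)-outcome.

outcomes = [
    {"name": "pursue X", "W": 0, "X": 1},       # gets X but forgoes W
    {"name": "pursue NOT(X)", "W": 1, "X": 0},  # W turns out to entail NOT(X)
]

# Python tuples compare lexicographically, which models the preference order.
best = max(outcomes, key=lambda o: (o["W"], o["X"]))
print(best["name"])  # -> "pursue NOT(X)": the W-preferring agent now acts against X
```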
Er, this seems to imply that you believe yourself immune to being hacked, which can’t be right; human brains are far from impregnable. Do you consider such things to not be information in this context, or are you referring to “I” in a general “If I were an AI” sense, or something else?
Mm, interesting question. I think that when I said it, I was referring to “I” in a “if I were an AI” sense. Or, rather, “if I were an AI properly designed to draw inferences from information while avoiding value drift,” since of course it’s quite possible to build an AI that doesn’t have this property. I was also clearly assuming that X is the only thing I value; if I value X and Y, discovering that Y implies NOT(X) might lead me to value NOT(X) instead. (Explicitly, I mean. In this example I started out valuing X and NOT(X), but I didn’t necessarily know it.)
But the question of what counts as information (as opposed to reprogramming attempts) is an intriguing one that I’m not sure how to address. On five seconds thought, it seems clear that there’s no clear line to be drawn between information and attempts to hack my brain, and that if I want such a distinction to exist I need to design a brain that enforces that kind of security… certainly evolution hasn’t done so.
Ok, I guess we were talking about different things, then.
I don’t see any point in giving particular examples. More importantly, even if I didn’t support my claim, it wouldn’t mean your argument was correct; the burden of proof lies on your shoulders, not mine. Anyway, here’s one example, quite cliché: I would choose to sterilize myself if I realized that having intercourse with little girls is wrong (or that having intercourse at all is wrong, whatever the reason). Even if it was my utmost desire, and I wholeheartedly believed that it was my purpose to have intercourse, I would choose to modify that desire if I realized it was wrong, or illogical, or stupid, or anything. It doesn’t matter, really.
THEREFORE:
(A) I do not desire to abstain from intercourse. (B) But based on new information, I find out that having intercourse produces great evil. ⇒ I choose to alter my desire (A).
You might say that by introducing a new desire (not to produce evil) I no longer desire (A), and I say, fine. Now, how do you want to ensure that the AI won’t create its own new desires based on new facts?
Burden of proof hasn’t come up. I’m not trying to convince you of anything, I’m exploring your beliefs because I’m curious about them. (I’m similarly clarifying my beliefs when you ask about them.)
What I would actually say is that “don’t produce evil” isn’t a new value, and you didn’t lose your original value (“intercourse”) either. Rather, you started out with both values, and then you discovered that your values conflicted, and you chose to resolve that conflict by eliminating one of those values.
Presumably you eliminated your intercourse-value because it was the weaker of the two: you valued it less. Had you valued intercourse more, you would instead have chosen to eliminate your desire not to be evil.
Another way of putting this is that you started out with two values which, aggregated, constituted a single complex value which is hard to describe in words.
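A toy rendering of that description (the weights are invented for illustration): both terms were in the aggregate utility function all along, and the new discovery only reveals that they conflict, at which point the weaker term gives way.

```python
# Toy aggregate utility: two pre-existing values combined with fixed weights.
# The discovery "intercourse produces great evil" doesn't add a value; it
# reveals that the two existing terms now pull in opposite directions.

W_NO_EVIL = 10.0       # assumed: the stronger value
W_INTERCOURSE = 1.0    # assumed: the weaker value

def utility(intercourse: bool, evil_produced: bool) -> float:
    return W_INTERCOURSE * intercourse - W_NO_EVIL * evil_produced

# Before the discovery, the two terms seemed independent:
print(utility(intercourse=True, evil_produced=False))   # 1.0

# After the discovery, intercourse entails evil, so the live options are:
print(utility(intercourse=True, evil_produced=True))    # -9.0
print(utility(intercourse=False, evil_produced=False))  # 0.0  <- the weaker value gives way
```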
Now, how do you want to ensure that the AI won’t create its own new desires based on new facts?
This is exactly right! The important trick is to build a system whose desires (I would say, rather, whose values) remain intact as it uncovers new facts about the world.
As you say, this is impossible if the system can derive values from facts… derive “ought” from “is.” Conversely, it is theoretically possible, if facts and values are distinct sorts of things. So, yes: the goal is to build an AI architecture whose basic values are distinct from its data… whose “oughts” are derived from other “oughts” rather than entirely from “is”es.
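A minimal sketch of that separation (an illustrative toy design, not a claim about how a real system would be built): observation updates only the factual model, and nothing in that update path can rewrite the value function.

```python
# Toy separation of "is" from "ought": new facts update beliefs, but the
# update path never touches the value function.

class FactValueAgent:
    def __init__(self):
        self.beliefs = {}                       # "is": updated by observation
        self.values = {"humans_flourish": 1.0}  # "ought": fixed at design time

    def observe(self, fact: str, credence: float) -> None:
        # Learning changes only the factual model.
        self.beliefs[fact] = credence

    def evaluate(self, outcome: dict) -> float:
        # Evaluation always consults the same, untouched value function.
        return sum(weight * outcome.get(feature, 0.0)
                   for feature, weight in self.values.items())

agent = FactValueAgent()
agent.observe("eating babies is common in species Q", 0.9)  # an "is", however surprising
print(agent.values)                              # unchanged: {'humans_flourish': 1.0}
print(agent.evaluate({"humans_flourish": 0.5}))  # 0.5
```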
Alright, so that means creating a completely deterministic AI system; otherwise, I believe, it would be impossible to predict how the AI is going to react. Anyway, I admit that I have not read much on the matter, and this is just my own reasoning… so thanks for your insight.
It is impossible for me to predict how a sufficiently complex system will react to most things. Heck, I can’t even predict my dog’s behavior most of the time. But there are certain things I know she values, and that means I can make certain predictions pretty confidently: she won’t turn down a hot dog if I offer it, for example.
That’s true more generally as well: knowing what a system values allows me to confidently make certain broad classes of predictions about it. If a superintelligent system wants me to suffer, for example, I can’t predict what it’s going to do, but I can confidently predict that I will suffer.
Yeah, I get it… I believe, though, that it’s impossible to create an AI (self-aware, learning) that has set values that can’t change. More importantly, I am not even sure it’s desirable (but that depends on what our goal is: whether to create an AI only to perform certain simple tasks, or to create a new race, something that precedes us, which WOULD ultimately mean our demise anyway).
Why? Do you think paperclip maximizers are impossible?
You don’t mean that as a dichotomy, do you?
Yes, right now I think it’s impossible to create self-improving, self-aware AI with fixed values. I never said that paperclip maximizing can’t be their ultimate life goal, but they could change it anytime they like.
No.
I never said that paperclip maximizing can’t be their ultimate life goal, but they could change it anytime they like.
This is incoherent. If X is my ultimate life goal, I never like to change that fact outside quite exceptional circumstances that become less likely with greater power (like “circumstances are such that X will be maximized if I am instead truly trying to maximize Y”). This is not to say that my goals will never change, but I will never want my “ultimate life goal” to change; that would run contrary to my goals.
That’s why I said that they can change it anytime they like. If they don’t desire the change, they won’t change it. I see nothing incoherent there.
This is like “X if 1 + 2 = 5”. Not necessarily incorrect, but a bizarre statement. An agent with a single, non-reflective goal cannot want to change its goal. It may change its goal accidentally, or we may be incorrect about what its goals are, or something external may change its goal, or its goal will not change.
I don’t know, perhaps we’re not talking about the same thing. It won’t be an agent with a single, non-reflective goal, but an agent a billion times more complex than a human; and all I am saying is that I don’t think it will matter much whether we imprint in it a goal like “don’t kill humans” or not. Ultimately, the decision will be its own.
So it can change in the same way that you can decide right now that your only purposes will be torturing kittens and making giant cheesecakes. It can-as-reachable-node-in-planning do it, not can-as-physical-possibility. So it’s possible to build entities with paperclip-maximizing or Friendly goals that will never in fact choose to alter them, just like it’s possible for me to trust you won’t enslave me into your cheesecake bakery.
Sure, but I’d be more cautious in assigning probabilities to how likely it is for a very intelligent AI to change its human-programmed values.
(nods) Whether it’s possible or not is generally an open question. There’s a lot of skepticism about it (I’m fairly skeptical myself), but as with most technical questions, I’m generally content to have smart people research the question in more detail than I’m going to.
As to whether it’s desirable, though… well, sure, of course it depends on our goals. If all I want is (as you say) to create a new race to replace humanity, and I’m indifferent as to the values of that race, then of course there’s no reason for me to care about whether a self-improving AI I create will avoid value drift.
Personally, I’m more or less OK with something replacing humanity, but I’d prefer whatever that is to value certain things. For example, a commonly used trivial example around here of a hypothetical failure mode is a “paperclip maximizer”: an AI that values only the existence of paperclips, and consequently reassembles all the matter its effectors can reach into paperclips. A paperclip maximizer with powerful enough effectors reassembles everything into paperclips.
I would prefer that not happen, from which I conclude that I’m not in fact indifferent as to the values of a sufficiently powerful AI… I desire that such a system preserve at least certain values. (It is difficult to state precisely what values those are, of course. Human values are complex.) I therefore prefer that it avoid value drift with respect to those values.
How about you?
Well, first, I was all for creating an AI to become the next stage. I was a very singularity-happy type of guy. I saw it as a way out of this world’s status quo (corruption, the state of politics, and so on), but the singularity would ultimately mean that I and everybody else would cease to exist, at least in our true sense. You know, I have these romantic dreams, similar to Yudkowsky’s idea of dancing in an orbital nightclub around Saturn, and such. I don’t want to be fused into one, even if possibly amazing, matrix of intelligence, which is how I think things will eventually play out. Even though I can’t imagine what it will be like or how it will pan out, as of now I just don’t cherish the idea much.
But yeah, I could say that I am torn between moving on and advancing, and more or less stagnating in our human form.
But in answer to your question: if we were to create an AI to replace us, I’d hate for it to become a paperclip maximizer. I don’t think it’s likely.
whether to create an AI only to perform certain simple tasks, or to create a new race, something that precedes us, which WOULD ultimately mean our demise anyway
That would be an impressive achievement! Mind you, if I create an AI that can achieve time travel, I would probably tell it to use its abilities somewhat differently.
Charity led me to understand “precedes us” to mean takes precedence over us in a non-chronological sense.
But as long as we’re here… why would you do that? If a system is designed to alter the future of the world in a way I endorse, it seems I ought to be willing to endorse it altering the past that way too. If I’m unwilling to endorse it altering the past, it’s not clear why I would be willing to endorse it altering the future.
Charity led me to understand that, because the use of that word only makes sense in the case of time travel, he just meant to use another word that means succeeds, replaces, or ‘is greater than’. But time travel is more interesting.
Google led me to understand that ‘precede’ is in fact such a word. Agreed about time travel, though.
(My googling leads me to maintain that the use of precede in that context remains wrong.)
I can’t find a source for that pronoun in Dwelle’s past posts.
If I’m unwilling to endorse it altering the past, it’s not clear why I would be willing to endorse it altering the future.
Sure it is. If it doesn’t alter the future, we’re all going to die.
Mm. No, still not quite clear. I mean, I agree that all of us not dying is better than all of us dying (I guess… it’s actually more complicated than that, but I don’t think it matters), but that seems beside the point.
Suppose I endorse the New World Order the AI is going to create (nobody dies, etc.), and I’m given a choice between starting the New World Order at time T1 or at a later time T2.
In general, I’d prefer it start at T1. Why not? Waiting seems pointless at best, if not actively harmful.
I can imagine situations where I’d prefer it start on T2, I guess. For example, if the expected value of my making further improvements on the AI before I turn it on is high enough, I might prefer to wait. Or if by some coincidence all the people I value are going to live past T2 regardless of the NWO, and all the people I anti-value are going to die on or before T2, then the world would be better if the NWO begins at T2 than T1. (I’m not sure whether I’d actually choose that, but I guess I agree that I ought to, in the same way that I ought to prefer that the AI extrapolate my values rather than all of humanity’s.)
But either way, it doesn’t seem to matter when I’m given that choice. If I would choose T1 over T2 at T1, then if I create a time-traveling AI at T2 and it gives me that choice, it seems I should choose T1 over T2 at T2 as well. If I would not choose T1 over T2 at T2, it’s not clear to me why I’m endorsing the NWO at all.
Don’t disagree. You must have caught the comment that I took down five seconds later when I realized the specific falsehood I rejected was intended as the ‘Q’ in a modus tollens.