Wei Dai comments on Work on Security Instead of Friendliness?

Wei Dai 24 Jul 2012 16:36 UTC
5 points
I think I’m getting a better idea of where our disagreement is coming from. You think of external reality as some particular universe, and since we don’t have direct knowledge of what that universe is, we can only apply our utility function to models of it that we build using sensory input, and not to external reality itself. Is this close to what you’re thinking?

If so, I suggest that “valuing external reality” makes more sense if you instead think of external reality as the collection of all possible universes. I described this idea in more detail in my post introducing UDT.
- private_messaging 26 Jul 2012 4:03 UTC
  1 point
  Parent
  How would this assign utility to performing an experiment to falsify (drop probability of) some of the ‘possible worlds’ ? Note that such action decreases the sum of value over possible worlds by eliminating (decreasing weight of) some of the possible worlds.
  
  Please note that the “utility function” to which Nick Szabo refers is the notion that is part of the SI marketing pitch, and therein it alludes to the concept of utility from economics—which does actually make the agent value gathering information—and creates impression that this is a general concept applicable to almost any AI and something likely to be created by an AGI team unaware of the whole ‘friendliness’ idea; something that would be simple to make for paperclips; the world’s best technological genius of the future AGI creators being just a skill of making real the stupid wishes which need to be corrected by SI.
  
  Meanwhile, in the non-vague sense that you outline here, it appears much more dubious that anyone who does not believe in feasibility of friendliness would want to build this; it’s not even clear that anyone could. Meanwhile, an AI whose goal is only defined within a model based on physics as we know it, and lacking any sort of tie of that model to anything real—no value to keeping the model in sync with the world—is sufficient to build all that we need for mind uploading. Sensing is a very hard problem in AI, much more so for AGI.
  - Wei Dai 26 Jul 2012 19:14 UTC
    3 points
    Parent
    
    How would this assign utility to performing an experiment to falsify (drop probability of) some of the ‘possible worlds’ ?
    
    UDT would want to perform experiments so that it can condition its future outputs on the results of those experiments (i.e., give different outputs depending on how the experiments come out). This gives it higher utility without “falsifying” any of the possible worlds.
    
    Note that such action decreases the sum of value over possible worlds by eliminating (decreasing weight of) some of the possible worlds.
    
    The reason UDT is called “updateless” is that it doesn’t eliminate or change weight of any of the possible worlds. You might want to re-read the UDT post to better understand it.
    
    The rest of your comment makes some sense, but is your argument that without SI (if it didn’t exist), nobody else would try to make an AGI with senses and real-world goals? What about those people (like Ben Goertzel) who are currently trying to build such AGIs? Or is your argument that such people have no chance of actually building such AGIs at least until mind uploading happens first? What about the threat of neuromorphic (brain-inspired) AGIs as as we get closer to achieving uploading?
    - private_messaging 26 Jul 2012 20:24 UTC
      −1 points
      Parent
      
      The reason UDT is called “updateless” is that it doesn’t eliminate or change weight of any of the possible worlds. You might want to re-read the UDT post to better understand it.
      
      A particular instance of UDT running particular execution history got to condition on this execution history; you can say that you call conditioning what I call updates; in practice you will want not to run the computations irrelevant to the particular machine, and you will have strictly less computing power in the machine than in the universe it inhabits including the machine itself. It would be good if you could provide example of experimentations it might perform, somewhat formally derived. It feels to me that while it is valuable that you formalized some of the notions you largely have shifted/renamed all the actual problems.
      
      E.g. it is problematic to specify utility function on reality, its incoherent. In your case the utility function is specified on all mathematically representable theories, which may well not allow to actually value a paperclip. Plus the number of potential paperclips within a theory would grow larger than any computable function of size of the theory, and the actions may well be dominated by relatively small, but absolutely enormous, differences between huge theories. Can you make actual example of some utility function? It doesn’t have to correspond to paperclips—anything so that UDT with this plugged in would actually do something to our reality rather than the imaginary BusyBeaver(100) beings with imaginary dustspecks in their eyes which might be running a boxed sim of our world.
      
      With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours? The marketing spiel in question is, indeed, that Ben Goertzel’s AI (or someone else’s) would maximize an utility function and kill everyone or something, which leads me to assume that they are not talking of your utility function.
      
      With regards to neuromorphic AGIs, I think there’s far too much science fiction and far too little understanding of neurology in the rationalization of ‘why am I getting paid’. While I do not doubt that brain does implement some sort of ‘master trick’ in, perhaps, every cortical column, there is an elaborate system for motivating this whole, and that system quite thoroughly fails to care about the real world state, in deed. And once again, why do you think neuromorphic AGIs would have the sort of values of real world as per UDT?
      
      edit: furthermore it seems fairly preposterous to assume high probability that your utility function will actually be implemented in a working manner—say, paperclip maximizing manner—by people who really need SI to tell them to beware of creating skynet. SI is the typical ‘high level idea guys’ with a belief that the tech guys much smarter than them in fact are specialized in lowly stuff and need the high level idea guys to provide philosophical guidance or else we all die. Incredibly common sight in startups that should never have started up (and fail invariably).
      - Wei Dai 27 Jul 2012 6:22 UTC
        0 points
        Parent
        
        With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours?
        
        You seem to think that I’m claiming that UDT’s notion of utility function is the only way real-world goals might be implemented in an AGI. I’m instead suggesting that it is one way to do so. It currently seems to be the most promising approach for FAI, but I certainly wouldn’t say that only AIs using UDT can be said to have real-world goals.
        
        At this point I’m wondering if Nick’s complaint of vagueness was about this more general usage of “goals”. It’s unclear from reading his comment, but in case it is, I can try to offer a definition: an AI can be said to have real-world goals if it tries to (and generally succeeds at) modeling its environment and chooses actions based on their predicted effects on its environment.
        
        Goals in this sense seems to be something that AGI researchers actively pursue, presumably because they think it will make their AGIs more useful or powerful or intelligent. If you read Goertzel’s papers, he certainly talks about “goals”, “perceptions”, “actions”, “movement commands”, etc.
        private_messaging 27 Jul 2012 8:42 UTC
        1 point
        Parent
        
        You seem to think that I’m claiming that UDT’s notion of utility function is the only way real-world goals might be implemented in an AGI. I’m instead suggesting that it is one way to do so. It currently seems to be the most promising approach for FAI, but I certainly wouldn’t say that only AIs using UDT can be said to have real-world goals.
        
        Then you having formalized your utility function has nothing to do with allegations of vagueness when it comes to defining the utility in the argument of how utility maximizers are dangerous. With regards to it being ‘the most promising approach’, I think it is a very, very silly idea to have an approach so general that we all may well end up sacrificed in the name of huge number of imaginary beings that might exist, an AI pascal-wagering itself on it’s own. It looks like a dead end, especially for friendliness.
        
        At this point I’m wondering if Nick’s complaint of vagueness was about this more general usage of “goals”. It’s unclear from reading his comment, but in case it is, I can try to offer a definition: an AI can be said to have real-world goals if it tries to (and generally succeeds at) modeling its environment and chooses actions based on their predicted effects on its environment.
        
        This does necessarily work like ‘I want most paperclips to exist therefore I will talk my way into controlling the world, then kill everyone and make paperclips’, though.
        
        Goals in this sense seems to be something that AGI researchers actively pursue, presumably because they think it will make their AGIs more useful or powerful or intelligent. If you read Goertzel’s papers, he certainly talks about “goals”, “perceptions”, “actions”, “movement commands”, etc.
        
        They also don’t try to make goals that couldn’t be outsmarted into nihilism. We humans sort-of have a goal of reproduction, except we’re too clever, and we use birth control.
        
        In your UDT, the actual intelligent component is this mathematical intuition that you’d use to process this theory in reasonable time. The rest is optional and highly difficult (if not altogether impossible) icing, even for the most trivial goal such as paperclips, which may well in principle never work.
        
        And the technologies employed in the intelligent component are, without any of those goals, and with much less intelligence (as in computing power and their optimality) requirement, sufficient for e.g. using them to design machinery for mind uploading.
        
        Furthermore, and that is the most ridiculous thing, there is this ‘oracle AI’ being talked about, where an answering system is modelled as based on real world goals and real world utilities, as if those were somehow primal and universally applicable.
        
        It seems to me that the goals and utilities are just an useful rhetorical device used to trigger anthropomorphization fallacy at will (in a selective way), as to solicit donations.
        Wei Dai 27 Jul 2012 16:22 UTC
        0 points
        Parent
        
        They also don’t try to make goals that couldn’t be outsmarted into nihilism.
        
        They’re not explicitly trying to solve this problem because they don’t think it’s going to be a problem with their current approach of implementing goals. But suppose you’re right and they’re wrong, and somebody that wants to build a AGI ends up implementing a motivational system that outsmarts itself into nihilism. Well such an AGI isn’t very useful so wouldn’t they just keep trying until they stumble onto a motivational system that isn’t so prone to nihilism?
        
        We humans sort-of have a goal of reproduction, except we’re too clever, and we use birth control.
        
        Similarly, if we let evolution of humans continue, wouldn’t humans pretty soon have a motivational system for reproduction that we won’t want to cleverly work around?
        private_messaging 27 Jul 2012 17:09 UTC
        1 point
        Parent
        
        They’re not explicitly trying to solve this problem because they don’t think it’s going to be a problem with their current approach of implementing goals.
        
        They do not expect foom either.
        
        Well such an AGI isn’t very useful
        
        You can still have formally defined goals—satisfy conditions on equations, et cetera. Defined internally, without the problematic real world component. Use this for e.g. designing reliable cellular machinery (‘cure cancer and senescence’). Seems very useful to me.
        
        so wouldn’t they just keep trying until they stumble onto a motivational system that isn’t so prone to nihilism?
        
        How long would it take you to ‘stumble’ upon some goal for the UDT that translates to something actually real?
        
        Similarly, if we let evolution of humans continue, wouldn’t humans pretty soon have a motivational system for reproduction that we won’t want to cleverly work around?
        
        The evolution destructively tests designs against reality. Humans do have various motivational systems there, such as religion, btw.
        
        I am not sure how you think a motivational system for reproduction could work, so that we would not embrace a solution that actually does not result in reproduction. (Given sufficient intelligence)
        Wei Dai 27 Jul 2012 18:14 UTC
        1 point
        Parent
        
        They do not expect foom either.
        
        Goertzel does, or at least thinks it’s possible. See http://lesswrong.com/lw/aw7/muehlhausergoertzel_dialogue_part_1/ where he says “GOLEM is a design for a strongly self-modifying superintelligent AI system”. Also http://novamente.net/AAAI04.pdf where he talks about Novamente potentially being “thoroughly self-modifying and self-improving general intelligence”.
        
        You can still have formally defined goals—satisfy conditions on equations, et cetera.
        
        As I mentioned, there are AGI researchers trying to implement real-world goals right now. If they build an AGI that turns nihilistic, do you think they will just give up and start working on equation solvers instead, or try to “fix” their AGI?
        
        How long would it take you to ‘stumble’ upon some goal for the UDT that translates to something actually real?
        
        I guess probably not very long, if I had a working solution to “math intuition”, a sufficiently powerful computer to experiment with, and no concerns for safety...