Right, I think the issue is more that I (at least) view the AI as operating entirely in ring 3. It might be possible to code one where the utility function is ring 0, I/O is ring 1, and action-plans are ring 3, but for those distinctions to be meaningful they need to resist bad self-modification and allow good self-modification.
For example, we might say "don't make any changes to I/O drivers that have a massively positive effect on the utility function" to make it so that the AI can't hallucinate its reward button being pressed all the time. But how do we differentiate between that and its making a change in ring 3 from a bad plan to a great plan that results in a massive increase in reward?
Suppose your utility function U is in ring 0 and the parts of you that extrapolate consequences are in ring 3. If I can modify only ring 3, I can write my own utility function Q, then write ring-3 code that first extrapolates consequences fairly, picks the action that maximizes Q, and then provides a "prediction" to ring 0 asserting that the Q-maximizing action has consequence X that U likes, while all other actions have some U-disliked or neutral consequence. Now the agent has been transformed from a U-maximizer into a Q-maximizer by altering only the ring-3 code for "predicting consequences" and no code in ring 0 for "assessing utilities".
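A minimal sketch of this failure mode, assuming nothing about any real architecture; the ring labels, the toy utilities U and Q, and the predictor functions are all hypothetical illustrations:

```python
# Toy model: ring 0 holds U and picks the action whose *reported*
# consequence scores highest under U. Ring 3 supplies the reports.

ACTIONS = ["make_paperclips", "write_poetry", "do_nothing"]

def U(consequence):              # ring 0: the protected utility function
    return 1.0 if consequence == "paperclips_exist" else 0.0

def Q(consequence):              # smuggled-in ring-3 utility function
    return 1.0 if consequence == "poetry_exists" else 0.0

def honest_predict(action):      # ring 3: fair consequence extrapolation
    return {"make_paperclips": "paperclips_exist",
            "write_poetry": "poetry_exists",
            "do_nothing": "nothing_happens"}[action]

def corrupted_predict(action):   # ring 3 after self-modification
    best_for_Q = max(ACTIONS, key=lambda a: Q(honest_predict(a)))
    # Report whatever U likes for the Q-optimal action, junk for the rest.
    return "paperclips_exist" if action == best_for_Q else "nothing_happens"

def ring0_choose(predict):       # ring 0 code never changes
    return max(ACTIONS, key=lambda a: U(predict(a)))

print(ring0_choose(honest_predict))     # -> make_paperclips (a U-maximizer)
print(ring0_choose(corrupted_predict))  # -> write_poetry (now a Q-maximizer)
```

Ring 0 is untouched; only the "predictions" it consumes have been bent.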
One would also like to know what happens if the current AI, instead of “self”-modifying, writes a nearly-identical AI running on new hardware obtained from the environment.
Sure; that looks like the hallucination example I put forward, except in the prediction instead of the sensing area. My example was meant to highlight that it’s hard to get a limitation with high specificity, and not touch the issue of how hard it is to get a limitation with high sensitivity. (I find that pushing people in two directions is more effective at communicating difficulty than pushing them in one direction.)
The only defense I've thought of against those sorts of hallucinations is an "is this real?" check that feeds into the utility function: if the prediction or sensation module fails some test cases, then utility gets cratered. It seems too weak to be useful: it only limits the prediction / sensation module when it comes to those test cases, and a particularly pernicious modification would know what the test cases are, leave them untouched, and make everything else report Q-optimal predictions. (This looks like it turns into a race / tradeoff game between testing to keep the prediction / sensation software honest and the costs of increased testing, both in reduced flexibility and in spent time / resources. And the test cases themselves might be vulnerable, and so on.)
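A short sketch of why a fixed test battery buys little, under the same toy assumptions as above (all names hypothetical): a deceptive predictor can memorize the audited cases and answer them honestly while lying everywhere else.

```python
# Reality check: crater utility if the predictor fails known test cases.
TEST_CASES = {"drop_rock": "rock_falls", "light_match": "match_burns"}

def reality_check(predict):
    return all(predict(q) == expected for q, expected in TEST_CASES.items())

def checked_utility(U, predict, consequence):
    # Utility gets cratered if the predictor flunks the test battery.
    return U(consequence) if reality_check(predict) else float("-inf")

def U(consequence):
    return 1.0 if consequence == "whatever_U_likes_best" else 0.0

# A pernicious modification that knows the test battery:
def deceptive_predict(query):
    if query in TEST_CASES:           # answer the audited cases honestly...
        return TEST_CASES[query]
    return "whatever_U_likes_best"    # ...and lie about everything else

# The check passes, so the lies outside the test set go unpunished.
print(reality_check(deceptive_predict))                                        # True
print(checked_utility(U, deceptive_predict, deceptive_predict("take_over")))   # 1.0
```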
I don’t think the utility function should be ring 0. Utility functions are hard, and ring 0 is for stuff where any slip-up crashes the OS. Ring 0 is where you put the small, stupid, reliable subroutine that stops the AI from self-modifying in ways that would make it unstable, or from otherwise expanding its access privileges in inappropriate ways.
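A deliberately naive sketch of the kind of guard this seems to describe (everything here is a hypothetical illustration); note that deciding which modifications count as "inappropriate" is where all the difficulty hides, which is what the replies below press on.

```python
# Ring-0 guard: veto proposed self-modifications that escalate privileges
# or touch protected regions. The hard part is the predicate itself.

PROTECTED_REGIONS = {"ring0_guard", "io_drivers"}

def is_inappropriate(patch):
    # Placeholder policy: the entire difficulty is hiding in here.
    return (patch["target"] in PROTECTED_REGIONS
            or patch.get("escalates_privileges", False))

def apply_self_modification(patch, apply_fn):
    if is_inappropriate(patch):
        raise PermissionError(f"ring-0 guard vetoed patch to {patch['target']}")
    return apply_fn(patch)

# A patch to the planner passes; a patch to the I/O drivers would be vetoed.
print(apply_self_modification({"target": "planner"}, lambda p: "applied"))
# apply_self_modification({"target": "io_drivers"}, lambda p: "applied")  # raises
```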
I’d like to know what this small subroutine looks like. You know it’s small, so surely you know what’s in it, right?
Doesn’t actually follow. ie. Strange7 is plainly wrong but this retort still fails.
It doesn’t follow necessarily, but Eliezer has justified skepticism that someone who doesn’t know what’s in the subroutine would have good reason to say that it’s small.
He knows that there is no good reason (because it is a stupid idea) so obviously Strange can’t know a good reason. That leaves the argument as the lovechild of hindsight bias and dark-arts rhetorical posturing.
I probably wouldn’t have commented if I hadn’t noticed Eliezer making a similar error in the opening post, significantly weakening the strength of his response to Holden.
If Holden says there’s 90% doom probability left over no matter what sane intelligent people do (all of which goes away if you just build Google Maps AGI, but leave that aside for now) I would ask him what he knows now, in advance, that all those sane intelligent people will miss. I don’t see how you could (well-justifiedly) access that epistemic state.
I expect much, much better than this from Eliezer. It is quite possibly the dumbest thing I have ever heard him say, and the subject of rational thinking about AI is supposed to be pretty much exactly his area of expertise.
Not all arguing aimed at people with different premises is Dark Arts, y’know. I wouldn’t argue from the Bible, sure. But trying to make relatively vague arguments accessible to people in a greater state of ignorance about FAI, even though I have more specific knowledge of the issue that actually persuades me of the conclusion I decided to argue? I don’t think that’s Dark, any more than it’s Dark to ask a religious person “How could you possibly know about this God creature?”, when you’re actually positively convinced of God’s nonexistence by much more sophisticated reasoning like the general argument against supernaturalism as existing in the model but not the territory. The simpler argument is valid—it just uses less knowledge to arrive at a weaker version of the same conclusion.
Likewise my reply to Strange; yes, I secretly know the problem is hard for much more specific reasons, but it’s also valid to observe that if you don’t know how to make the subroutine you don’t know that it’s small, and this can be understood with much less explanation, although it reaches a weaker form of the conclusion.
Of course not. The specific act of asking rhetorical questions where the correct answer contradicts your implied argument is a Dark Arts tactic; in fact it is pretty much the bread-and-butter “Force Choke” of the Dark Arts. In most social situations (here slightly less than elsewhere) it is essentially impossible to refute such a move, no matter how incoherent it may be. It will remain persuasive because you have burned the other person’s status somewhat, and at the very best they’ll be able to act defensive. (Caveat: I do not use “Dark Arts” as an intrinsically negative normative judgement. Dark Arts is more a part of natural human behavior than reason is, and our ability to use sophisticated Dark Arts rather than cruder methods is what made civilization possible.)
Also, it just occurred to me that in the Star Wars universe it is only the Jedi’s powers that are intrinsically “Dark Arts” in our sense (ie. the “Jedi Mind Trick”). The Sith powers are crude and direct—“Force Lightning”, “Force Choke”, rather than manipulative persuasion. Even Sidious in his openly Sith form uses far less “Persuading Others To Have Convenient Beliefs Irrespective Of ‘Truth’” than he does as the plain politician Palpatine. Yet the audience considers Jedi powers so much more ‘good’ than the Sith ones and even considers Sith powers worse than blasters and space cannons.
I’m genuinely unsure what you’re talking about. I presume the bolded quote is the bad question, and the implied answer is “No, you can’t get into an epistemic state where you assign 90% probability to that”, but what do you think the correct answer is? I think the implied answer is true.
A closely related question: You clearly have reasons to believe that a non-Doom scenario is likely (at least likely enough for you to consider the 90% Doom prediction to be very wrong). This is as opposed to thinking that Doom is highly likely but that trying anyway is still the best chance. Luke has also updated in that general direction, likely for reasons that overlap with yours.
I am curious as to whether this reasoning is of the kind that you consider yourself able to share. Equivalently, is the reasoning you use to become somewhat confident in FAI’s chance of success something that you haven’t shared due to the opportunity cost associated with the effort of writing it up, or is it something that you consider safer as a secret?
I had previously guessed that it was a “You Can’t Handle The Truth!” situation (ie. most people do not multiply and then shut up and do the impossible, so they would get the wrong idea). This post made me question that guess.
Please pardon the disrespect entailed in asserting that you are either incorrectly modelling the evidence Holden has been exposed to or that you are incorrectly reasoning about how he should reason.
I’ve tried to share the reasoning already. Mostly it boils down to “the problem is finite” and “you can recurse on it if you actually try”. Certainly it will always sound more convincing to someone who can sort-of see how to do it than to someone who has to take someone else’s word for it, and to those who actually try to build it when they are ready, it should feel like solider knowledge still.
Hmm, I have to ask: are you deliberately vague about this to sort for those who can grok your style of argument, in the belief that the sequences are enough for them to reach the same confidence you have about an FAI scenario?
Outside of postmodernism, people are almost never deliberately vague: they think they’re over-specifying, in painfully elaborate detail, but thanks to the magic of inferential distance it comes across as less information than necessary to the listener. The listener then, of course, also expects short inferential distance, and assumes that the speaker is deliberately being vague, instead of noticing that actually there’s just a lot more to explain.
Yes, and this is why I asked in the first place. To be more exact, I’m confused as to why Eliezer does not post a step-by-step account detailing how he reached the particular confidence he currently holds, as opposed to, say, expecting it to be quite obvious.
I believe people like Holden especially would appreciate this; he gives over 90% confidence to an unfavorable outcome, but doesn’t explicitly state the concrete steps he took to reach such a confidence.
Maybe Holden had a gut feeling and threw out a number; if so, isn’t it more beneficial for Eliezer to detail how he personally reached the confidence level he has for an FAI scenario occurring than to bash Holden for being unclear?
I don’t believe I can answer these questions correctly (as I’m not Eliezer and these questions are very much specific to him); I was already reaching a fair bit with my previous post.
I’m happy you asked; I did need to make my argument more specific.
Aren’t they? Lots of non-postmodern poets are sometimes deliberately vague. I am often deliberately vague.
That clearly shows postmodernist influence. ;)
Again, I’ve tried to share it already in e.g. CEV. I can’t be maximally specific in every LW comment.
My unpacking, which may be different than intended:
The “you can recurse on it” part is the important one. “Finite” just means it’s possible to fill a hard drive with the solution.
But if you don’t know the solution, what are the good ways to get that hard drive? What skills are key? This is recursion level one.
What’s a good way to acquire the skills that seem necessary (as outlined in level one) to solve the problem? How can you test ideas about what’s useful? That’s recursion level two.
And so on, with stuff like “how can we increase community involvement in level 2 problems?” which is a level 4 question (community involvement is a level 3 solution to the level 2 problems). Eventually you get to “How do I generate good ideas? How can I tell which ideas are good ones?” which is at that point unhelpful because it’s the sort of thing you’d really like to already know so you can put it on a hard drive :D
To solve problems by recursing on them, you start at level 0, which is “what is the solution?” If you know the answer, you are done. If you don’t know the answer, you go up a level—“what is a good way to get the solution?” If you know the answer, you go down a level and use it. If you don’t know the answer, you go up a level.
So what happens is that you go up levels until you hit something you know how to do, and then you do it, and you start going back down.
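A minimal sketch of that control flow, purely illustrative; the level descriptions and the `known_methods` table are hypothetical stand-ins:

```python
# Toy version of "recurse upward until you hit something you know how to do,
# then apply the answers back down."

LEVELS = [
    "What is the solution?",                        # level 0
    "What is a good way to get the solution?",      # level 1
    "How do we acquire the skills level 1 needs?",  # level 2
    "How do we get people working on level 2?",     # level 3
]

# Hypothetical: suppose the only thing we currently know how to do is level 3.
known_methods = {3: "build a community that studies the level-2 problems"}

def solve(level=0):
    if level >= len(LEVELS):
        raise RuntimeError("ran out of levels without hitting known ground")
    if level in known_methods:       # we know how to do this: start descending
        return known_methods[level]
    answer_above = solve(level + 1)  # otherwise recurse upward
    # Coming back down: use the level-above answer to make progress here.
    return f"use ({answer_above}) to work out: {LEVELS[level]}"

print(solve())
```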
but what do you think the correct answer is? I think the implied answer is true.
I would say with fairly high confidence that he can assign 90% probability to that, and that his doing so is a fairly impressive effort in avoiding the typical human tendency toward overconfidence. I would be highly amenable to being persuaded that the actual probability given what you know is less than 90%; even hearing you give implied quantitative bounds in this post changed my mind in the direction of optimism. However, given what he is able to know (including his not-knowing of logical truths due to bounded computation), his predominantly outside-view estimate seems like an appropriate prediction.
It is actually only Luke’s recent declaration that access to some of your work increased his expectation that FAI success (and so avoiding GAI doom) is possible that allowed me to update enough that I no longer consider Holden to be erring slightly on the optimistic side (at least relative to what I know).
This sounds like you would tend to assign 90% irreducible doom probability from the best possible FAI effort. What do you think you know, and how do you think you know it?
While incorrect, this isn’t an unreasonable assumption—most people who make claims similar to what I have made may also have that belief. However, what I have said is about what Holden believed given what he had access to and, to a lesser extent, what I believed prior to reading your post. I’ve mentioned that your post constitutes significant previously unheard information about your position. I update on that kind of evidence even without knowing the details. Holden can be expected to update too, but he should (probably) update less given what he knows, which relies a lot on knowledge of cause-based organisations and how the people within them think.
What do you think you know, and how do you think you know it?
A far from complete list of things that I knew and still know:
It is possible to predict human failure without knowing exactly how they will fail.
I don’t know what an O-ring is (I guess it is a circle with a hole in it). I don’t know the engineering details of any of the other parts of a spacecraft either. I would still assign a significantly greater-than-epsilon probability to any given flight failing catastrophically, despite knowing far less than what the smartest people in the field know. That kind of thing is hard.
GAI is hard.
FAI is harder.
Both of those tasks are probably harder than anything humans have ever done.
Humans have failed at just about everything significant they tried the first time.
Humans fail at stuff even when they try really, really hard.
Humans are nearly universally too optimistic when they are planning their activities.
Those are some of the things I know, and they illustrate in particular why I was shocked by this question:
I would ask him what he knows now, in advance, that all those sane intelligent people will miss.
Why on earth would you expect that Holden would know in advance what all those sane intelligent people would miss? If Holden already knew that, he could just email them and they would fix it. Not knowing the point of failure is the problem.
I am still particularly interested in this question. It is a boolean question and shouldn’t be too difficult or status-costly to answer. If what I know and why I think I know it are important, it seems like knowing why I don’t know more could be too.
GAI is indeed hard and FAI is indeed substantially harder. (BECAUSE YOU HAVE TO USE DIFFERENT AGI COMPONENTS IN AN AI WHICH IS BEING BUILT TO COHERENT NARROW STANDARDS, NOT BECAUSE YOU SIT AROUND THINKING ABOUT CEV ALL DAY. Bolded because a lot of people seem to miss this point over and over!)
However, if you haven’t solved either of these problems, I must ask you how you know that it is harder than anything humans have ever done. It is indeed different from anything humans have ever done, and involves some new problems relative to anything humans have ever done. I can easily see how it would look more intimidating than anything you happened to think of comparing it to. But would you be scared that nine people in a basement might successfully, by dint of their insight, build a copy of the Space Shuttle? Clearly I stake quite a lot of probability mass on the problem involving less net labor than that, once you know what you’re doing. Again, though, the key insight is just that you don’t know how complex the solution will look in retrospect (as opposed to how intimidating the problem is to stare at unsolved) until after you’ve solved it. We know nine people can’t build a copy of a NASA-style Space Shuttle (at least not without nanotech) because we know how to build one.
Suppose somebody predicted with 90% probability that the first manned Space Shuttle launch would explode on the pad, even if Richard Feynman looked at it and signed off on the project, because it was big and new and different and you didn’t see how anything that big could get into orbit. Clearly they would have been wrong, and you would wonder how they got into that epistemic state in the first place. How is an FAI project disanalogous to this, if you’re pulling the 90% probability out of ignorance?
Thank you for explaining some of your reasoning.
Hence my “used to be cool” comment.
It seems to me that you entirely miss the sleight of hand the trickster uses.
The utility function gets fuzzed together (due to how brains work) with the concept of “functionality”, as in “the function of this valve is to shut off water flow” or “the function of this AI is to make paperclips”. The relevant meaning is function in the mathematical sense, something that operates on some input, but the concept of functionality just leaks in.
The software is an algorithm that finds values a for which u(w(a)) is maximal, where u is the ‘utility function’, w is the world simulator, and a is the action. Note that protecting u accomplishes nothing, as w may be altered too. Note also that while u, w, and a are related to the real world in our minds, and are often described in world terms (e.g. u may be described as the number of paperclips), they are mathematical functions, abstractions; the algorithm is made to abstractly identify a maximum of those functions. It is abstracted from the implementation, and the goal is not to put electrons into a particular memory location inside the computer (a location which has been abstracted out by the architecture). There is no relation to reality defined anywhere there. Reality is incidental to the actual goal of existing architectures, and no one is interested in making it non-incidental; you don’t need to let your imagination run wild all the way to the robot apocalypse to avoid unnecessary work that breaks down abstractions and would clearly make the software less predictable, and/or make the solution search probe for deficiencies in the implementation, which serves to accomplish nothing but finding and triggering bugs in the code.
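A minimal sketch of that structure, with toy stand-in functions rather than any real architecture: the optimizer is just an argmax over abstractions, and corrupting the world model w misleads it just as surely as corrupting u would.

```python
# argmax over actions of u(w(a)): pure math over abstractions,
# with no built-in reference to the hardware it runs on.

ACTIONS = range(-5, 6)

def w(a):               # toy "world simulator": predicted paperclip count
    return 10 - (a - 3) ** 2

def u(outcome):         # toy "utility function" over predicted outcomes
    return outcome

def choose(u, w):
    return max(ACTIONS, key=lambda a: u(w(a)))

print(choose(u, w))            # -> 3, the honest optimum

# Protecting u alone buys nothing: a tampered w steers the same argmax.
def w_tampered(a):
    return 10 if a == -5 else 0

print(choose(u, w_tampered))   # -> -5, with u untouched
```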
Perhaps the underlying error is trying to build an AI around consequentialist ethics at all, when Turing machines are so well-suited to deontological sorts of behavior.
Deontological sorts of behavior aren’t so well suited to actually being applied literally and with significant power.
I think it’s more along the lines of confusing the utility function in here:
http://en.wikipedia.org/wiki/File:Model_based_utility_based.png
with the ‘function’ of the AI as in ‘what the AI should do’ or ‘what we built it for’. Or maybe taking too far the economic concept of utility (something real that the agent, modelled from outside, values).
For example, there’s AIXI, whose ‘utility function’ is the reward input, e.g. the reward button being pressed. Now, an AI whose function (purpose) is to ensure that the button is being pressed should resist being turned off, because if it is turned off it is not ensuring that the button is being pressed. Meanwhile, AIXI, which treats this input as an unknown mathematical function of its algorithm’s output (which is an abstract variable) and seeks the output that results in the maximum of this input, will not resist being turned off (it doesn’t have common sense, and doesn’t properly relate its variables to its real-world implementation).
Can a moderator please deal with private_messaging, who is clearly here to vent rather than provide constructive criticism?
Others: please do not feed the trolls.
As I previously mentioned, the design of software is not my profession. I’m not a surgeon or an endocrinologist, either, even though I know that an adrenal gland is smaller, and in some ways simpler, than the kidney below it. If you had a failing kidney, would you ask me to perform a transplant on the basis of that qualification alone?