handoflixue comments on SotW: Be Specific

handoflixue 5 Apr 2012 21:12 UTC
2 points

the Genie will attempt to implement the wish in a way the results in a net decrease of utility for the wisher, but is bound by any constraints explicitly written into the wish

Constraint: This must result in a net increase in utility for me...
- Eliezer Yudkowsky 6 Apr 2012 0:25 UTC
  8 points
  Parent
  I rewire your preferences. Oh, that wasn’t what you meant by “utility”?
  - Ezekiel 6 Apr 2012 0:48 UTC
    12 points
    Parent
    Incidentally, I find it funny (although not necessarily significant) that everyone else’s instinct was to talk about the Genie in the third person, whereas Eliezer used the first person.
    
    (Double-posted because it’s a completely separate and much more frivolous comment.)
    - Shmi 8 Apr 2012 18:45 UTC
      3 points
      Parent
      It is extremely significant. That’s partly the reason why EY managed to play the AI-in-a-box game rather successfully despite the overwhelming odds.
      - A1987dM 8 Apr 2012 19:07 UTC
        0 points
        Parent
        Er.… How do you know? I thought he hadn’t disclosed anything about how he did that.
        Shmi 8 Apr 2012 20:24 UTC
        2 points
        Parent
        He mentioned on some mailing list that he had to think like an AI desperately trying to get out. It makes a world of difference in how you approach the situation if it is your life that is actually on the line.
        TheOtherDave 8 Apr 2012 20:04 UTC
        0 points
        Parent
        The role of “the Genie” here and “the AI” in the Boxed AI game have certain obvious similarities.
        It seems reasonable to assume that a willingness to adopt the former correlates with a willingness to adopt the latter.
        It seems reasonable to assume that a willingness to adopt the role of “the AI” in the Boxed AI game is necessary (though not sufficient) in order to win that game.
        So shminux’s claim seems fairly uncontroversial to me.
        Do you dispute it, or are you merely making a claim about the impossibility of knowledge?
        A1987dM 8 Apr 2012 20:18 UTC
        0 points
        Parent
        
        Do you dispute it, or are you merely making a claim about the impossibility of knowledge?
        
        The latter.
  - Ezekiel 6 Apr 2012 0:45 UTC
    3 points
    Parent
    You could try: “Constraint: The value that my current utility function would assign to the universe after this wish is implemented must be higher than the value my current utility function would assign to the universe that would have existed had you not implemented this wish.”
    
    … Which probably causes the Genie to, at best, throw an undefined-error, since human beings don’t have well-defined utility functions. Since it’s malicious, it will probably search through all of your desires, pick one of them at random to count as “my utility function”, and then reinterpret the body of the wish to maximise that one thing at the expense of all others.
    - wedrifid 6 Apr 2012 2:13 UTC
      5 points
      Parent
      
      Since it’s malicious, it will probably search through all of your desires, pick one of them at random to count as “my utility function”, and then reinterpret the body of the wish to maximise that one thing at the expense of all others.
      
      It’s malicious and omnipotent. It’ll do far worse than that. It’ll scan your preferences until it finds a contradiction. Once you have a contradiction you can derive absolutely anything. It would then proceed to calculate your Coherent Extrapolated Volition and minimise it. It may not be obliged to figure out what you actually want but it can certainly do so for the purpose of being spiteful!
      - Grognor 7 Apr 2012 19:17 UTC
        11 points
        Parent
        I think that is the first time I’ve ever seen anyone accurately describe the worst thing that could possibly happen.
      - TheOtherDave 6 Apr 2012 2:20 UTC
        4 points
        Parent
        Or alter my preferences so I antiprefer whatever it is able to produce the most of. Plus altering my brain such that my disutility and dishedonism are linear with that thing. Getting the attention of a Crapsack God sucks.
        wedrifid 6 Apr 2012 2:36 UTC
        2 points
        Parent
        
        Or alter my preferences so I antiprefer whatever it is able to produce the most of. Plus altering my brain such that my disutility and dishedonism are linear with that thing.
        
        Those are actually subsumed under “mimimize CEV”. In the same way that maximising our CEV will not involve modifying our preferences drastically (unless it turns out we are into that sort of thing after all), minimising CEV would, if that turns out to be the worst way to @#$@ with us.
        
        Getting the attention of a Crapsack God sucks.
        
        Can’t argue with that.
      - Dmytry 6 Apr 2012 13:04 UTC
        1 point
        Parent
        What happens if you ask it to maximize your CEV, though?
        
        Lemme remember, the idea with CEV was what you’d desired if you thought faster and more reliably. Okay I ponder what will happen to you if your mind was BusyBeaver(10) times faster (way scarier number than 3^^^^3), without your body working any faster. 1 second passes.
        wedrifid 7 Apr 2012 2:14 UTC
        3 points
        Parent
        
        What happens if you ask it to maximize your CEV, though?
        
        It’ll fuck with you. Because that is what it does. It has plenty of scope to do so because CEV is not fully defined as of now. I’m not sure precisely how it would go about doing so. I just assume it does in some way I haven’t thought of yet.
        
        The meaning it attributes to CEV when it wants to exploit it to make things terrible is very different to the meaning it attributes to CEV when we try to use it to force it to understand us. It’s almost as bad as some humans in that regard!
        Dmytry 7 Apr 2012 4:49 UTC
        −1 points
        Parent
        
        It has plenty of scope to do so because CEV is not fully defined as of now.
        
        The understatement of the year. CEV is vaguest crap ever with lowest hope of becoming less vague.
        wedrifid 7 Apr 2012 7:08 UTC
        1 point
        Parent
        
        with lowest hope of becoming less vague.
        
        That’s a rather significant claim.
        Dmytry 7 Apr 2012 7:16 UTC
        −3 points
        Parent
        It’s very uncommon to see crap this vague in development for such a long time by such a clever person, without it becoming less vague.
        wedrifid 7 Apr 2012 8:46 UTC
        1 point
        Parent
        As far as I am aware this crap isn’t in development. It isn’t the highest research priority so the other SingInst researchers haven’t been working on it much and Eliezer himself is mostly focused on writing a rationality book. Other things like decision theory are being worked on—which has involved replacing vague as crap TDT with less-vague UDT and UDT2.
        
        I would like to see more work published on CEV. The most recent I am familiar with is this.
        orthonormal 7 Apr 2012 15:16 UTC
        6 points
        Parent
        
        replacing vague as crap TDT with less-vague UDT and UDT2
        
        As I’ve figured out while writing the last few posts, TDT hasn’t been explained well, but it is a genuinely formalizable theory. (You’ll have to trust me until Part III or check the decision-theory mailing list.) But it’s a different theory from ADT and UDT, and the latter ones are preferable.
        Expand this thread
        A1987dM 8 Apr 2012 15:11 UTC
        0 points
        Parent
        You mean you have something in mind about how to handle counterfactuals over logically impossible worlds, or simply “I’m not sure it can’t be done”?
        orthonormal 8 Apr 2012 15:26 UTC
        0 points
        Parent
        I mean, I’ve written an algorithm (in the context of the tournament) which does what TDT should do (just as the algorithm in my last post does what CDT should do). The nice part about specifying the context so precisely is that I can dodge many of the hairy issues which come up in practice, and just show the essence of the decision theories.
        Dmytry 7 Apr 2012 9:04 UTC
        −8 points
        Parent
        Well, it seems hopeless to me, maybe it also seems hopeless to him.
        
        If you take the maxim that human values are complex, then you need mind uploads to run CEV, not just that, but heavily (and likely unethically) processed mind uploads that are being fed hypothetical situations (That’s a way to make it non vague). Clearly that’s not what you want. If you take the possibility that agreeable moral system could result from some rather simple principles, then in a very wide range of possible estimates of likehood vs complexity, you have to focus your effort first on the simple principles because they have better expected payoff per thought-time even if their probability may be lower.
        
        edit: Specific example. You could e.g. advocate a requirement whereby for all AIs it must be proved they can’t self improve their hardware (e.g. would either not touch their hardware or would wirehead), and work to develop a framework that ensures this. That may be a very low hanging fruit because various self modifying stuff seem to be hard to keep from wireheading. This results in less AI risk. Not seeing any effort of this kind from SIAI though (due to them being too focussed on telling how the AI won’t wirehead, will self improve at hardware level, and will kill us all)
        Expand this thread
        wedrifid 7 Apr 2012 9:23 UTC
        2 points
        Parent
        
        Well, it seems hopeless to me, maybe it also seems hopeless to him.
        
        Perceived hopelessness would not stop him from attempting it anyway. He would abandon any detail and fundamentally change his whole strategy if necessary but some way of specifying values for an FAI based off human values if you want to create an FAI. And that goal is something he has concluded is absolutely necessary.
        
        All this said it doesn’t seem like working out CEV (or an alternatively named solution to the same problem) is the most difficult task to solve when it comes to creating predictably safe and Friendly GAIs.
        
        If you take the maxim that human values are complex, then you need mind uploads to run CEV, not just that, but heavily (and likely unethically) processed mind uploads that are being fed hypothetical situations (That’s a way to make it non vague). Clearly that’s not what you want.
        
        Simulation is one way of evaluating how much a human may like an outcome but not only is it not necessary it is not even sufficient for the purpose of calculating CEV. Reasoning logically about the values of a creature can be done without running it and most likely needs to be done in order to do the ‘extrapolating’ and ‘cohering’ parts rather than just brute force evaluation of volition.
        Dmytry 7 Apr 2012 9:41 UTC
        0 points
        Parent
        This, however, requires the values not to be very complex—some form of moral absolutism is needed so that you can compress the values from one process into other process.
        
        but some way of specifying values for an FAI based off human values if you want to create an FAI. And that goal is something he has concluded is absolutely necessary.
        
        One shouldn’t narrow the search too much too early. It may be that there is a common base C that our morals result from, and that the agreeable FAI morals can be derived from. You can put in as terminal goals the instrumental game theoretic goals of one type of good guy agent in ecology for example, which you can derive with a narrow AI. Then you may get a very anthropomorphic looking FAI that has enough common-sense not to feed utility monsters, not to get pascal-mugged, etc. It won’t align with our values precisely, but it can be within the range that we call friendly.
        wedrifid 7 Apr 2012 9:59 UTC
        1 point
        Parent
        
        This, however, requires the values not to be very complex—some form of moral absolutism is needed so that you can compress the values from one process into other process.
        
        It requires that the values not be very complex to a superintelligence. “Complex” takes on a whole different meaning in that context. I mean, it is quite probable (given the redundancy and non-values based components in our brain) that our values don’t even take as many nodes to represent as the number of neurons in the brain. Get smart enough and that becomes child’s play!
        Dmytry 7 Apr 2012 10:16 UTC
        −2 points
        Parent
        
        It requires that the values not be very complex to a superintelligence. “Complex” takes on a whole different meaning in that context. I mean, it is quite probable (given the redundancy and non-values based components in our brain) that our values don’t even take as many nodes to represent as the number of neurons in the brain. Get smart enough and that becomes child’s play!
        
        Hmm. So it takes super-intelligence to understand the goal. So let me get this straight: this works by setting off a self improving AI that gets super intelligent and then becomes friendly? Or do we build a super-intelligence right off, super intelligent on it’s original computer?
        
        And you still need what ever implements values, to be ethical to utilize in the ways in which it is unethical to utilize a human mind.
        wedrifid 7 Apr 2012 10:28 UTC
        1 point
        Parent
        
        Hmm. So it takes super-intelligence to understand the goal.
        
        I didn’t say that. I did say that for “but it’s complex!” to be an absolute limiting factor it would be required to be complex even to an FAI.
        
        So let me get this straight: this works by setting off a self improving AI that gets super intelligent and then becomes friendly?
        
        It had better start Friendly, if it doesn’t you are just asking for trouble. Obviously it wouldn’t be as good at being Friendly yet when it isn’t particularly smart.
        
        Or do we build a super-intelligence right off, super intelligent on it’s original computer?
        
        That sounds implausible. We’d probably go extinct before we managed to pull that off.
        Dmytry 7 Apr 2012 10:49 UTC
        −2 points
        Parent
        
        It had better start Friendly, if it doesn’t you are just asking for trouble. Obviously it wouldn’t be as good at being Friendly yet when it isn’t particularly smart.
        
        Of course. Then we need to have some simple friendliness for when it is dumber. Let’s look at CEV. Can I figure out what is the extrapolated volition of mankind? That’s despite me having hardware-assist other-mind virtualization capability, aka “putting myself in other’s shoes”. Which non-mind-upload probably won’t have. Hell, I’m not sure enough the extrapolated volition isn’t a death wish.
        wedrifid 7 Apr 2012 11:15 UTC
        5 points
        Parent
        
        Of course. Then we need to have some simple friendliness for when it is dumber.
        
        A dumb (ie. About as smart as us but far more rational) AI would, I assume, think along the lines of:
        
        “So… I’m kinda dumb. How about I make myself smart before I fuck with stuff? So for now I’ll do a basic analysis of what my humans seem to want and make sure I don’t do anything drastic to damage that while I’m in my recursive improvement stage. For example I’m definitely not going to turn them all into computation. It don’t take a genius to figure out they probably don’t want that.”
        
        ie. It is possible to maximise the expected value of a function without being able to perfectly calculate the details of said function but just approximate them. This involves taking into account risks that you are missing something important and also finding a way to improve your ability to better calculate and search for maxima within the function without risking significant damage to the overall outcome. This doesn’t necessarily require special case programming although the FAI developers may find special case programming easier to make proofs about.
        
        Have you got a better idea than that? If so, then probably the FAI would do that instead of what I just came up with after 2 seconds thought.
        
        Hell, I’m not sure enough the extrapolated volition isn’t a death wish.
        
        I’m not sure either—specifically because the way different humans’s values are aggregated is distinctly underspecified in CEV as Eliezer has ever discussed. That is combined with going about implicitly (it’s this implicit part that I particularly don’t like) assuming that “all of humanity” is what CEV must be run on. I can’t know that CEV will not kill me. Even if it doesn’t kill me it is nearly tautologically true that CEV is better (in the subjectively objective sense of ‘better’).
        
        This is one of two significant objections to Eliezer-memes that I am known to harp on from time to time.
        fubarobfusco 7 Apr 2012 17:50 UTC
        1 point
        Parent
        
        That is combined with going about implicitly (it’s this implicit part that I particularly don’t like) assuming that “all of humanity” is what CEV must be run on. I can’t know that CEV will not kill me. Even if it doesn’t kill me it is nearly tautologically true that CEV is better (in the subjectively objective sense of ‘better’).
        
        Here’s the trouble, though: by the same reasoning, if someone is implementing CEV or CEV or CEV or any such, everyone who isn’t a white person, Russian intellectual, or Orthodox Gnostic Pagan has a damned good reason to be worried that it’ll kill them.
        
        Now, it may turn out that CEV is sufficiently similar to CEV that the rest of humanity needn’t worry. But is that a safe bet for all of us who aren’t Orthodox Gnostic Pagans?
        Incorrect 7 Apr 2012 18:35 UTC
        0 points
        Parent
        For anyone who implements an AI, any justification for including other members of humanity in their CEV calculation is valid iff their CEV would specify that anyway.
        
        So, the rational course of action for anyone implementing an AI is to simply use their own CEV. If that CEV specifies to consider the CEV of other members of humanity then so be it.
        wedrifid 7 Apr 2012 18:47 UTC
        0 points
        Parent
        
        For anyone who implements an AI, any justification for including other members of humanity in their CEV calculation is valid iff their CEV would specify that anyway.
        
        YES! CEV is altruism inclusive. For some reason this is often really hard to make people understand that the altruis belongs inside the CEV calculation while the compromise-for-instrumental-purposes goes on the outside.
        
        So, the rational course of action for anyone implementing an AI is to simply use their own CEV. If that CEV specifies to consider the CEV of other members of humanity then so be it.
        
        This is true all else being equal. (The ‘all else’ being specifically that you are just as likely to succeed in creating FAI> as you are in creating FAI>.)
        hairyfigment 7 Apr 2012 19:42 UTC
        0 points
        Parent
        
        For some reason this is often really hard to make people understand
        
        IAWYC, but who doesn’t get this?
        
        Given our attitude toward politics, I’d expect little if any gain from replacing ‘humanity’ with ‘Less Wrong’. Moreover, others would correctly take our exclusion of them as evidence of a meaningful difference if we actually made this decision. And I can’t write an AGI by myself, nor can the smarter version of me calling itself Eliezer.
        wedrifid 7 Apr 2012 22:24 UTC
        0 points
        Parent
        
        IAWYC, but who doesn’t get this?
        
        I don’t recall the names. The conversations would be archived though if you are interested.
        wedrifid 7 Apr 2012 18:11 UTC
        0 points
        Parent
        Compromise is often necessary for the purpose of cooperation and CEV is a potentially useful Schelling point to agree upon. However, it should be acknowledged that these considerations are instrumental—or at least acknowledged that they are decisions to be made. Eliezer’s discussion of the subject up until now has been completely innocent of even awareness of the possibility that ‘humanity’ is the only thing that could conceivably be plugged in to CEV. This is, as far as I am concerned, a bad thing.
        Incorrect 7 Apr 2012 18:39 UTC
        0 points
        Parent
        
        This is, as far as I am concerned, a but thing.
        
        Huh?
        wedrifid 7 Apr 2012 18:40 UTC
        0 points
        Parent
        bad thing. Fixed.
        Dmytry 7 Apr 2012 11:35 UTC
        −4 points
        Parent
        
        A dumb (ie. About as smart as us but far more rational) AI would, I assume, think along the lines of:
        
        Sounds like a lot of common sense that is very difficult to derive rationally.
        
        “So… I’m kinda dumb. How about I make myself smart before I fuck with stuff? So for now I’ll do a basic analysis of what my humans seem to want and make sure I don’t do anything drastic to damage that while I’m in my recursive improvement stage. For example I’m definitely not going to turn them all into computation. It don’t take a genius to figure out they probably don’t want that.”
        
        Have you got a better idea than that? If so, then probably the FAI would do that instead of what I just came up with after 2 seconds thought.
        
        Just a little more anthropomorphizing and we’ll be speaking of AI that just knows what is the moral thing to do, innately, because he’s such a good guy.
        
        The ‘basic analysis of what my humans seem to want’ has fairly creepy overtones to it (testing hypotheses style). On top of it, say, you tell it, okay just do what ever you think i would do if I thought faster, and it obliges, you are vaporized, because you would of gotten bored into suicide if you thought faster, your simple values system works like this. What’s exactly wrong about that course of action? I don’t think ‘extrapolating’ is well defined.
        
        re: volition of mankind. Yep.
        wedrifid 7 Apr 2012 12:27 UTC
        2 points
        Parent
        
        Sounds like a lot of common sense
        
        It doesn’t sound like particularly common sense—I’d guess that significantly less than half of humans would arrive at that as a cached ‘common sense’ solution.
        
        that is very difficult to derive rationally.
        
        It’s utterly trivial application of instrumental rationality. I can come up with it in 2 seconds. If the AI is as smart as I am (and with far less human biases) it can arrive at the solution as simply as I can. Especially after it reads every book on strategy that humans have written. Heck, it can read my comment and then decide whether it is a good strategy.
        
        Artificial intelligences aren’t stupid.
        
        Just a little more anthropomorphizing and we’ll be speaking of AI that just knows what is the moral thing to do, innately, because he’s such a good guy.
        
        Or… not. That’s utter nonsense. We have been explicitly describing AIs that have been programmed with terminal goals. The AI would then
        
        The ‘basic analysis of what my humans seem to want’ has fairly creepy overtones to it (testing hypotheses style). On top of it, say, you tell it, okay just do what ever you think i would do if I thought faster, and it obliges, you are vaporized, because you would of gotten bored into suicide if you thought faster, your simple values system works like this. What’s exactly wrong about that course of action? I don’t think ‘extrapolating’ is well defined.
        
        CEV is well enough defined that it just wouldn’t do that unless you actually do want it—in which case you, well, want it to do that so have no cause to complain. Reading even the incomplete specification from 2004 is sufficient to tell us that a GAI that does that is not implementing something that can reasonably called CEV. I must conclude that you are replying to a straw man (presumably due to not having actually read the materials you criticise.)
        Dmytry 7 Apr 2012 13:52 UTC
        −3 points
        Parent
        CEV is not defined to do what you as-is actually want, but to do what you would of wanted, even in circumstances when you as-is actually want something else, as the 2004 paper cheerfully explains.
        
        In any case, once you assume such intent-understanding interpretative powers of AI, it’s hard to demonstrate why instructing the AI in plain English to “Be a good guy. Don’t do bad things” would not be a better shot.
        wedrifid 7 Apr 2012 14:02 UTC
        3 points
        Parent
        
        In any case, once you assume such intent-understanding interpretative powers of AI
        
        Programmed in with great effort, thousands of hours of research and development and even then great chance of failure. That isn’t “assumption”.
        
        it’s hard to demonstrate why instructing the AI in plain English to “Be a good guy. Don’t do bad things” would not be a better shot.
        
        That would seem to be a failure of imagination. That exhortation tells even an FAI-complete AI that is designed to follow commands to do very little.
        Dmytry 7 Apr 2012 14:08 UTC
        −7 points
        Parent
        
        Programmed in with great effort, thousands of hours of research and development and even then great chance of failure. That isn’t “assumption”.
        
        Universe doesn’t grade ‘for effort’.
        
        That would seem to be a failure of imagination.
        
        That’s how the pro CEV argument seem to me.
        
        That exhortation tells even an FAI-complete AI that is designed to follow commands to do very little.
        
        When you are a very good engineer you can work around constraint more and more. For example, right now using the resources that the AI can conceivably command without technically innovating, improving situation with the hunger in Africa will involve drastic social change with some people getting shot. Some slightly superhuman technical innovation, and this can be done without hurting anyone. We humans are barely-able engineers and scientists; we got this technical civilization once we got just barely able to do that.
        wedrifid 7 Apr 2012 14:28 UTC
        7 points
        Parent
        
        Universe doesn’t grade ‘for effort’.
        
        And that is enough non-sequiturs for one conversation. My comment in no way implied that it does, nor did it rely on it for the point was making. It even went as far as to explicitly declare likely failure.
        
        You seem to be pattern matching from keywords to whatever retort you think counters them. This makes the flow of the conversation entirely incoherent and largely pointless.
        
        When you are a very good engineer you can work around constraint more and more. For example, right now using the resources that the AI can conceivably command without technically innovating, improving situation with the hunger in Africa will involve drastic social change with some people getting shot. Some slightly superhuman technical innovation, and this can be done without hurting anyone. We humans are barely-able engineers and scientists; we got this technical civilization once we got just barely able to do that.
        
        This is both true and entirely orthogonal to that which it seems intended to refute.