HoldenKarnofsky comments on Reply to Holden on ‘Tool AI’

HoldenKarnofsky 18 Jul 2012 2:35 UTC
21 points
Thanks for the response. To clarify, I’m not trying to point to the AIXI framework as a promising path; I’m trying to take advantage of the unusually high degree of formalization here in order to gain clarity on the feasibility and potential danger points of the “tool AI” approach.

It sounds to me like your two major issues with the framework I presented are (to summarize):

(1) There is a sense in which AIXI predictions must be reducible to predictions about the limited set of inputs it can “observe directly” (what you call its “sense data”).

(2) Computers model the world in ways that can be unrecognizable to humans; it may be difficult to create interfaces that allow humans to understand the implicit assumptions and predictions in their models.

I don’t claim that these problems are trivial to deal with. And stated as you state them, they sound abstractly very difficult to deal with. However, it seems true—and worth noting—that “normal” software development has repeatedly dealt with them successfully. For example: Google Maps works with a limited set of inputs; Google Maps does not “think” like I do and I would not be able to look at a dump of its calculations and have any real sense for what it is doing; yet Google Maps does make intelligent predictions about the external universe (e.g., “following direction set X will get you from point A to point B in reasonable time”), and it also provides an interface (the “route map”) that helps me understand its predictions and the implicit reasoning (e.g. “how, why, and with what other consequences direction set X will get me from point A to point B”).

Difficult though it may be to overcome these challenges, my impression is that software developers have consistently—and successfully—chosen to take them on, building algorithms that can be “understood” via interfaces and iterated over—rather than trying to prove the safety and usefulness of their algorithms with pure theory before ever running them. Not only does the former method seem “safer” (in the sense that it is less likely to lead to putting software in production before its safety and usefulness has been established) but it seems a faster path to development as well.

It seems that you see a fundamental disconnect between how software development has traditionally worked and how it will have to work in order to result in AGI. But I don’t understand your view of this disconnect well enough to see why it would lead to a discontinuation of the phenomenon I describe above. In short, traditional software development seems to have an easier (and faster and safer) time overcoming the challenges of the “tool” framework than overcoming the challenges of up-front theoretical proofs of safety/usefulness; why should we expect this to reverse in the case of AGI?
- Eliezer Yudkowsky 18 Jul 2012 14:12 UTC
  43 points
  Parent
  So first a quick note: I wasn’t trying to say that the difficulties of AIXI are universal and everything goes analogously to AIXI, I was just stating why AIXI couldn’t represent the suggestion you were trying to make. The general lesson to be learned is not that everything else works like AIXI, but that you need to look a lot harder at an equation before thinking that it does what you want.
  
  On a procedural level, I worry a bit that the discussion is trying to proceed by analogy to Google Maps. Let it first be noted that Google Maps simply is not playing in the same league as, say, the human brain, in terms of complexity; and that if we were to look at the winning “algorithm” of the million-dollar Netflix Prize competition, which was in fact a blend of 107 different algorithms, you would have a considerably harder time figuring out why it claimed anything it claimed.
  
  But to return to the meta-point, I worry about conversations that go into “But X is like Y, which does Z, so X should do reinterpreted-Z”. Usually, in my experience, that goes into what I call “reference class tennis” or “I’m taking my reference class and going home”. The trouble is that there’s an unlimited number of possible analogies and reference classes, and everyone has a different one. I was just browsing old LW posts today (to find a URL of a quick summary of why group-selection arguments don’t work in mammals) and ran across a quotation from Perry Metzger to the effect that so long as the laws of physics apply, there will always be evolution, hence nature red in tooth and claw will continue into the future—to him, the obvious analogy for the advent of AI was “nature red in tooth and claw”, and people who see things this way tend to want to cling to that analogy even if you delve into some basic evolutionary biology with math to show how much it isn’t like intelligent design. For Robin Hanson, the one true analogy is to the industrial revolution and farming revolutions, meaning that there will be lots of AIs in a highly competitive economic situation with standards of living tending toward the bare minimum, and this is so absolutely inevitable and consonant with The Way Things Should Be as to not be worth fighting at all. That’s his one true analogy and I’ve never been able to persuade him otherwise. For Kurzweil, the fact that many different things proceed at a Moore’s Law rate to the benefit of humanity means that all these things are destined to continue and converge into the future, also to the benefit of humanity. For him, “things that go by Moore’s Law” is his favorite reference class.
  
  I can have a back-and-forth conversation with Nick Bostrom, who looks much more favorably on Oracle AI in general than I do, because we’re not playing reference class tennis with “But surely that will be just like all the previous X-in-my-favorite-reference-class”, nor saying, “But surely this is the inevitable trend of technology”; instead we lay out particular, “Suppose we do this?” and try to discuss how it will work, not with any added language about how surely anyone will do it that way, or how it’s got to be like Z because all previous Y were like Z, etcetera.
  
  My own FAI development plans call for trying to maintain programmer-understandability of some parts of the AI during development. I expect this to be a huge headache, possibly 30% of total headache, possibly the critical point on which my plans fail, because it doesn’t happen naturally. Go look at the source code of the human brain and try to figure out what a gene does. Go ask the Netflix Prize winner for a movie recommendation and try to figure out “why” it thinks you’ll like watching it. Go train a neural network and then ask why it classified something as positive or negative. Try to keep track of all the memory allocations inside your operating system—that part is humanly understandable, but it flies past so fast you can only monitor a tiny fraction of what goes on, and if you want to look at just the most “significant” parts, you would need an automated algorithm to tell you what’s significant. Most AI algorithms are not humanly understandable. Part of Bayesianism’s appeal in AI is that Bayesian programs tend to be more understandable than non-Bayesian AI algorithms. I have hopeful plans to try and constrain early FAI content to humanly comprehensible ontologies, prefer algorithms with humanly comprehensible reasons-for-outputs, carefully weigh up which parts of the AI can safely be less comprehensible, monitor significant events, slow down the AI so that this monitoring can occur, and so on. That’s all Friendly AI stuff, and I’m talking about it because I’m an FAI guy. I don’t think I’ve ever heard any other AGI project express such plans; and in mainstream AI, human-comprehensibility is considered a nice feature, but rarely a necessary one.
  
  It should finally be noted that AI famously does not result from generalizing normal software development. If you start with a map-route program and then try to program it to plan more and more things until it becomes an AI… you’re doomed, and all the experienced people know you’re doomed. I think there’s an entry or two in the old Jargon File aka Hacker’s Dictionary to this effect. There’s a qualitative jump to writing a different sort of software—from normal programming where you create a program conjugate to the problem you’re trying to solve, to AI where you try to solve cognitive-science problems so the AI can solve the object-level problem. I’ve personally met a programmer or two who’ve generalized their code in interesting ways, and who feel like they ought to be able to generalize it even further until it becomes intelligent. This is a famous illusion among aspiring young brilliant hackers who haven’t studied AI. Machine learning is a separate discipline and involves algorithms and problems that look quite different from “normal” programming.
  What links here?
  - MatthewBaker's comment on Revisiting SI’s 2011 strategic plan: How are we doing? by lukeprog (18 Jul 2012 16:44 UTC; 0 points)
  - HoldenKarnofsky 18 Jul 2012 16:29 UTC
    18 points
    Parent
    Thanks for the response. My thoughts at this point are that
    
    We seem to have differing views of how to best do what you call “reference class tennis” and how useful it can be. I’ll probably be writing about my views more in the future.
    I find it plausible that AGI will have to follow a substantially different approach from “normal” software. But I’m not clear on the specifics of what SI believes those differences will be and why they point to the “proving safety/usefulness before running” approach over the “tool” approach.
    We seem to have differing views of how frequently today’s software can be made comprehensible via interfaces. For example, my intuition is that the people who worked on the Netflix Prize algorithm had good interfaces for understanding “why” it recommends what it does, and used these to refine it. I may further investigate this matter (casually, not as a high priority); on SI’s end, it might be helpful (from my perspective) to provide detailed examples of existing algorithms for which the “tool” approach to development didn’t work and something closer to “proving safety/usefulness up front” was necessary.
    What links here?
    wedrifid's comment on Reply to Holden on ‘Tool AI’ by Eliezer Yudkowsky (20 Jul 2012 0:15 UTC; 9 points)
    - oooo 6 Jul 2013 17:50 UTC
      7 points
      Parent
      Canonical software development examples emphasizing “proving safety/usefulness before running” over the “tool” software development approach are cryptographic libraries and NASA space shuttle navigation.
      
      At the time of writing this comment, there was recent furor over software called CryptoCat that didn’t provide enough warnings that it was not properly vetted by cryptographers and thus should have been assumed to be inherently insecure. Conventional wisdom and repeated warnings from the security community state that cryptography is extremely difficult to do properly and attempting to create your own may result in catastrophic results. A similar thought and development process goes into space shuttle code.
      
      It seems that the FAI approach to “proving safety/usefulness” is more similar to the way cryptographic algorithms are developed than the (seemingly) much faster “tool” approach, which is more akin to web development where the stakes aren’t quite as high.
      
      EDIT: I believe the “prove” approach still allows one to run snippets of code in isolation, but tends to shy away from running everything end-to-end until significant effort has gone into individual component testing.
      - Nebu 17 Feb 2016 9:35 UTC
        2 points
        Parent
        The analogy with cryptography is an interesting one, because...
        
        In cryptography, even after you’ve proven that a given encryption scheme is secure, and that proof has been centuply (100 times) checked by different researchers at different institutions, it might still end up being insecure, for many reasons.
        
        Examples of reasons include:
        
        The proof assumed mathematical integers/reals, of which computer integers/floating point numbers are just an approximation.
        The proof assumed that the hardware the algorithm would be running on was reliable (e.g. a reliable source of randomness).
        The proof assumed operations were mathematical abstractions and thus exist out of time, and thus neglected side channel attacks which measures how long a physical real world CPU took to execute a the algorithm in order to make inferences as to what the algorithm did (and thus recover the private keys).
        The proof assumed the machine executing the algorithm was idealized in various ways, when in fact a CPU emits heat other electromagnetic waves, which can be detected and from which inferences can be drawn, etc.
  - wedrifid 19 Jul 2012 2:05 UTC
    0 points
    Parent
    
    I can have a back-and-forth conversation with Nick Bostrom, who looks much more favorably on Oracle AI in general than I do, because we’re not playing reference class tennis with “But surely that will be just like all the previous X-in-my-favorite-reference-class”, nor saying, “But surely this is the inevitable trend of technology”; instead we lay out particular, “Suppose we do this?” and try to discuss how it will work, not with any added language about how surely anyone will do it that way, or how it’s got to be like Z because all previous Y were like Z, etcetera.
    
    That’s one way to “win” a game of reference class tennis. Declare unilaterally that what you are discussing falls into the reference class “things that are most effectively reasoned about by discussing low level details and abandoning or ignoring all observed evidence about how things with various kinds of similarity have worked in the past”. Sure, it may lead to terrible predictions sometimes but by golly, it means you can score an ‘ace’ in the reference class tennis while pretending you are not even playing!
    - Eliezer Yudkowsky 19 Jul 2012 17:52 UTC
      12 points
      Parent
      And atheism is a religion, and bald is a hair color.
      
      The three distinguishing characteristics of “reference class tennis” are (1) that there are many possible reference classes you could pick and everyone engaging in the tennis game has their own favorite which is different from everyone else’s; (2) that the actual thing is obviously more dissimilar to all the cited previous elements of the so-called reference class than all those elements are similar to each other (if they even form a natural category at all rather than having being picked out retrospectively based on similarity of outcome to the preferred conclusion); and (3) that the citer of the reference class says it with a cognitive-traffic-signal quality which attempts to shut down any attempt to counterargue the analogy because “it always happens like that” or because we have so many alleged “examples” of the “same outcome” occurring (for Hansonian rationalists this is accompanied by a claim that what you are doing is the “outside view” (see point 2 and 1 for why it’s not) and that it would be bad rationality to think about the “individual details”).
      
      I have also termed this Argument by Greek Analogy after Socrates’s attempt to argue that, since the Sun appears the next day after setting, souls must be immortal.
      - [deleted] 19 Jul 2012 22:20 UTC
        23 points
        Parent
        
        I have also termed this Argument by Greek Analogy after Socrates’s attempt to argue that, since the Sun appears the next day after setting, souls must be immortal.
        
        For the curious, this is from the Phaedo pages 70-72. The run of the argument are basically thus:
        
        P1 Natural changes are changes from and to opposites, like hot from relatively cold, etc.
        
        P2 Since every change is between opposites A and B, there are two logically possible processes of change, namely A to B and B to A.
        
        P3 If only one of the two processes were physically possible, then we should expect to see only one of the two opposites in nature, since the other will have passed away irretrievably.
        
        P4 Life and death are opposites.
        
        P5 We have experience of the process of death.
        
        P6 We have experience of things which are alive
        
        C From P3, 4, 5, and 6 there is a physically possible, and actual, process of going from death to life.
        
        The argument doesn’t itself prove (haha) the immortality of the soul, only that living things come from dead things. The argument is made in support of the claim, made prior to this argument, that if living people come from dead people, then dead people must exist somewhere. The argument is particularly interesting for premises 1 and 2, which are hard to deny, and 3, which seems fallacious but for non-obvious reasons.
        Eliezer Yudkowsky 20 Jul 2012 16:40 UTC
        14 points
        Parent
        This sounds like it might be a bit of a reverent-Western-scholar steelman such as might be taught in modern philosophy classes; Plato’s original argument for the immortality of the soul sounded more like this, which is why I use it as an early exemplar of reference class tennis:
        
        -
        
        Then let us consider the whole question, not in relation to man only, but in relation to animals generally, and to plants, and to everything of which there is generation, and the proof will be easier. Are not all things which have opposites generated out of their opposites? I mean such things as good and evil, just and unjust—and there are innumerable other opposites which are generated out of opposites. And I want to show that in all opposites there is of necessity a similar alternation; I mean to say, for example, that anything which becomes greater must become greater after being less.
        
        True.
        
        And that which becomes less must have been once greater and then have become less.
        
        Yes.
        
        And the weaker is generated from the stronger, and the swifter from the slower.
        
        Very true.
        
        And the worse is from the better, and the more just is from the more unjust.
        
        Of course.
        
        And is this true of all opposites? and are we convinced that all of them are generated out of opposites?
        
        Yes.
        
        And in this universal opposition of all things, are there not also two intermediate processes which are ever going on, from one to the other opposite, and back again; where there is a greater and a less there is also an intermediate process of increase and diminution, and that which grows is said to wax, and that which decays to wane?
        
        Yes, he said.
        
        And there are many other processes, such as division and composition, cooling and heating, which equally involve a passage into and out of one another. And this necessarily holds of all opposites, even though not always expressed in words—they are really generated out of one another, and there is a passing or process from one to the other of them?
        
        Very true, he replied.
        
        Well, and is there not an opposite of life, as sleep is the opposite of waking?
        
        True, he said.
        
        And what is it?
        
        Death, he answered.
        
        And these, if they are opposites, are generated the one from the other, and have there their two intermediate processes also?
        
        Of course.
        
        Now, said Socrates, I will analyze one of the two pairs of opposites which I have mentioned to you, and also its intermediate processes, and you shall analyze the other to me. One of them I term sleep, the other waking. The state of sleep is opposed to the state of waking, and out of sleeping waking is generated, and out of waking, sleeping; and the process of generation is in the one case falling asleep, and in the other waking up. Do you agree?
        
        I entirely agree.
        
        Then, suppose that you analyze life and death to me in the same manner. Is not death opposed to life?
        
        Yes.
        
        And they are generated one from the other?
        
        Yes.
        
        What is generated from the living?
        
        The dead.
        
        And what from the dead?
        
        I can only say in answer—the living.
        
        Then the living, whether things or persons, Cebes, are generated from the dead?
        
        That is clear, he replied.
        
        Then the inference is that our souls exist in the world below?
        
        That is true.
        
        (etc.)
        [deleted] 20 Jul 2012 19:36 UTC
        3 points
        Parent
        
        This sounds like it might be a bit of a reverent-Western-scholar steelman such as might be taught in modern philosophy classes
        
        That was roughly my aim, but I don’t think I inserted any premises that weren’t there. Did you have a complaint about the accuracy of my paraphrase? The really implausible premise there, namely that death is the opposite of life, is preserved I think.
        
        As for reverence, why not? He was, after all, the very first person in our historical record to suggest that thinking better might make you happier. He was also an intellectualist about morality, at least sometimes a hedonic utilitarian, and held no great respect for logic. And he was a skilled myth-maker. He sounds like a man after your own heart, actually.
        thomblake 25 Jul 2012 20:14 UTC
        2 points
        Parent
        I think your summary didn’t leave anything out, or even apply anything particularly charitable.
        thomblake 25 Jul 2012 20:18 UTC
        0 points
        Parent
        Esar’s summary doesn’t seem to be different from this, other than 1) adding the useful bit about “passed away irretrievably” and 2) yours makes it clear that the logical jump happens right at the end.
        
        I’m actually not sure now why you consider this like “reference class tennis”. The argument looks fine, except for the part where “souls exist in the world below” jumps in as a conclusion, not having been mentioned earlier in the argument.
        [deleted] 25 Jul 2012 20:50 UTC
        0 points
        Parent
        The ‘souls exist in the world below’ bit is directly before what Eliezer quoted:
        
        Suppose we consider the question whether the souls of men after death are or are not in the world below. There comes into my mind an ancient doctrine which affirms that they go from hence into the other world, and returning hither, are born again from the dead. Now if it be true that the living come from the dead, then our souls must exist in the other world, for if not, how could they have been born again? And this would be conclusive, if there were any real evidence that the living are only born from the dead; but if this is not so, then other arguments will have to be adduced.
        
        Very true, replied Cebes.
        
        Then let us consider the whole question...
        
        But you’re right that nothing in the argument defends the idea of a world below, just that souls must exist in some way between bodies.
        TheAncientGeek 4 Jul 2014 12:14 UTC
        0 points
        Parent
        The argument omits that living things can come from living things and dead thingsfrom dead things
        
        Therefore, the fact that living things can come from dead things does not mean that have to in every case.
        
        Although, if everything started off dead, they must have at some point.
        
        So it’s an argument for abiogenesis,
        bogdanb 10 Jul 2013 18:28 UTC
        0 points
        Parent
        
        just that souls must exist in some way between bodies.
        
        Not even that, at least in the part of the argument I’ve seen (paraphrased?) above.
        
        He just mentions an ancient doctrine, and then claims that souls must exist somewhere while they’re not embodied, because he can’t imagine where they would come from otherwise. I’m not even sure if the ancient doctrine is meant as argument from authority or is just some sort of Chewbacca defense.
        
        (He doesn’t seem to explicitly claim the “ancient doctrine” to be true or plausible, just that it came to his mind. It feels like I’ve lost something in the translation.)
      - wedrifid 20 Jul 2012 0:15 UTC
        9 points
        Parent
        
        (2) that the actual thing is obviously more dissimilar to all the cited previous elements of the so-called reference class than all those elements are similar to each other (if they even form a natural category at all rather than having being picked out retrospectively based on similarity of outcome to the preferred conclusion);
        
        Ok, it seems like under this definition of “reference class tennis” (particularly parts (2) and (3)) the participants must be wrong and behaving irrationality about it in order to be playing reference class tennis. So when they are either right or at least applying “outside view” considerations correctly, given all the information available to them they aren’t actually playing “reference class tennis” but instead doing whatever it is that reasoning (boundedly) correctly using reference to actual relevant evidence about related occurrences is called when it isn’t packaged with irrational wrongness.
        
        With this definition in mind it is necessary to translate replies such as those here by Holden:
        
        We seem to have differing views of how to best do what you call “reference class tennis” and how useful it can be. I’ll probably be writing about my views more in the future.
        
        Holden’s meaning is, of course, not that that he argues is actually a good thing but rather declaring that the label doesn’t apply to what he is doing. He is instead doing that other thing that is actually sound thinking and thinks people are correct to do so.
        
        Come to think of it if most people in Holden’s shoes heard Eliezer accuse them of “reference class tennis” and actually knew that he intended it with the meaning he explicitly defines here rather than the one they infer from context they would probably just consider him arrogant, rude and mind killed then write him and his organisation off as not worth engaging with.
        
        In the vast majority of cases where I have previously seen Eliezer argue against people using “outside view” I have agreed with Eliezer, and have grown rather fond of using the phrase “reference class tennis” as a reply myself where appropriate. But seeing how far Eliezer has taken the anti-outside-view position here and the extent to which “reference class tennis” is defined as purely an anti-outside-view semantic stop sign I’ll be far more hesitant to make us of it myself.
        
        It is tempting to observe “Eliezer is almost always right when he argues against ‘outside view’ applications, and the other people are all confused. He is currently arguing against ‘outside view’ applications. Therefore, the other people are probably confused.” To that I reply either “Reference class tennis!” or “F*$% you, I’m right and you’re wrong!” (I’m honestly not sure which is the least offensive.)
        Eliezer Yudkowsky 20 Jul 2012 0:43 UTC
        8 points
        Parent
        Which of 1, 2 and 3 do you disagree with in this case?
        
        Edit: I mean, I’m sorry to parody but I don’t really want to carefully rehash the entire thing, so, from my perspective, Holden just said, “But surely strong AI will fall into the reference class of technology used to give users advice, just like Google Maps doesn’t drive your car; this is where all technology tends to go, so I’m really skeptical about discussing any other possibility.” Only Holden has argued to SI that strong AI falls into this particular reference class so far as I can recall, with many other people having their own favored reference classes e.g. Hanson et. al as cited above; a strong AI is far more internally dissimilar from Google Maps and Yelp than Google Maps and Yelp are internally similar to each other, plus there are many many other software programs that don’t provide advice at all so arguably the whole class may be chosen-post-facto; and I’d have to look up Holden’s exact words and replies to e.g. Jaan Tallinn to decide to what degree, if any, he used the analogy to foreclose other possibilities conversationally without further debate, but I do think it happened a little, but less so and less explicitly than in my Robin Hanson debate. If you don’t think I should at this point diverge into explaining the concept of “reference class tennis”, how should the conversation proceed further?
        
        Also, further opinions desired on whether I was being rude, whether logically rude or otherwise.
        Randaly 25 Jul 2012 4:39 UTC
        9 points
        Parent
        Viewed charitably, you were not being rude, although you did veer away from your main point in ways likely to be unproductive. (For example, being unnecessarily dismissive towards Hanson, who you’d previously stated had given arguments roughly as good as Holden’s; or spending so much of your final paragraph emphasizing Holden’s lack of knowledge regarding AI.)
        
        On the most likely viewing, it looks like you thought Holden was probably playing reference class tennis. This would have been rude, because it would imply that you thought the following inaccurate things about him:
        
        He was “taking his reference class and going home”
        That you can’t “have a back-and-forth conversation” with him
        
        I don’t think that you intended those implications. All the same, your final comment came across as noticeably less well-written than your post.
        Eliezer Yudkowsky 25 Jul 2012 17:53 UTC
        2 points
        Parent
        Thanks for the third-party opinion!
        TimS 20 Jul 2012 0:47 UTC
        2 points
        Parent
        I’m confused how you thought “reference class tennis” was anything but a slur on the other side’s argument. Likewise “mindkilled.” Sometimes, slurs about arguments are justified (agnostic in the instant case) - but that’s a separate issue.
      - [deleted] 19 Jul 2012 22:40 UTC
        1 point
        Parent
        
        The three distinguishing characteristics of “reference class tennis” are
        
        Do Karnofsky’s contributions have even one of these characteristics, let alone all of them?
        Eliezer Yudkowsky 20 Jul 2012 0:09 UTC
        3 points
        Parent
        Empirically obviously 1 is true, I would argue strongly for 2 but it’s a legitimate point of dispute, and I would say that there were relatively small but still noticeable but quite forgiveable traces of 3.
      - aaronsw 4 Aug 2012 10:37 UTC
        −1 points
        Parent
        Then it does seem like your AI arguments are playing reference class tennis with a reference class of “conscious beings”. For me, the force of the Tool AI argument is that there’s no reason to assume that AGI is going to behave like a sci-fi character. For example, if something like On Intelligence turns out to be true, I think the algorithms it describes will be quite generally intelligent but hardly capable of rampaging through the countryside. It would be much more like Holden’s Tool AI: you’d feed it data, it’d make predictions, you could choose to use the predictions.
        
        (This is, naturally, the view of that school of AI implementers. Scott Brown: “People often seem to conflate having intelligence with having volition. Intelligence without volition is just information.”)
  - MatthewBaker 18 Jul 2012 16:27 UTC
    −2 points
    Parent
    Your prospective AI plans for programmer-understandability seems very close to Starmap-AI by which I mean
    
    It’s called the Global Association Table. The points or stars represent concepts, and the lines are the links between them.
    
    The best story I’ve read about a not so failed utopia involves this kind of accountability over the FAI. While I hate to generalize from fictional evidence it definitely seems like a necessary step to not becoming a galaxy that tiles over the aliens with happy faces instead of just freezing them in place to prevent human harm.
- JGWeissman 18 Jul 2012 17:42 UTC
  8 points
  Parent
  
  For example: Google Maps works with a limited set of inputs; Google Maps does not “think” like I do and I would not be able to look at a dump of its calculations and have any real sense for what it is doing; yet Google Maps does make intelligent predictions about the external universe (e.g., “following direction set X will get you from point A to point B in reasonable time”), and it also provides an interface (the “route map”) that helps me understand its predictions and the implicit reasoning (e.g. “how, why, and with what other consequences direction set X will get me from point A to point B”).
  
  Explaining routes is domain specific and quite simple. When you are using domain specific techniques to find solutions to domain specific problems, you can use domain specific interfaces where human programmers and designers do all the heavy lifting to figure out the general strategy of how to communicate to the user.
  
  But if you want a tool AGI that finds solutions in arbitrary domains, you need a cross domain solution for communicating tool AGI’s plans to the user. This is as much a harder problem than showing a route on a map, as cross domain AGI is a harder problem than computing the routes. Instead of the programmer figuring out how to plot road tracing curves on a map, the programmer has to figure out how to get the computer to figure out that displaying a map with route traced over it is a useful thing to do, in a way that generalizes figuring out other useful things to do to communicate answers to other types of questions. And among the hard subproblems of programming computers to find useful things to do in general problems is specifying the meaning of “useful”. If that is done poorly, the tool AGI tries to trick the user into accepting plans that achieve some value negating distortion of what we actually want, instead of giving information that helps provide a good evaluation. Doing this right requires solving the same problems required to do FAI right.
- private_messaging 18 Jul 2012 6:11 UTC
  −2 points
  Parent
  To note something on making AIXI based tool: Instead of calculating rewards sum over the whole future (something that is simultaneously impractical, computationally expensive, and would only serve to impair performance on task at hand), one could use the single-step reward, with 1 for button being pressed any time and 0 for button not being pressed ever. It is still not entirely a tool, but it has very bounded range of unintended behaviour (much harder to speculate of the terminator scenario). In the Hutter’s paper he outlines several not-quite-intelligences before arriving at AIXI.
  
  [edit2: also I do not believe that even with the large sum a really powerful AIXI-tl would be intelligently dangerous rather than simply clever at breaking the hardware that’s computing it. All the valid models in AIXI-tl that affect the choice of actions have to magically insert actions being probed into some kind of internal world model. The hardware that actually makes those actions, complete with sensory apparatus, is incidental; a useless power drain; a needless fire hazard endangering the precious reward pathway]
  
  With regards to utility functions, the utility functions in the AI sense are real valued functions taken over the world model, not functions like number of paperclips in the world. The latter function, unsafe or safe, would be incredibly difficult or impossible to define using conventional methods. It would suffice for accelerating the progress to have an algorithm that can take in an arbitrary function and find it’s maximum; while it would indeed seem to be “very difficult” to use that to cure cancer, it could be plugged into existing models and very quickly be used to e.g. design cellular machinery that would keep repairing the DNA alterations.
  
  Likewise, the speculative tool that can understand phrase ‘how to cure cancer’ and phrase ‘what is the curing time of epoxy’ would have to pick up most narrow least objectionable interpretation of the ‘cure cancer’ phrase to merely answer something more useful than ‘cancer is not a type of epoxy or glue, it does not cure’; it seems that not seeing killing everyone as a valid interpretation comes in as necessary consequence of ability to process language at all.
  - CarlShulman 22 Jul 2012 7:07 UTC
    0 points
    Parent
    
    All the valid models in AIXI-tl that affect the choice of actions have to magically insert actions being probed into some kind of internal world model. The hardware that actually makes those actions, complete with sensory apparatus, is incidental; a useless power drain; a needless fire hazard endangering the precious reward pathway
    
    If the past sensory data include information about the internal workings, then there will be a striking correlation between the outputs that the workings would produce on their own (for physical reasons) and the AI’s outputs. That rules out (or drives down expected utility of acting upon) all but very crazy hypotheses about how the Cartesian interaction works. Wrecking the hardware would break that correlation, and it’s not clear what the crazy hypotheses would say about that, e.g. hypotheses that some simply specified intelligence is stage-managing the inputs, or that sometimes the AIXI-tl’s outputs matter, and other times only the physical hardware matters.
    - private_messaging 22 Jul 2012 13:38 UTC
      −1 points
      Parent
      Well, you can’t include entire internal workings in the sensory data, and it can’t model significant portion of itself as it has to try big number of hypotheses on the model on each step, so I would not expect the very crazy hypotheses to be very elaborate and have high coverage of the internals.
      
      If I closed my eyes and did not catch a ball, the explanation is that I did not see it coming and could not catch it, but this sentence is rife with self references of the sort that is problematic for AIXI. The correlation between closed eyes and lack of reward can be coded into some sort of magical craziness, but if I close my eyes and not my ears and hear where the ball lands after I missed catching it, there’s the vastly simpler explanation for why I did not catch it—my hand was not in the right spot (and that works with total absence of sensorium as well). I don’t see how AIXI-tl (with very huge constants) can value it’s eyesight (it might have some value if there is some asymmetric in the long models, but it seems clear it would not assign the adequate, rational value to it’s eyesight). In my opinion there is no single unifying principle to intelligence (or none was ever found), and AIXI-tl (with very huge constants) fails way short of even a cat in many important ways.
      
      edit: Some other thought: I am not sure that Solomonoff induction’s prior is compatible with expected utility maximization. If the expected utility imbalance between crazy models grows faster than 2^length , and I would expect it to grow faster than any computable function (if the utility is unbounded), then the actions will be determined by imbalances between crazy, ultra long models. I would not privilege the belief that it just works without some sort of formal proof or some other very good reason to think it works.
      What links here?
      Kawoomba's comment on A cynical explanation for why rationalists worry about FAI by aaronsw (8 Aug 2012 10:02 UTC; 2 points)