Methodological remark: One should write at some point on a very debilitating effect that I’ve noticed in decision theory, philosophy generally, and Artificial Intelligence, which one might call Complete Theory Bias. This is the academic version of Need for Closure, the desire to have a complete theory with all the loose ends sewn up for the sake of appearing finished and elegant. When you’re trying to eat Big Confusing Problems, like anything having to do with AI, then Complete Theory Bias torpedoes your ability to get work done by preventing you from navigating the space of partial solutions in which you can clearly say what you’re trying to solve or not solve at a given time.
This is very much on display in classical causal decision theory; if you look at Joyce’s Foundations of Causal Decision Theory, for example, it has the entire counterfactual distribution falling as manna from heaven. This is partially excusable because Pearl’s book on how to compute counterfactual distributions had only been published, and hence only really started to be popularized, one year earlier. But even so, the book (and any other causal decision theories that did the same thing) should have carried a big sign saying, “This counterfactual distribution, where all the interesting work of the theory gets carried out, falls on it as manna from heaven—though we do consider it obvious that a correct counterfactual for Newcomb ought to say that if-counterfactual you one-box, it has no effect on box B.” But this would actually get less credit in academia, if I understand the real rules of academia correctly. You do not earn humility points for acknowledging a problem unless it is a convention of the field to acknowledge that particular problem—otherwise you’re just being a bother, and upsetting the comfortable pretense that nothing is wrong.
Marcello and I have all sorts of tricks for avoiding this when we navigate the space of fragmentary solutions in our own work, such as calling things “magic” to make sure we remember we don’t understand them.
TDT is very much a partial solution, a solution-fragment rather than anything complete. After all, if you had the complete decision process, you could run it as an AI, and I’d be coding it up right now.
TDT does say that you ought to use Pearl’s formalism for computing counterfactuals, which is progress over classical causal decision theory; but it doesn’t say how you get the specific causal graph… since factoring the causal environment is a very open and very large AI problem.
Just like the entire problem of factoring the environment into a causal graph, there’s a whole entire problem of reasoning under logical uncertainty using limited computing power. Which is another huge unsolved open problem of AI. Human mathematicians had this whole elaborate way of believing that the Taniyama Conjecture implied Fermat’s Last Theorem at a time when they didn’t know whether the Taniyama Conjecture was true or false; and we seem to treat this sort of implication in a rather different way than “2=1 implies FLT”, even though the material implication is equally valid.
TDT assumes there’s a magic module bolted on that does reasoning over impossible possible worlds. TDT requires this magic module to behave in certain ways. For the most part, my methodology is to show that the magic module has to behave this way anyway in order to get commonsense logical reasoning done—i.e., TDT is nothing special, even though the whole business of reasoning over impossible possible worlds is an unsolved problem.
To answer Robin’s particular objection, what we want to do is drop out of TDT and show that an analogous class of reasoning problems applies to, say, pocket calculators. Let’s say I know the transistor diagram for a pocket calculator. I type in 3 + 3, not knowing the answer; and upon the screen flashes the LED structure for “6”. I can interpret this as meaning 3 + 3 = 6, or I can interpret it as a fact about the output of this sort of transistor diagram, or I can interpret it as saying that 3 + 3 is an even number, or that 2 × 3 is 6. And these may all tell me different things, at first, about the output of another, similar calculator. But all these different interpretations should generally give me compatible logical deductions about the other calculator and the rest of the universe. If I arrive at contradictory implications by forming different abstractions about the calculator, then my magic logic module must not be sound.
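A toy rendering of the calculator point (the observation and the variable names here are my own illustration, not from the original): the same physical observation is interpreted under several different abstractions, and a sound reasoner may hold all of them at once without conflict.

```python
# Toy version of the calculator example: one observed fact, several
# abstractions of it, checked for joint consistency.

keys, display = ("3", "+", "3"), "6"  # what was typed, what the LED showed

# Different interpretations of the same physical observation:
as_arithmetic = int(keys[0]) + int(keys[2]) == int(display)  # 3 + 3 = 6
as_parity     = int(display) % 2 == 0                        # the sum is even
as_circuit    = display == "6"   # a fact about this transistor diagram

# A sound reasoner may use all abstractions simultaneously; none may conflict.
assert as_arithmetic and as_parity and as_circuit
```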
The idea that you want to regard “all computations similar to yourself as having the same output” is just a gloss on the real structure. In the real version, there’s a single canonical mathematical fact of which you are presently logically uncertain, the output of the Gödelian diagonal:

Argmax[A in Actions] Sum[O in Outcomes] (Utility(O) * P(this computation yields A []-> O | rest of universe))
The “this computation” above is not a reference to your entire brain. It is a reference to that one equation above, the canonical diagonal form. It’s assumed, in TDT, that you’re implementing that particular equation—that TDT is how you make your decisions.
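As a concrete sketch of the diagonal, here is the expected-utility argmax with the counterfactual distribution supplied from outside. The counterfactual distribution is exactly the unsolved “magic” part; the Newcomb payoffs and the 99%-accurate predictor below are toy numbers of my own, purely for illustration.

```python
# Minimal sketch of: Argmax[A] Sum[O] Utility(O) * P(A []-> O | rest).

def decide(actions, outcomes, utility, counterfactual_p):
    """Argmax over actions A of sum over outcomes O of U(O) * P(A []-> O)."""
    def expected_utility(a):
        return sum(utility(o) * counterfactual_p(a, o) for o in outcomes)
    return max(actions, key=expected_utility)

actions = ["one-box", "two-box"]
outcomes = [1_000_000, 1_001_000, 1_000, 0]  # possible total payoffs ($)

def counterfactual_p(action, outcome):
    """Stub counterfactual distribution: a 99%-accurate predictor (toy)."""
    p = 0.99
    table = {
        ("one-box", 1_000_000): p, ("one-box", 0): 1 - p,
        ("two-box", 1_000): p, ("two-box", 1_001_000): 1 - p,
    }
    return table.get((action, outcome), 0.0)

best = decide(actions, outcomes, utility=lambda o: o,
              counterfactual_p=counterfactual_p)
assert best == "one-box"  # EU(one-box) = 990,000 vs EU(two-box) = 11,000
```

Everything interesting is hidden inside `counterfactual_p`; swapping in an evidential or classical-causal distribution changes the answer, which is the whole point of the dispute.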
Then you assume that particular equation has a particular output, and update your view of the rest of the physical universe accordingly, in “almost” the same way you would update your view of the universe when you saw the calculator output “6”. It might indeed depend on your logical reasoning engine. There might be things similar to yourself that you did not know were similar to yourself. If so, then you’ll (all) do worse, because your logical reasoning engine is weaker. But you should at least not arrive at a contradiction, if your logical reasoning engine is sound.
What if you can only approximate that equation instead of computing it directly, so that it’s possible that you and the equation will have different outputs? Should the equation be about your approximation of it, or should you just try to approximate the original equation? This is an open problem in TDT, which reflects the underlying open problem in AI; I just assumed there was enough computing power to do the above finite well-ordered computation directly. If you could show me a particular approximation, I might be able to answer better. Or someone could deliver a decisive argument for why any approximation ought to be treated a particular way, and that would make the problem less open in TDT, even though which approximation to use would still be open in AI.
(I also note at this point that the only way your counterfactual can apparently control the laws of physics, is if you know that the laws of physics imply that at least one answer is not compatible with physics, in which case you already know that option is not the output of the TDT computation, in which case you know it is not the best thing to do, in which case you are done considering it. So long as all answers seem not-visibly-incompatible with physics relative to your current state of logical knowledge, supposing a particular output should not tell you anything about physics.)
An example of a much more unsolved problem within TDT, which is harder to dispose of by appeal to normal non-TDT logical reasoning, is something I only realized existed after reading Drescher: you actually can’t update on the subjunctive / counterfactual output of TDT in exactly the same way you can update on the actually observed output of a calculator. In particular, if you actually observed something isomorphic to your decision mechanism outputting action A2, you could infer that A2 had higher expected utility than A1, including any background facts about the world, or one’s beliefs about it, that this would require; but if we only suppose that the mechanism is outputting A2, we don’t want to presume we’ve just calculated that A2 > A1, though we do want to suppose that other decision mechanisms will output A2.
The two ways that have occurred to me for resolving this situation would be to (1) stratify the deductions into the physical and the logical, so that we can deduce within the counterfactual that other physical mechanisms will output “A2”, but not deduce within our own logic internal to the decision process that A2 > A1. Or (2) to introduce something akin to a causal order within logical deductions, so that “A2 > A1” is a parent of “output = A2” and we can perform counterfactual surgery on “output = A2” without affecting the parent node.
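Option (1) can be caricatured in a few lines. Everything here, the two strata, the fact names, the mechanism list, is my own illustrative assumption about what a stratified counterfactual might look like, not a worked-out proposal:

```python
# Toy sketch of stratifying counterfactual deductions into physical vs.
# logical: supposing "output = A2" propagates to other physical
# instantiations of the same computation, but deliberately does NOT feed
# back into the agent's own utility comparison.

def counterfactual_consequences(supposed_output, facts):
    # Physical stratum: anything instantiating the same computation is
    # deduced to share the supposed output.
    physical = {
        (mechanism, "outputs", supposed_output)
        for mechanism in facts["same_computation_mechanisms"]
    }
    # Logical stratum: left unchanged -- we refuse to conclude "A2 > A1"
    # merely from supposing the output is A2.
    logical = set(facts["logical_beliefs"])
    return physical, logical

facts = {
    "same_computation_mechanisms": ["Omega's model of me", "my twin"],
    "logical_beliefs": {"EU comparison of A1, A2 unknown"},
}
phys, log = counterfactual_consequences("A2", facts)
assert ("my twin", "outputs", "A2") in phys
assert log == {"EU comparison of A1, A2 unknown"}
```

Option (2) would instead put “A2 > A1” upstream of “output = A2” in a deduction graph and sever the link surgically, in direct analogy to Pearl-style do() surgery.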
“What if you can only approximate that equation instead of computing it directly, so that it’s possible that you and the equation will have different outputs? Should the equation be about your approximation of it, or should you just try to approximate the original equation?”
Incidentally, that’s essentially a version of the issue I was trying to deal with here (and in the linked conversation between Silas and me).
Ooh! Good point! And for readers who follow through, be sure to note my causal graph and my explanation of how Eliezer_Yudkowsky has previously accounted for how to handle errors when you can’t compute exactly what your output will be due to the hardware’s interference [/shameless self-promotion]
If you’re right, I’d be extra confused, because then Eliezer could account for the sort of error I was describing, in terms of ambiguity of what algorithm you’re actually running, but could not deal with the sort of errors due to one merely approximating the ideal algorithm, which I’d think to be somewhat of a subset of the class of issues I was describing.
Well, either way, as long as the issue is brought to the front and solved (eventually) somehow, I’m happy. :)
The difference is that Newcomb’s problem allows you to assume that your (believed) choice of output is guaranteed to be your actual decision.
Post-computation interference only occurs in real-life scenarios (or hypotheticals that assume this realistic constraint), and it is those scenarios where Eliezer_Yudkowsky shows that you should pick a different computation output, given its robustness against interference from your “corrupted hardware”.
So is the “input” to this computation the functions U and P? Is “that computation” all places in spacetime where this particular input was considered, or all uses of the TDT framework whatsoever?
“This computation” is exactly equal to the Gödelian diagonal and anything you can deduce from making assumptions about it. If I assume the output of a calculator into which I punched “3 + 3” is “6”, then the question is not “What computation do I believe this to be, exactly?” but just “What else can I logically infer from this, given my belief about how various other logical facts are connected to this logical fact?” You could regard the calculator as being a dozen different calculations simultaneously, and if your inferences are sound they ought not to tangle up.
With that said, yes, you could view the TDT formula as being parameterized around U, P, and the action set A relative to P. But it shouldn’t matter how you view it, any more than it matters how you view a calculator for purposes of making inferences about arithmetic and hence other calculators. The key inferences are not carried out through a reference class of computations which are all assumed to be correlated with each other and not anything else. The key inferences are carried out through more general reasoning about logical facts, such as one might use to decide that the Taniyama Conjecture implied Fermat’s Last Theorem. In other words, I can make inferences about other computations without seeing them as “the same computation” by virtue of general mathematical reasoning.
“That computation” is just a pure abstract mathematical fact about the maximum of a certain formula.
Counterexample request: can you give me a specific case where it matters which computation I view myself as, given that I’m allowed to make general mathematical inferences?
I really have a lot of trouble figuring out what you are talking about. I thought I could take just one concept you referred to and discuss that, but apparently this one concept is in your mind deeply intertwined with all your other concepts, leaving me without much ground to stand on to figure out what you mean. I guess I’ll just have to wait until you write up your ideas in a way presentable to a wider audience.
I agree that if we had a general theory of logical uncertainty, then we wouldn’t need to have an answer to Robin’s question.
“Counterexample request: can you give me a specific case where it matters which computation I view myself as, given that I’m allowed to make general mathematical inferences?”
I think the old True PD example works here. Should I view myself as controlling the computation of both players, or just player A, assuming the two players are not running completely identical computations (i.e. same program and data)? If I knew how I should infer the decision of my opponent given my decision, then I wouldn’t need to answer this question.
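The fork in that question can be made concrete with toy payoffs (the numbers and the `opponent_model` framing are my own illustration): what I should do in the True PD depends entirely on how my decision is assumed to propagate to the other player’s computation.

```python
# Toy True Prisoner's Dilemma: my payoff for (my action, opponent's action).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_action(opponent_model):
    """opponent_model maps my action to the opponent's inferred action."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_model(a))])

# If I infer that the opponent's computation mirrors mine, cooperating wins:
assert best_action(lambda a: a) == "C"      # EU: C -> 3, D -> 1
# If I treat the opponent's output as fixed and independent, defection
# dominates:
assert best_action(lambda a: "D") == "D"    # EU: C -> 0, D -> 1
```

The open question in the comment above is exactly which `opponent_model` a principled logical-inference module would hand back when the two programs are similar but not identical.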
What I would generally say at this point is, “What part of this is a special problem to TDT? Why wouldn’t you be faced with just the same problem if you were watching two other agents in the True PD, with some particular partial knowledges of their source code, and I told you that one of the agents’ computations had a particular output? You would still need to decide what to infer about the other. So it’s not TDT’s problem, it legitimately modularizes off into a magical logical inference module...”
(Of course there are problems that are special to TDT, like logical move ordering, how not to infer “A1 has EU of 400, therefore if I output A2 it must have EU > 400”, etc. But “Which computation should I view myself as running?” is not a special problem; you could ask it about any calculator, and if the inference mechanism is sound, “You can use multiple valid abstractions at the same time” is a legitimate answer.)
“TDT is very much a partial solution, a solution-fragment rather than anything complete. After all, if you had the complete decision process, you could run it as an AI, and I’d be coding it up right now.”
I must nitpick here:
First you say TDT is an unfinished solution, but from all the stuff that you have posted there is no evidence that TDT is anything more than a vague idea; is this the case? If not, could you post some math and example problems for TDT?
Second, I hope this was said in haste and not in complete seriousness: that if TDT were complete, you could run it as an AI and you’d be coding. Does this mean that you believe TDT is all that is required for the theory end of AI? Or are you stating that the other hard problems, such as learning, sensory input and recognition, and knowledge representation, are all solved for your AI? If that’s the case I would love to see a post on it.
Have you defined the type/interface of the magic modules? In Haskell, at least, you can define a function as undefined with a type signature and check whether it compiles.
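For comparison, here is a Python analogue of that trick, giving each “magic module” a signature now and a body later. The module names, placeholder types, and signatures are my own guesses at what such interfaces might look like, not a specification from the post:

```python
# Typed stubs for the "magic modules": a static checker (e.g. mypy) can
# check callers against these signatures even though no body exists yet,
# analogous to Haskell's `undefined` with a type signature.
from typing import Any, Dict, List, Set

CausalGraph = Dict[str, List[str]]  # node -> parent nodes (placeholder type)

def factor_environment(observations: List[Any]) -> CausalGraph:
    """Magic module: carve the environment into a causal graph."""
    raise NotImplementedError("magic")

def infer_under_assumption(assumption: str, facts: Set[str]) -> Set[str]:
    """Magic module: reasoning over impossible possible worlds."""
    raise NotImplementedError("magic")
```

The runtime `NotImplementedError` plays the role of `undefined`: the program type-checks and loads, and fails only if the unfinished module is actually invoked.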
Shouldn’t this be its own post?