“What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?”
The opportunity to kill yourself in exchange for $10 is a prototypical case. It’s well and good to say “this is only a problem for an agent who uses proofs,” but that’s not a scenario, that’s a type of agent. Yes, real agents will probably not use mathematical logic in the naive way. But as I pointed out in response to Wei Dai, probabilism doesn’t make the issues go away. It softens the impossibility proofs, but we still lack possibility proofs (which is what MIRI is working on). So this seems like a weak objection. If you want to say “future agents will have different reasoning frameworks than the ones we currently understand,” that’s well and good, but see below. (That seems a lot like discouraging someone from trying to develop logic because their proto-logic doesn’t resemble the way that humans actually reason.)
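For concreteness, here is a minimal sketch of the proof-based setup this example points at, in standard provability-logic notation; it is an illustrative gloss, not something either party stated, and the symbols T (the agent’s theory), a (an action), and G (the goal) are introduced only for this sketch. Suppose the agent takes an action only when its theory proves that the action achieves the goal, and it must decide whether to license a successor (or a modified future self) that applies the same criterion. The criterion, the self-trust it would seem to need, and Löb’s theorem are:

\[
\text{take } a \text{ only if } T \vdash (a \to G), \qquad
\text{self-trust: } T \vdash \Box_T(a \to G) \to (a \to G),
\]
\[
\text{and for any sentence } \varphi: \quad T \vdash \Box_T \varphi \to \varphi \;\Longrightarrow\; T \vdash \varphi .
\]

So a consistent T can endorse that self-trust only for sentences it already proves outright; the agent cannot blanket-trust a reasoner as strong as itself, including its own future self, which is the shape of obstacle the $10 example is meant to exercise.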
The argument I was trying to make was of the form:...
This is what I thought you meant; it seems analogous to:
Creation process A [natural and cultural evolution] led to agents who don’t require a formalized deductive system
By analogy, creation process C [people making AGI] will lead to agents who don’t require a formalized deductive system
Therefore, it is not necessary to take special precautions to ensure that deductive systems are formalized.
Do you object to the analogy?
No one thinks that the world will be destroyed because people built AIs that couldn’t handle the Lobian obstruction. That doesn’t seem like a sensible position, and I think Eliezer explicitly disavows it in the writeup. The point is that we have some frameworks for reasoning about reasoning. Those formalisms don’t capture reflective reasoning, i.e. they don’t provide a formal account of how reflective reasoning could work in principle. The problem Eliezer points to is an obvious problem that any consistent framework for reflective reasoning must resolve.
Working on this problem directly may be less productive than just trying to understand how reflective reasoning works in general—indeed, folks around here definitely try to understand how reflective reasoning works much more broadly, rather than focusing on this problem. The point of this post is to state a precise problem which existing techniques cannot resolve, because that is a common technique for making progress.
The opportunity to kill yourself in exchange for $10 is a prototypical case. It’s well and good to say “this is only a problem for an agent who uses proofs,” but that’s not a scenario, that’s a type of agent. Yes, real agents will probably not use mathematical logic in the naive way. But as I pointed out in response to Wei Dai, probabilism doesn’t make the issues go away. It softens the impossibility proofs, but we still lack possibility proofs (which is what MIRI is working on). So this seems like a weak objection.
Thank you for the example. I do want to say “this is only a problem for an agent who uses proofs” if that’s indeed true. It sounds like you agree, but are saying that some analogous but more complicated problem might arise for probabilistic agents, and that it might not be resolved by whoever else is making AI unless this research is done by MIRI. If you have an example of a complication that you think would plausibly arise in practice and have further thoughts on why we shouldn’t expect this complication to be avoided by default in the course of the ordinary development of AI, I would be interested in hearing more. These do seem like crucial questions to me if we want to argue that this is an important line of research for the future of AI. Do you agree that these questions are crucial?
Creation process A [natural and cultural evolution] led to agents who don’t require a formalized deductive system
By analogy, creation process C [people making AGI] will lead to agents who don’t require a formalized deductive system
Therefore, it is not necessary to take special precautions to ensure that deductive systems are formalized.
Do you object to the analogy?
I do object to this analogy, though I now have a better idea of where you are coming from. Here’s a stab at how the arguments are different (first thing that came to mind):
My argument says that if creation process A led to agents who overcome obstacle X to doing Z, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z.
Your argument says that if creation process A led to agents who overcome obstacle X to doing Z in way Y, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z in way Y.
We might want to insert some qualifiers like “obstacle X needs to be essential to the proper functioning of the agent” or something along those lines, and other conditions I haven’t thought of may be relevant as well (often the case with analogies). But, basically, though I think the analogy suggests that the ordinary development of AI will overcome Lobian obstacles, I think it is much less well supported that AGIs will overcome these obstacles in the same way that humans overcome them. Likewise, humans overcome obstacles to reasoning effectively in certain ways, and I don’t think there is much reason to suspect that AGIs will overcome these obstacles in the same ways. Therefore, I don’t think that the line of argument I was advancing supports the view that formalizing math and doing mathematical logic will be unhelpful in developing AI.
No one thinks that the world will be destroyed because people built AIs that couldn’t handle the Lobian obstruction. That doesn’t seem like a sensible position, and I think Eliezer explicitly disavows it in the writeup. The point is that we have some frameworks for reasoning about reasoning. Those formalisms don’t capture reflective reasoning, i.e. they don’t provide a formal account of how reflective reasoning could work in principle. The problem Eliezer points to is an obvious problem that any consistent framework for reflective reasoning must resolve.
I think what you’re saying is that getting a good framework for reasoning about reasoning could be important for making AGI go well. This is plausible to me. And then you’re also saying that working on this Lobian stuff is a reasonable place to start. This is not obvious to me, but this seems like something that could be subtle, and I understand the position better now. I also don’t think that the approach you’re taking should necessarily seem reasonable to me right now, even if it is.
Big picture: the big questions I had about this were:
What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?
Do you think that people building AGI in the future will stumble over Lob issues if MIRI doesn’t work on those issues? If so, why?
I would now ask those questions differently:
What are some plausible concrete examples of cases where machines might fail to reason about self-modification properly if this research isn’t done? Why do you think it might happen in these cases?
Do you think that people building AGI in the future will fail to do this research, if it is in fact necessary for building well-functioning AIs? If so, why?
I now have a better understanding of what your answer to the first question might look like, though I’m still struggling to imagine what would plausibly go wrong in practice if the research isn’t done. As far as I can tell, there hasn’t been any effort directed at addressing the second question in this specific context so far. Maybe that’s because it’s thought to be just part of the general question of whether future elites will handle AI development just fine. I’m not sure it is, because it sounds like this may be part of making an AGI work at all, and the arguments I’ve heard for future elites not navigating it properly seem to turn on safety issues rather than basic functionality issues.
It sounds like you agree, but are saying that some analogous but more complicated problem might arise for probabilistic agents, and that it might not be resolved by whoever else is making AI unless this research is done by MIRI.
That’s not it, rather:
I think what you’re saying is that getting a good framework for reasoning about reasoning could be important for making AGI go well. This is plausible to me. And then you’re also saying that working on this Lobian stuff is a reasonable place to start. This is not obvious to me, but this seems like something that could be subtle, and I understand the position better now.
Yep. We have reasoning frameworks like the currently dominant forms of decision theory, but they don’t handle reflectivity well.
The Lob Problem isn’t a top-priority scary thing that is carved upon the tombstones of worlds; it’s more like, “Look! We managed to crisply exhibit something very precise that would go wrong with standard methods and get started on analyzing and fixing it! Before, we just saw in a more intuitive sense that something would go wrong when we applied standard theories to reflective problems, but now we can state three problems very precisely!” (Lob and coherent quantified belief, sec. 3; nonmonotonicity of probabilistic reasoning, secs. 5.2 & 7; maximizing / satisficing not being good-enough idioms for bounded agents, sec. 8.) Problems with reflectivity in general are expectedly carved upon the tombstones of worlds because they expectedly cause problems with goal stability during self-modification. But to make progress on that you need crisp problems to provide fodder for getting started on finding a good shape for a reflective decision theory / tiling self-improving agent.
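As a compressed restatement of the first of those three problems, in standard notation rather than a quotation from the writeup: the quantified self-trust that naive tiling seems to call for is exactly what Löb’s theorem forbids for any consistent theory extending arithmetic.

\[
T \vdash \Box_T \varphi \to \varphi \ \text{ for every sentence } \varphi
\;\Longrightarrow\;
T \vdash \Box_T \bot \to \bot
\;\Longrightarrow\;
T \vdash \bot .
\]

The second arrow is just Löb’s theorem applied at φ = ⊥, so no consistent theory can assert the full soundness schema for its own proofs; the question is what weaker form of self-trust a reflective agent can coherently maintain.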
(As usual, I have somewhat less extreme views here than Eliezer.)
saying that some analogous but more complicated problem might arise for probabilistic agents
There is a problem here: we have an impossibility proof for a broad class of agents, and we know of no agents that don’t have the problem. Indeed, this limits the relevance of the impossibility proof, but it doesn’t limit the realness of the problem.
If you have an example of a complication that you think would plausibly arise in practice and have further thoughts on why we shouldn’t expect this complication to be avoided by default in the course of the ordinary development of AI, I would be interested in hearing more.
I don’t quite see where you are coming from here. It seems like the situation is:
There are problems that reflective reasoners would be expected to solve, which we don’t understand how to resolve in current frameworks for general reasoning (of which mathematical logic is the strongest).
If you think that reflective reasoning may be an important part of AGI, then having formal frameworks for reflective reasoning is an important part of having formal frameworks for AGI.
If you think that having formal frameworks is likely to improve our understanding of AGI, then having formal frameworks that support reflective reasoning is a useful step towards improving our understanding of AGI.
The sort of complication I imagine is: it is possible to build powerful AGI without having good frameworks for understanding its behavior, and then people do that. It seems like, all things equal, understanding a system is useful, not only for building it but also for having reasonable expectations about its behavior (which is in turn useful for making further preparations, solving safety problems, etc.). To the extent that understanding things at a deep level ends up being necessary to building them at all, what we’re doing won’t matter (except insofar as it is indirectly useful for people who care about safety to make modest technical contributions).
Do you think that people building AGI in the future will fail to do this research, if it is in fact necessary for building well-functioning AIs? If so, why?
Same answer. It may be that understanding reasoning well is necessary to building powerful agents (indeed, that would be my modal guess). But it may be that you can influence the relative development of understanding vs. building, in which case pushing on understanding has a predictable effect. For example, if people didn’t know what proofs or probabilities were, it isn’t out of the question that they could build deep belief nets by empirical experimentation. But I feel safe saying that understanding proof and probability helps you better reason about the behavior of extremely powerful deep belief nets.
Here’s a stab at how the arguments are different (first thing that came to mind):
I agree that the cases differ in many ways. But this distinction doesn’t seem to get at the important thing. To someone working on logic you would say “I don’t know whether deduction systems will be formalized in the future, but I know that agents will be able to reason. So this suggests to me that your particular approach for defining reasoning, via formalization, is unnecessary.” In some sense this is true—if I’m an early mathematician and I don’t do logic, someone else will—but it has relatively little bearing on whether logic is likely to be mathematically productive to work on. If the question is about impact rather than productivity as a research program, then see the discussion above.
The sort of complication I imagine is: it is possible to build powerful AGI without having good frameworks for understanding its behavior, and then people do that. It seems like, all things equal, understanding a system is useful, not only for building it but also for having reasonable expectations about its behavior (which is in turn useful for making further preparations, solving safety problems, etc.). To the extent that understanding things at a deep level ends up being necessary to building them at all, what we’re doing won’t matter (except insofar as it is indirectly useful for people who care about safety to make modest technical contributions).
OK, helpful. This makes more sense to me.
But this distinction doesn’t seem to get at the important thing. To someone working on logic you would say “I don’t know whether deduction systems will be formalized in the future, but I know that agents will be able to reason. So this suggests to me that your particular approach for defining reasoning, via formalization, is unnecessary.” In some sense this is true—if I’m an early mathematician and I don’t do logic, someone else will—but it has relatively little bearing on whether logic is likely to be mathematically productive to work on. If the question is about impact rather than productivity as a research program, then see the discussion above.
This reply would make more sense if I were saying that knowing how to overcome Lobian obstacles would never be necessary for building well-functioning AI. But I was making the weaker claim that either it would never be necessary OR it would be solved in the ordinary development of AI. So if someone had been formalizing logic a long time ago with the aim of building thinking machines AND they had thought that when thinking machines were built, logic wouldn’t be formalized properly and the machines wouldn’t work, then I might have complained. But if they had said, “I’d like to build a thinking machine and I think that formalizing logic will help get us there, whether it is done by others or me. And maybe it will go a bit better or come a bit sooner if I get involved. So I’m working on it,” then I wouldn’t have had anything to say.
Anyway, I think we roughly understand each other on this thread of the conversation, so maybe there is no need to continue.