I am very glad to see MIRI taking steps to list open problems and explain why those problems are important for making machine intelligence benefit humanity.
I’m also struggling to see why this Lob problem is a reasonable problem to worry about right now (even within the space of possible AI problems). Basically, I’m skeptical that this difficulty or something similar to it will arise in practice. I’m not sure if you disagree, since you are saying you don’t think this difficulty will “block AI.” And if it isn’t going to arise in practice (or something similar to it), I’m not sure why this should be high on the priority list of general AI issues to think about (edited to add: or why working on this problem now should be expected to help machine intelligence develop in a way that benefits humanity).
Some major questions I have are:
What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble? I promise not to interpret your answer as “Eliezer says this is probably going to happen.”
Do you think that people building AGI in the future will stumble over Lob issues if MIRI doesn’t work on those issues? If so, why?
Part of where I’m coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another, not for proving that other types of software and hardware alterations (such as building better arms, building faster computers, finding more efficient ways to compress your data, finding more efficient search algorithms, or even finding better mid-level statistical techniques) would result in more expected utility. But I would guess that once you have an agent operating with minimally decent fundamental epistemic standards, you just can’t prove that altering the agent’s fundamental epistemic standards would result in an improvement. My intuition is that you can only do that when you have an inconsistent agent, and in that situation it’s unclear to me how Lobian issues apply.
Part of where I’m coming from on the second question is that evolutionary processes made humans who seem capable of overcoming putative Lobian obstacles to self-modification. See my other comment for more detail. The other part has to do with basic questions about whether people will adequately prepare for AI by default.
I think you are right that strategy work may be higher value. But I think you underestimate the extent to which (1) such goods are complements [granting for the moment the hypothesis that this kind of AI work is in fact useful], and (2) there is a realistic prospect of engaging in many such projects in parallel, and that getting each started is a bottleneck.
Part of where I’m coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another
As Wei Dai observed, you seem to be significantly understating the severity of the problem. We are investigating conditions under which an agent can believe that its own operation will lawfully lead to good outcomes, which is more or less necessary for reasonably intelligent, reasonably sane behavior given our current understanding.
Part of where I’m coming from on the second question is that evolutionary processes made humans who seem capable of overcoming putative Lobian obstacles to self-modification.
Compare to: “I’m not sure how relevant formalizing mathematical reasoning is, because evolution made humans who are pretty good at reasoning without any underlying rigidly defined formal systems.”
Is there an essential difference between these cases? Your objection is very common, but it looks to me like it is on the wrong end of a very strong empirical regularity, i.e. it seems like you would argue against some of the most compelling historical developments in mathematics on similar grounds, while basically never ending up on the right side of an argument.
Similarly, you would discourage the person who advocates studying mathematical logic with the goal of building a thinking machine [which as far as I can tell was one of the original objects, before the program of formalization took off]. I do think we can predictably say that such research is worthwhile.
This is without even getting into MIRI’s raison d’etre, namely that it may be possible for societies to produce AI given widely varying levels of understanding of the underlying formal frameworks, and that all things equal we expect a deeper understanding of the underlying theory to result in better outcomes (according to the values of AI designers).
I think you underestimate the extent to which (1) such goods are complements [granting for the moment the hypothesis that this kind of AI work is in fact useful], and (2) there is a realistic prospect of engaging in many such projects in parallel, and that getting each started is a bottleneck.
This is an interesting point I wasn’t fully taking into consideration. As I said in another comment, where MIRI has the right kind of technical AI questions, it makes sense to write them up.
As Wei Dai observed, you seem to be significantly understating the severity of the problem. We are investigating conditions under which an agent can believe that its own operation will lawfully lead to good outcomes, which is more or less necessary for reasonably intelligent, reasonably sane behavior given our current understanding.
I think it would greatly help me understand the expected practical implications of this research if you could address the question I asked in the original comment: “What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?” I think I get why it causes problems if, as Wei Dai said, the AI makes decisions purely based on proofs. I don’t see how the problem would be expected to arise in scenarios that seem more plausible. I think that the MIRI people working on this problem know a lot more about this than me, which is why I am asking for examples; I expect you have something to tell me that will make this make more sense.
Part of where I’m coming from on the second question is that evolutionary processes made humans who seem capable of overcoming putative Lobian obstacles to self-modification.
Compare to: “I’m not sure how relevant formalizing mathematical reasoning is, because evolution made humans who are pretty good at reasoning without any underlying rigidly defined formal systems.”
Is there an essential difference between these cases? Your objection is very common, but it looks to me like it is on the wrong end of a very strong empirical regularity, i.e. it seems like you would argue against some of the most compelling historical developments in mathematics on similar grounds, while basically never ending up on the right side of an argument.
Similarly, you would discourage the person who advocates studying mathematical logic with the goal of building a thinking machine [which as far as I can tell was one of the original objects, before the program of formalization took off]. I do think we can predictably say that such research is worthwhile.
The argument I was trying to make was of the form:
Creation process A [natural and cultural evolution] led to agents who don’t stumble over problem B [Lobian issues].
By analogy, creation process C [people making AGI] will lead to agents who don’t stumble over problem B [Lobian issues], even if MIRI does not take special precautions to prevent this from happening.
Therefore, it is not necessary to take special precautions to make sure creation process C doesn’t stumble over problem B.
I don’t think this type of reasoning will lead to the conclusion that formalizing mathematics and doing mathematical logic are not worthwhile. Perhaps you interpreted my argument another way.
“What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?”
The opportunity to kill yourself in exchange for $10 is a prototypical case. It’s well and good to say “this is only a problem for an agent who uses proofs,” but that’s not a scenario, that’s a type of agent. Yes, real agents will probably not use mathematical logic in the naive way. But as I pointed out in response to Wei Dai, probabilism doesn’t make the issues go away. It softens the impossibility proofs, but we still lack possibility proofs (which is what MIRI is working on). So this seems like a weak objection. If you want to say “future agents will have different reasoning frameworks than the ones we currently understand,” that’s well and good, but see below. (That seems a lot like discouraging someone from trying to develop logic because their proto-logic doesn’t resemble the way that humans actually reason.)
The argument I was trying to make was of the form:...
This is what I thought you meant; it seems analogous to:
Creation process A [natural and cultural evolution] led to agents who don’t require a formalized deductive system
By analogy, creation process C [people making AGI] will lead to agents who don’t require a formalized deductive system
Therefore, it is not necessary to take special precautions to ensure that deductive systems are formalized.
Do you object to the analogy?
No one thinks that the world will be destroyed because people built AIs that couldn’t handle the Lobian obstruction. That doesn’t seem like a sensible position, and I think Eliezer explicitly disavows it in the writeup. The point is that we have some frameworks for reasoning about reasoning. Those formalisms don’t capture reflective reasoning, i.e. they don’t provide a formal account of how reflective reasoning could work in principle. The problem Eliezer points to is an obvious problem that any consistent framework for reflective reasoning must resolve.
Working on this problem directly may be less productive than just trying to understand how reflective reasoning works in general—indeed, folks around here definitely try to understand how reflective reasoning works much more broadly, rather than focusing on this problem. The point of this post is to state a precise problem which existing techniques cannot resolve, because that is a common technique for making progress.
The opportunity to kill yourself in exchange for $10 is a prototypical case. It’s well and good to say “this is only a problem for an agent who uses proofs,” but that’s not a scenario, that’s a type of agent. Yes, real agents will probably not use mathematical logic in the naive way. But as I pointed out in response to Wei Dai, probabilism doesn’t make the issues go away. It softens the impossibility proofs, but we still lack possibility proofs (which is what MIRI is working on). So this seems like a weak objection.
Thank you for the example. I do want to say “this is only a problem for an agent who uses proofs” if that’s indeed true. It sounds like you agree, but are saying that some analogous but more complicated problem might arise for probabilistic agents, and that it might not be resolved by whoever else is making AI unless this research is done by MIRI. If you have an example of a complication that you think would plausibly arise in practice and have further thoughts on why we shouldn’t expect this complication to be avoided by default in the course of the ordinary development of AI, I would be interested in hearing more. These do seem like crucial questions to me if we want to argue that this is an important line of research for the future of AI. Do you agree that these questions are crucial?
Creation process A [natural and cultural evolution] led to agents who don’t require a formalized deductive system
By analogy, creation process C [people making AGI] will lead to agents who don’t require a formalized deductive system
Therefore, it is not necessary to take special precautions to ensure that deductive systems are formalized.
Do you object to the analogy?
I do object to this analogy, though I now have a better idea of where you are coming from. Here’s a stab at how the arguments are different (first thing that came to mind):
My argument says that if creation process A led to agents who overcome obstacle X to doing Z, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z.
Your argument says that if creation process A led to agents who overcome obstacle X to doing Z in way Y, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z in way Y.
We might want to insert some qualifiers like “obstacle X needs to be essential to the proper functioning of the agent” or something along those lines, and other conditions I haven’t thought of may be relevant as well (often the case with analogies). But, basically, while I think the analogy suggests that the ordinary development of AI will overcome Lobian obstacles, the claim that AGIs will overcome these obstacles in the same way humans do seems much less well supported. Likewise, humans overcome obstacles to reasoning effectively in certain ways, and I don’t think there is much reason to suspect that AGIs will overcome these obstacles in the same ways. Therefore, I don’t think that the line of argument I was advancing supports the view that formalizing math and doing mathematical logic will be unhelpful in developing AI.
No one thinks that the world will be destroyed because people built AIs that couldn’t handle the Lobian obstruction. That doesn’t seem like a sensible position, and I think Eliezer explicitly disavows it in the writeup. The point is that we have some frameworks for reasoning about reasoning. Those formalisms don’t capture reflective reasoning, i.e. they don’t provide a formal account of how reflective reasoning could work in principle. The problem Eliezer points to is an obvious problem that any consistent framework for reflective reasoning must resolve.
I think what you’re saying is that getting a good framework for reasoning about reasoning could be important for making AGI go well. This is plausible to me. And then you’re also saying that working on this Lobian stuff is a reasonable place to start. This is not obvious to me, but this seems like something that could be subtle, and I understand the position better now. I also don’t think that the way you’re going about it should necessarily seem reasonable to me right now, even if it is.
Big picture: the big questions I had about this were:
What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?
Do you think that people building AGI in the future will stumble over Lob issues if MIRI doesn’t work on those issues? If so, why?
I would now ask those questions differently:
What are some plausible concrete examples of cases where machines might fail to reason about self-modification properly if this research isn’t done? Why do you think it might happen in these cases?
Do you think that people building AGI in the future will fail to do this research, if it is in fact necessary for building well-functioning AIs? If so, why?
I now have a better understanding of what your answer to the first question might look like, though I’m still struggling to imagine what could plausibly go wrong in practice if the research isn’t done. As far as I can tell, there hasn’t been any effort directed at addressing the second question in this specific context so far. Maybe that’s because it’s thought to be just part of the general question of whether future elites will handle AI development just fine. I’m not sure it is, because it sounds like this may be part of making an AGI work at all, and the arguments I’ve heard for future elites not navigating it properly seem to turn on safety issues rather than basic functionality issues.
It sounds like you agree, but are saying that some analogous but more complicated problem might arise for probabilistic agents, and that it might not be resolved by whoever else is making AI unless this research is done by MIRI.
That’s not it, rather:
I think what you’re saying is that getting a good framework for reasoning about reasoning could be important for making AGI go well. This is plausible to me. And then you’re also saying that working on this Lobian stuff is a reasonable place to start. This is not obvious to me, but this seems like something that could be subtle, and I understand the position better now.
Yep. We have reasoning frameworks like the currently dominant forms of decision theory, but they don’t handle reflectivity well.
The Lob Problem isn’t a top-priority scary thing that is carved upon the tombstones of worlds, it’s more like, “Look! We managed to crisply exhibit something very precise that would go wrong with standard methods and get started on analyzing and fixing it! Before we just saw in a more intuitive sense that something would go wrong when we applied standard theories to reflective problems but now we can state three problems very precisely!” (Lob and coherent quantified belief sec. 3, nonmonotonicity of probabilistic reasoning sec. 5.2 & 7, maximizing / satisficing not being good-enough idioms for bounded agents sec. 8.) Problems with reflectivity in general are expectedly carved upon the tombstones of worlds because they expectedly cause problems with goal stability during self-modification. But to make progress on that you need crisp problems to provide fodder for getting started on finding a good shape for a reflective decision theory / tiling self-improving agent.
(As usual, I have somewhat less extreme views here than Eliezer.)
saying that some analogous but more complicated problem might arise for probabilistic agents
There is a problem here: we have an impossibility proof for a broad class of agents, and we know of no agents that don’t have the problem. Indeed, this limits the relevance of the impossibility proof, but it doesn’t limit the realness of the problem.
If you have an example of a complication that you think would plausibly arise in practice and have further thoughts on why we shouldn’t expect this complication to be avoided by default in the course of the ordinary development of AI, I would be interested in hearing more.
I don’t quite see where you are coming from here. It seems like the situation is:
There are problems that reflective reasoners would be expected to solve, which we don’t understand how to resolve in current frameworks for general reasoning (of which mathematical logic is the strongest).
If you think that reflective reasoning may be an important part of AGI, then having formal frameworks for reflective reasoning is an important part of having formal frameworks for AGI.
If you think that having formal frameworks is likely to improve our understanding of AGI, then having formal frameworks that support reflective reasoning is a useful step towards improving our understanding of AGI.
The sort of complication I imagine is: it is possible to build powerful AGI without having good frameworks for understanding its behavior, and then people do that. It seems like all things equal understanding a system is useful, not only for building it but also for having reasonable expectations about its behavior (which is in turn useful for making further preparations, solving safety problems, etc.). To the extent that understanding things at a deep level ends up being necessary to building them at all, then what we’re doing won’t matter (except insofar as people who care about safety making modest technical contributions is indirectly useful).
Do you think that people building AGI in the future will fail to do this research, if it is in fact necessary for building well-functioning AIs? If so, why?
Same answer. It may be that understanding reasoning well is necessary to building powerful agents (indeed, that would be my modal guess). But it may be that you can influence the relative development of understanding vs. building, in which case pushing on understanding has a predictable effect. For example, if people didn’t know what proofs or probabilities were, it isn’t out of the question that they could build deep belief nets by empirical experimentation. But I feel safe saying that understanding proof and probability helps you better reason about the behavior of extremely powerful deep belief nets.
Here’s a stab at how the arguments are different (first thing that came to mind):
I agree that the cases differ in many ways. But this distinction doesn’t seem to get at the important thing. To someone working on logic you would say “I don’t know whether deduction systems will be formalized in the future, but I know that agents will be able to reason. So this suggests to me that your particular approach for defining reasoning, via formalization, is unnecessary.” In some sense this is true—if I’m an early mathematician and I don’t do logic, someone else will—but it has relatively little bearing on whether logic is likely to be mathematically productive to work on. If the question is about impact rather than productivity as a research program, then see the discussion above.
The sort of complication I imagine is: it is possible to build powerful AGI without having good frameworks for understanding its behavior, and then people do that. It seems like all things equal understanding a system is useful, not only for building it but also for having reasonable expectations about its behavior (which is in turn useful for making further preparations, solving safety problems, etc.). To the extent that understanding things at a deep level ends up being necessary to building them at all, then what we’re doing won’t matter (except insofar as people who care about safety making modest technical contributions is indirectly useful).
OK, helpful. This makes more sense to me.
But this distinction doesn’t seem to get at the important thing. To someone working on logic you would say “I don’t know whether deduction systems will be formalized in the future, but I know that agents will be able to reason. So this suggests to me that your particular approach for defining reasoning, via formalization, is unnecessary.” In some sense this is true—if I’m an early mathematician and I don’t do logic, someone else will—but it has relatively little bearing on whether logic is likely to be mathematically productive to work on. If the question is about impact rather than productivity as a research program, then see the discussion above.
This reply would make more sense if I were saying that knowing how to overcome Lobian obstacles would never be necessary for building well-functioning AI. But I was making the weaker claim that either it would never be necessary OR it would be solved in the ordinary development of AI. So if someone had been formalizing logic a long time ago with the aim of building thinking machines AND they thought that when thinking machines were built logic wouldn’t be formalized properly and the machines wouldn’t work, then I might have complained. But if they had said, “I’d like to build a thinking machine and I think that formalizing logic will help get us there, whether it is done by others or me. And maybe it will go a bit better or come a bit sooner if I get involved. So I’m working on it,” then I wouldn’t have had anything to say.
Anyway, I think we roughly understand each other on this thread of the conversation, so maybe there is no need to continue.
I’m not sure why this should be high on the priority list of general AI issues to think about.
I don’t think it’s the highest-priority issue to think about, but my impression is that among the issues that Eliezer has identified as worth thinking about, it could be the one closest to being completely mathematically formalized, so it’s a good one to focus on for the purpose of getting mathematicians interested in MIRI.
I do appreciate arguments in favor of focusing your effort on tractable problems, even if they are not the most important problems to solve.
It’s certainly hard to answer the question, “Why is this the best project to work on within AI?” since it implicitly requires comparisons with all types of stuff. It’s probably unreasonable to ask Eliezer to answer this question in a comment. However, it is reasonable to ask, “Why will this research help make machine intelligence develop in a way that will benefit humanity?” Most of the other questions in my comment are also relevant to that question.
I also question the importance of working on this problem now, but for a somewhat different reason.
Part of where I’m coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another
My understanding is that Lobian issues make it impossible for a proof-based AI to decide to not immediately commit suicide, because it can’t prove that it won’t do something worse than nothing in the future. (Let’s say it will have the option to blow up Earth in the future. Since it can’t prove that its own proof system is consistent, it can’t prove that it won’t prove that blowing up Earth maximizes utility at that future time.) To me this problem looks more like a problem with making decisions based purely on proofs, and not much related to self-modification.
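Here is a toy sketch of the kind of thing I mean — purely illustrative, a cartoon of a “decisions only via proofs” agent rather than anyone’s actual proposal, with the proof search faked by a whitelist of provable claims:

```python
# A cartoon of a proof-based agent facing the "shut down for $10 vs. keep
# running" choice described above. Everything here is hypothetical and
# drastically simplified: the `theorems` set stands in for an actual proof
# search in the agent's own formal system.

SAFE_BASELINE = "shut down now and take the $10"

# Claims the agent can prove. Crucially, "Con(self)" -- the consistency of its
# own proof system -- is absent, since (by Goedel's second incompleteness
# theorem) a consistent system of the relevant kind can't prove its own
# consistency.
theorems = {
    "shutting down now yields $10 and nothing worse",
    # It can prove the *conditional*: if its proof system is consistent, its
    # future self will never prove "blowing up Earth maximizes utility"...
    "IF Con(self) THEN future self never proves 'blow up Earth is optimal'",
    # ...but it cannot discharge the hypothesis: "Con(self)" is NOT in this set.
}

def provable(claim: str) -> bool:
    """Stand-in for searching for a proof of `claim` in the agent's own system."""
    return claim in theorems

def decide() -> str:
    # The agent only licenses "keep running" if it can prove, unconditionally,
    # that its future self won't do something catastrophic.
    if provable("future self never proves 'blow up Earth is optimal'"):
        return "keep running"
    # It only has the Con(self)-guarded version of that claim, so the one
    # action it can prove acceptable is the baseline.
    assert provable("shutting down now yields $10 and nothing worse")
    return SAFE_BASELINE

print(decide())  # -> "shut down now and take the $10"
```

The point is just that the unconditional claim the agent needs — “whatever my future self proves and acts on will be okay” — is exactly the sort of self-trust that Lob’s theorem blocks.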
Using probabilities instead of proofs seems to eliminate the old obstructions, but it does leave a sequence of challenging problems (hence the work on probabilistic reflection). E.g., we’ve proved that there is an algorithm P using a halting oracle such that:
(Property R): Intuitively, we “almost” have a < P(X | a < P(X) < b) < b. Formally:
For each sentence X, each a, and each b, P(X AND a < P(X) < b) < b * P(a ≤ P(X) ≤ b).
For each sentence X, each a, and each b, P(X AND a ≤ P(X) ≤ b) > a * P(a < P(X) < b).
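(To spell out the connection to the intuitive version informally: if the boundary events P(X) = a and P(X) = b carry negligible probability, the open- and closed-interval events nearly coincide, so dividing the first inequality by P(a < P(X) < b) gives roughly P(X | a < P(X) < b) < b, and the second similarly gives roughly P(X | a < P(X) < b) > a. The slack between the open and closed intervals is exactly why this is only “almost” the conditional statement.)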
But this took a great deal of work, and we can’t exhibit any algorithm that simultaneously satisfies Property R and has P(Property R) = 1. Do you think this is not an important question? It seems to me that we don’t yet know how many of the Godelian obstructions carry over to the probabilistic setting, and there are still real problems that will require ingenuity to resolve.
Putting the dangers of AI progress aside, we probably ought to first work on understanding logical uncertainty in general, and start with simpler problems. I find it unlikely that we can solve “probabilistic reflection” (or even correctly specify what the problem is) when we don’t yet know what principles allow us to say that P!=NP is more likely to be true than false. Do we even know that using probabilities is the right way to handle logical uncertainty? (People assumed that using probabilities is the right way to handle indexical uncertainty and that turned out to be wrong.)
we don’t yet know what principles allow us to say that P!=NP is more likely to be true than false
We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn’t get into priors). MIRI is working much more directly on this problem as well. Can you think of concrete open questions in that space? Basically we are just trying to develop the theory, but having simple concrete problems would surely be good. (I have a bucket of standard toy problems to resolve, and don’t have a good approach that handles all of them, but it’s pretty easy to hack together a solution to them, so they don’t really count as open problems.)
I agree that AI progress is probably socially costly (highly positive for currently living folks, modestly negative for the average far future person). I think work with a theoretical bias is more likely to be helpful, and I don’t think it is very bad on net. Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.
We don’t know that probabilities are the right way to handle logical uncertainty, nor that our problem statements are correct. I think that the kind of probabilistic reflection we are working on is fairly natural though.
I agree with both you and Nick that the strategic questions are very important, probably more important than the math. I don’t think that is inconsistent with getting the mathematical research program up and going. I would guess that all told the math will help on the strategy front via building the general credibility of AI safety concern (by 1. making it clear that there are concrete policy-relevant questions here, and 2. building status and credibility for safety-concerned communities and individuals), but even neglecting that I think it would still be worth it.
We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn’t get into priors).
I read that paper before but it doesn’t say why its proposed way of handling logical uncertainty is the correct one, except that it “seem[s] to have some good properties”. It seems like we’re still at a stage where we don’t understand logical uncertainty at a deep level and can’t offer solutions based on fundamental principles, but are just trying out various ideas to see what sticks.
I agree that AI progress is probably socially costly [...] Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.
I’m not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?
I think that the kind of probabilistic reflection we are working on is fairly natural though.
Maybe I’m just being dense but I’m still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn’t seem to make a strong attempt to explain it either.
I read that paper before but it doesn’t say why its proposed way of handling logical uncertainty is the correct one, except that it “seem[s] to have some good properties”.
This is basically the same as the situation with respect to indexical probabilities. There are dominance arguments for betting odds etc. that don’t quite go through, but it seems like probabilities are still distinguished as a good best guess, and worth fleshing out. And if you accept probabilities, prior specification is the clear next question.
I’m not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?
I think it’s plausible there are net social costs, excluding reputational impacts, and would certainly prefer to think more about it first. But with reputational impacts it seems like the case is relatively clear (of course this is potentially self-serving reasoning), and there are similar gains in terms of making things seem more concrete etc.
Maybe I’m just being dense but I’m still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn’t seem to make a strong attempt to explain it either.
Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality), and the second claim was that the epsilons aren’t so bad (e.g. because exact comparisons between floats are kind of silly anyway). Probably those could be more explicit in the writeup, but it would be helpful to know which steps seem shakiest.
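(For concreteness, the schema in question is roughly of the form: for every sentence X and all rationals a < b, if a < P(X) < b then P(a < P(‘X’) < b) = 1. I’m being loose here — the writeup has the exact statement — but the open intervals are where the epsilons live: if you instead demand that the inner symbol track P exactly, a Liar-style sentence along the lines of “G iff P(‘G’) < 1” makes the schema unsatisfiable, which is why some slack at the boundary seems unavoidable.)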
Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality)
Why do you say “exactly corresponded to reality”? You’d have an inner symbol which corresponded to the outer P, but P must be more like subjective credence than external reality, since in reality each logical statement is presumably either true or false, not a probabilistic mixture of both?
Intuitively, what I’d want is a “math intuition module” which, if it was looking at a mathematical expression denoting the beliefs that a copy of itself would have after running for a longer period of time or having more memory, would assign high probability that those beliefs would better correspond to reality than its own current beliefs. This would in turn license the AI using this MIM to build a more powerful version of itself, or just to believe that “think more” is generally a good idea aside from opportunity costs. I understand that you are not trying to directly build such an MIM, just to do a possibility proof. But your formalism looks very different from my intuitive requirement, and I don’t understand what your intuitive requirement might be.
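(One crude way to state the kind of condition I have in mind, just to make the contrast concrete — there are obviously other ways to cash out “better correspond to reality”: writing M for my current credences and M′ for the credences of the longer-running copy, I’d want something like E_M[(M′(S) − truth(S))^2] < E_M[(M(S) − truth(S))^2] for typical sentences S, i.e. by my own current lights, the upgraded version’s credences are expected to land closer to the truth than mine do. I don’t know how to make this precise, which is part of why I’m asking.)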
P is intended to be like objective reality, exactly analogously with the predicate “True.” So we can adjoin P as a symbol and the reflection principle as an axiom schema, and thereby obtain a more expressive language. Depending on architecture, this also may increase the agent’s ability to formulate or reason about hypotheses.
Statements without P’s in them are indeed either true or false with probability 1. I agree it is a bit odd for statements with P in them to have probabilities, but I don’t see a strong argument that it shouldn’t happen. In particular, it seems irrelevant to anything meaningful we would like to do with a truth predicate. In subsequent versions of this result, the probabilities have been removed and the core topological considerations exposed directly.
The relationship between a truth predicate and the kind of reasoning you discuss (a MIM that believes its own computations are trustworthy) is that truth is useful or perhaps necessary for defining the kind of correspondence that you want the MIM to accept, about a general relationship between the algorithm it is running and what is “true”. So having a notion of “truth” seems like the first step.
I would guess that all told the math will help on the strategy front via building the general credibility of AI safety concern
Also, by attracting thinkers who can initially only be attracted by crisp technical problems, but as they get involved, will turn their substantial brainpower toward the strategic questions as well.
For three additional reasons for MIRI to focus on math for now, see the bullet points under “strategic research will consume a minority of our research budget in 2013” in MIRI’s Strategy for 2013.
what principles allow us to say that P!=NP is more likely to be true than false
Maybe we use the same principle that allows me to say “I guess I left my wallet at home” after I fail to find the wallet in the most likely places it could be, like my pockets. In other words, maybe we do Bayesian updating about the location of the “true” proof or disproof, as we check some apriori likely locations (attempted proofs and disproofs) and fail to find it there. This idea is still very vague, but looks promising to me because it doesn’t assume logical omniscience, unlike Abram’s and Benja’s ideas...
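Here’s a toy numerical version of the wallet analogy, just to illustrate the updating pattern I have in mind — the “strategies,” numbers, and thoroughness parameter are all made up for illustration:

```python
# Toy model: a prior over where a proof of P=NP would be found *if one exists*,
# plus a hypothesis that there is no such proof (i.e. evidence toward P != NP).
# Each failed, reasonably thorough search shifts probability mass toward
# "there is no proof to find", like failed pocket-checks for the wallet.

prior = {
    "proof via strategy A": 0.10,
    "proof via strategy B": 0.15,
    "proof via strategy C": 0.05,
    "no proof of P=NP":     0.70,
}

def update_on_failed_search(beliefs, searched, thoroughness=0.9):
    """Bayesian update on the observation 'we searched `searched` and found nothing'."""
    posterior = {}
    for hypothesis, p in beliefs.items():
        if hypothesis == searched:
            likelihood = 1.0 - thoroughness  # the proof is there but we missed it
        else:
            likelihood = 1.0                 # the proof isn't there; failure is expected
        posterior[hypothesis] = p * likelihood
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

beliefs = prior
for strategy in ["proof via strategy A", "proof via strategy B", "proof via strategy C"]:
    beliefs = update_on_failed_search(beliefs, strategy)

print(round(beliefs["no proof of P=NP"], 3))  # -> 0.959, up from the 0.70 prior
```

Nothing here requires logical omniscience; the agent just treats “where the proof would be found, if it exists” as an ordinary uncertain variable.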
Layman’s answer: we want to predict what some self-modifying AI will do, so we want a decision theory that can ask about the effect of adopting a new decision theory or related processes. (The paper’s issues could easily come up.) The one alternative I can see involves knowing in advance, as humans, how any modification that a super-intelligence could imagine will affect its goals. This seems like exactly what humans are bad at.
Speaking of, you say we “seem capable of overcoming putative Lobian obstacles to self-modification.” But when I think about CEV, this appears dubious. We can’t express exactly what ‘extrapolation’ means, save by imagining a utility function that may not exist. And without a better language for talking about goal stability, how would we even formalize that question? How could we formally ask if CEV is workable?
Another aspect of where I’m coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development because it seems so hard to predict what will and won’t be relevant for distant future technologies. My feeling is that paragraphs like this one, while relevant, don’t provide strong enough arguments to overcome the prior:
As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm’s evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals). Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.
I would greatly appreciate further elaboration on why this is the right problem to be working on right now.
Another aspect of where I’m coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development because it seems so hard to predict what will and won’t be relevant for distant future technologies.
On the other hand, trying to solve many things that have a significant probability of being important so that you’re likely to eventually solve something that actually is important as a result, seems like a better idea than not doing anything because you can’t prove that any particular sub-problem is important.
I agree with this principle but think my claims are consistent with it. Doing stuff other than “technical problems in the future of AI” is an alternative worth considering.
Another aspect of where I’m coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development because it seems so hard to predict what will and won’t be relevant for distant future technologies.
I disagree. I would go in the other direction: if it seems relatively plausible that something is relevant, then I’m happy to have someone working on it. This is why I am happy to have MIRI work on this and related problems (to the extent that I even attended one of the Lob workshops), even if I do not personally think they are likely to be relevant.
ETA: My main reason for believing this is that, among scientific fields, willingness to entertain such research programs seems to correlate with the overall health of the field.
My question was more of a request for information than a challenge; Eliezer could say some things that would make doing mathematics on the Lob problem look more promising to me. It seems likely to me that I am missing some important aspect of the situation.
If I’m not missing anything major, then I think that, within the realm of AI risk, general strategic work addressing questions like, “Will the world’s elites navigate the creation of AI just fine?” would be preferable. That’s just one example; I do not mean to be claiming that this is the best thing to do. As I said in another comment, it’s very hard to argue that one course is optimal.
For convenience: “SuperBenefit” = increasing the probability that advanced machine intelligence has a positive impact.
I agree that MIRI has a lot left to explain with respect to questions #2 and #3, but it’s easier to explain those issues when we’ve explained #1 already, and we’ve only just begun to do that with AI forecasting, IEM, and Tiling Agents.
Presumably the relevance of AI forecasting and IEM to SuperBenefit is clear already?
In contrast, it does seem like the relevance of the Tiling Agents work to SuperBenefit is unclear to many people, and that more explanation is needed there. Now that Tiling Agents has been published, Eliezer has begun to explain its relevance to SuperBenefit in various places on this page, but it will take a lot of trial and error for us to discover what is and isn’t clear to people.
As for question #3, we’ve also only just begun to address that issue in detail.
So, MIRI still has a lot of explaining to do, and we’re working on it. But allow me a brief reminder that this gap isn’t unique to MIRI at all. Arguing for the cost effectiveness of any particular intervention given the overwhelming importance of the far future is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else.
E.g. if somebody accepts the overwhelming importance of the far future and is donating to AMF, they have roughly as much explaining to do as MIRI does, if not more.
Presumably the relevance of AI forecasting and IEM to SuperBenefit is clear already?
Yes.
So, MIRI still has a lot of explaining to do, and we’re working on it. But allow me a brief reminder that this gap isn’t unique to MIRI at all. Arguing for the cost effectiveness of any particular intervention given the overwhelming importance of the far future is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else.
E.g. if somebody accepts the overwhelming importance of the far future and is donating to AMF, they have roughly as much explaining to do as MIRI does, if not more.
I basically agree with these comments, with a couple of qualifications.
I think it’s unique to MIRI in the sense that it makes sense for MIRI to be expected to explain how its research is going to accomplish its mission of making machine intelligence benefit humanity, whereas it doesn’t make sense for global health charities to be expected to explain why improving global health makes the far future go better. This means MIRI has an asymmetrically hard job, but I do think it’s a reasonable division of labor.
I think it makes sense for other people who care about the far future to evaluate how the other strategies you mentioned are expected to affect the far future, and try to find the best ones. There is an overwhelming amount of work to do.
I think it’s unique to MIRI in the sense that it makes sense for MIRI to be expected to explain how its research is going to accomplish its mission of making machine intelligence benefit humanity, whereas it doesn’t make sense for global health charities to be expected to explain why improving global health makes the far future go better.
Right. Very few charities are even claiming to be good for the far future. So there’s an asymmetry between MIRI and other charities w.r.t. responsibility to explain plausible effects on the far future. But among parties (including MIRI) who care principally about the far future and are trying to do something about it, there seems to be no such asymmetry — except for other reasons, e.g. asymmetry in resource use.
So, MIRI still has a lot of explaining to do, and we’re working on it. But allow me a brief reminder that this gap isn’t unique to MIRI at all. Arguing for the cost effectiveness of any particular intervention given the overwhelming importance of the far future is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else.
I agree with this. Typically people justify their research on other grounds than this, for instance by identifying an obstacle to progress and showing how their approach might overcome it in a way that previously tried approaches were not able to. My impression is that one reason for doing this is that it is typically much easier to communicate along these lines, because it brings the discourse towards much more familiar technical questions while still correlating well with progress more generally.
Note that under this paradigm, the main thing MIRI needs to do to justify their work is to explain why the Lob obstacle is insufficiently addressed by other approaches (for instance, statistical learning theory). I would actually be very interested in understanding the relationship of statistics to the Lob obstacle, so I look forward to any writeup that might exist in the future.
I am very glad to see MIRI taking steps to list open problems and explain why those problems are important for making machine intelligence benefit humanity.
I’m also struggling to see why this Lob problem is a reasonable problem to worry about right now (even within the space of possible AI problems). Basically, I’m skeptical that this difficulty or something similar to it will arise in practice. I’m not sure if you disagree, since you are saying you don’t think this difficulty will “block AI.” And if it isn’t going to arise in practice (or something similar to it), I’m not sure why this should be high on the priority list of general AI issues to think about it (edited to add: or why working on this problem now should be expected to help machine intelligence develop in a way that benefits humanity).
Some major questions I have are:
What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble? I promise not to interpret your answer as “Eliezer says this is probably going to happen.”
Do you think that people building AGI in the future will stumble over Lob issues if MIRI doesn’t work on those issues? If so, why?
Part of where I’m coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another, not for proving that other types of software and hardware alterations (such as building better arms, building faster computers, finding more efficient ways to compress your data, finding more efficient search algorithms, or even finding better mid-level statistical techniques) would result in more expected utility. But I would guess that once you have an agent operating with a minimally decent fundamental epistemic standards, you just can’t prove that altering the agent’s fundamental epistemic standards would result in an improvement. My intuition is that you can only do that when you have an inconsistent agent, and in that situation it’s unclear to me how Lobian issues apply.
Part of where I’m coming from on the second question is that evolutionary processes made humans who seem capable of overcoming putative Lobian obstacles to self-modification. See my other comment for more detail. The other part has to do with basic questions about whether people will adequately prepare for AI by default.
I think you are right that strategy work may be higher value. But I think you underestimate the extent to which (1) such goods are complements [granting for the moment the hypothesis that this kind of AI work is in fact useful], and (2) there is a realistic prospect of engaging in many such projects in parallel, and that getting each started is a bottleneck.
As Wei Dai observed, you seem to be significantly understating the severity of the problem. We are investigating conditions under which an agent can believe that its own operation will lawfully lead to good outcomes, which is more or less necessary for reasonably intelligent, reasonably sane behavior given our current understanding.
Compare to: “I’m not sure how relevant formalizing mathematical reasoning is, because evolution made humans who are pretty good at reasoning without any underlying rigidly defined formal systems.”
Is there an essential difference between these cases? Your objection is very common, but it looks like to me like it is on the wrong end of a very strong empirical regularity, i.e. it seems like you would argue against some of the most compelling historical developments in mathematics on similar grounds, while basically never ending up on the right side of an argument.
Similarly, you would discourage the person who advocates studying mathematical logic with the goal of building a thinking machine [which as far as I can tell was one of the original objects, before the program of formalization took off]. I do think we can predictably say that such research is worthwhile.
This is without even getting into MIRI’s raison d’etre, namely that it may be possible for societies to produce AI given widely varying levels of understanding of the underlying formal frameworks, and that all things equal we expect a deeper understanding of the underlying theory to result in better outcomes (according to the values of AI designers).
This is an interesting point I wasn’t fully taking into consideration. As I said in another comment, where MIRI has the right kind of technical AI questions, it makes sense to write them up.
I think it would greatly help me understand the expected practical implications of this research if you could address the question I asked in the original comment: “What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?” I think I get why it causes problems if, as Wei Dai said, the AI makes decisions purely based on proofs. I don’t see how the problem would be expected to arise in scenarios that seem more plausible. I think that the MIRI people working in this problem know a lot more about this than me, which is why I am asking for examples; I expect you have something to tell me that will make this make more sense.
The argument I was trying to make was of the form:
Creation process A [natural and cultural evolution] led to agents who don’t stumble over problem B [Lobian issues].
By analogy, creation process C [people making AGI] will lead to agents who don’t stumble over problem B [Lobian issues], even if MIRI does not take special precautions to prevent this from happening.
Therefore, it is not necessary to take special precautions to make sure creation process C doesn’t stumble over problem B.
I don’t think this type of reasoning will lead to the conclusion that formalizing mathematics and doing mathematical logic are not worthwhile. Perhaps you interpreted my argument another way.
The opportunity to kill yourself in exchange for $10 is a prototypical case. It’s well and good to say “this is only a problem for an agent who uses proofs,” but that’s not a scenario, that’s a type of agent. Yes, real agents will probably not use mathematical logic in the naive way. But as I pointed out in response to Wei Dai, probabilism doesn’t make the issues go away. It softens the impossibility proofs, but we still lack possibility proofs (which is what MIRI is working on). So this seems like a weak objection. If you want to say “future agents will have different reasoning frameworks than the ones we currently understand,” that’s well and good, but see below. (That seems a lot like discouraging someone from trying to develop logic because their proto-logic doesn’t resemble the way that humans actually reason.)
This is what I thought you meant; it seems analogous to:
Creation process A [natural and cultural evolution] led to agents who don’t require a formalized deductive system
By analogy, creation process C [people making AGI] will lead to agents who don’t require a formalized deductive system
Therefore, it is not necessary to take special precautions to ensure that deductive systems are formalized.
Do you object to the analogy?
No one thinks that the world will be destroyed because people built AI’s that couldn’t handle the Lobian obstruction. That doesn’t seem like a sensible position, and I think Eliezer explicitly disavows it in the writeup. The point is that we have some frameworks for reasoning about reasoning. Those formalisms don’t capture reflective reasoning, i.e. they don’t provide a formal account of how reflective reasoning could work in principle. The problem Eliezer points to is an obvious problem that any consistent framework for reflective reasoning must resolve.
Working on this problem directly may be less productive than just trying to understand how reflective reasoning works in general—indeed, folks around here definitely try to understand how reflective reasoning works much more broadly, rather than focusing on this problem. The point of this post is to state a precise problem which existing techniques cannot resolve, because that is a common technique for making progress.
Thank you for the example. I do want to say “this is only a problem for an agent who uses proofs” if that’s indeed true. It sounds like you agree, but are saying that some analogous but more complicated problem might arise for probabilistic agents, and that it might not be resolved be whoever else is making AI unless this research is done by MIRI. If you have an example of a complication that you think would plausibly arise in practice and have further thoughts on why we shouldn’t expect this complication to be avoided by default in the course of the ordinary development of AI, I would be interested in hearing more. These do seem like crucial questions to me if we want to argue that this is an important line of research for the future of AI. Do you agree that these questions are crucial?
I do object to this analogy, though I now have a better idea of where you are coming from. Here’s a stab at how the arguments are different (first thing that came to mind):
My argument says that if creation process A led to agents who overcome obstacle X to doing Z, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z.
Your argument says that if creation process A led to agents who overcome obstacle X to doing Z in way Y, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z in way Y.
We might want to insert some qualifiers like “obstacle X needs to be essential to the proper functioning of the agent” or something along those lines, and other conditions I haven’t thought of may be relevant as well (often the case with analogies). But, basically, though I think the analogy suggests that the ordinary development of AI will overcome Lobian obstacles, I think it is much less supported that AGIs will overcome these obstacles in the same way as humans overcome them. Likewise, humans overcome obstacles to reasoning effectively in certain ways, and I don’t think there is much reason to suspect that AGIs will overcome these obstacles in the same ways. Therefore, I don’t think that the line of argument I was advancing supports the view that formalizing math and doing mathematical logic will be unhelpful in developing AI.
I think what you’re saying is that getting a good framework for reasoning about reasoning could be important for making AGI go well. This is plausible to me. And then you’re also saying that working on this Lobian stuff is a reasonable place to start. This is not obvious to me, but this seems like something that could be subtle, and I understand the position better now. I also don’t think that however you’re doing it should necessarily seem reasonable to me right now, even if it is.
Big picture: the big questions I had about this were:
What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?
Do you think that people building AGI in the future will stumble over Lob issues if MIRI doesn’t work on those issues? If so, why?
I would now ask those questions differently:
What are some plausible concrete examples of cases where machines might fail to reason about self-modification properly if this research isn’t done? Why do you think it might happen in these cases?
Do you think that people building AGI in the future will fail to do this research, if it is in fact necessary for building well-functioning AIs? If so, why?
I now have a better understanding of what your answer the first question might look like, though I’m still struggling to imagine what plausibly go wrong in practice if the research isn’t done. As far as I can tell, there hasn’t been any effort directed at addressing the second question in this specific context so far. Maybe that’s because it’s thought that it’s just part of the general question of whether future elites will handle AI development just fine. I’m not sure it is because it sounds like this may be part of making an AGI work at all, and the arguments I’ve heard for future elites not navigating it properly seems to turn on safety issues rather than basic functionality issues.
That’s not it, rather:
Yep. We have reasoning frameworks like the currently dominant forms of decision theory, but they don’t handle reflectivity well.
The Lob Problem isn’t a top-priority scary thing that is carved upon the tombstones of worlds; it’s more like, “Look! We managed to crisply exhibit something very precise that would go wrong with standard methods and get started on analyzing and fixing it! Before, we just saw in a more intuitive sense that something would go wrong when we applied standard theories to reflective problems; now we can state three problems very precisely!” (Lob and coherent quantified belief, sec. 3; nonmonotonicity of probabilistic reasoning, sec. 5.2 & 7; maximizing / satisficing not being good-enough idioms for bounded agents, sec. 8.) Problems with reflectivity in general are expectedly carved upon the tombstones of worlds because they expectedly cause problems with goal stability during self-modification. But to make progress on that you need crisp problems to provide fodder for getting started on finding a good shape for a reflective decision theory / tiling self-improving agent.
(As usual, I have somewhat less extreme views here than Eliezer.)
There is a problem here: we have an impossibility proof for a broad class of agents, and we know of no agents that avoid the problem. Indeed, this limits the relevance of the impossibility proof, but it doesn’t make the problem any less real.
I don’t quite see where you are coming from here. It seems like the situation is:
There are problems that reflective reasoners would be expected to solve, which we don’t understand how to resolve in current frameworks for general reasoning (of which mathematical logic is the strongest).
If you think that reflective reasoning may be an important part of AGI, then having formal frameworks for reflective reasoning is an important part of having formal frameworks for AGI.
If you think that having formal frameworks is likely to improve our understanding of AGI, then having formal frameworks that support reflective reasoning is a useful step towards improving our understanding of AGI.
The sort of complication I imagine is: it is possible to build powerful AGI without having good frameworks for understanding its behavior, and then people do that. It seems like, all things equal, understanding a system is useful, not only for building it but also for having reasonable expectations about its behavior (which is in turn useful for making further preparations, solving safety problems, etc.). To the extent that understanding things at a deep level ends up being necessary for building them at all, what we’re doing won’t matter (except insofar as it is indirectly useful for people who care about safety to make modest technical contributions).
Same answer. It may be that understanding reasoning well is necessary for building powerful agents (indeed, that would be my modal guess). But it may be that you can influence the relative development of understanding vs. building, in which case pushing on understanding has a predictable effect. For example, if people didn’t know what proofs or probabilities were, it isn’t out of the question that they could build deep belief nets by empirical experimentation. But I feel safe saying that understanding proof and probability helps you better reason about the behavior of extremely powerful deep belief nets.
I agree that the cases differ in many ways. But this distinction doesn’t seem to get at the important thing. To someone working on logic you would say “I don’t know whether deduction systems will be formalized in the future, but I know that agents will be able to reason. So this suggests to me that your particular approach for defining reasoning, via formalization, is unnecessary.” In some sense this is true—if I’m an early mathematician and I don’t do logic, someone else will—but it has relatively little bearing on whether logic is likely to be mathematically productive to work on. If the question is about impact rather than productivity as a research program, then see the discussion above.
OK, helpful. This makes more sense to me.
This reply would make more sense if I were saying that knowing how to overcome Lobian obstacles would never be necessary for building well-functioning AI. But I was making the weaker claim that either it would never be necessary OR it would be solved in the ordinary development of AI. So if someone a long time ago had been formalizing logic with the aim of building thinking machines AND had thought that, when thinking machines were built, logic wouldn’t be formalized properly and the machines wouldn’t work, then I might have complained. But if they had said, “I’d like to build a thinking machine and I think that formalizing logic will help get us there, whether it is done by others or by me. And maybe it will go a bit better or come a bit sooner if I get involved. So I’m working on it,” then I wouldn’t have had anything to say.
Anyway, I think we roughly understand each other on this thread of the conversation, so maybe there is no need to continue.
I don’t think it’s the highest-priority issue to think about, but my impression is that among the issues that Eliezer has identified as worth thinking about, it could be the one closest to being completely mathematically formalized, so it’s a good one to focus on for the purpose of getting mathematicians interested in MIRI.
I do appreciate arguments in favor of focusing your effort on tractable problems, even if they are not the most important problems to solve.
It’s certainly hard to answer the question, “Why is this the best project to work on within AI?” since it implicitly requires comparisons with all sorts of other work. It’s probably unreasonable to ask Eliezer to answer this question in a comment. However, it is reasonable to ask, “Why will this research help make machine intelligence develop in a way that will benefit humanity?” Most of the other questions in my comment are also relevant to that question.
I also question the importance of working on this problem now, but for a somewhat different reason.
My understanding is that Lobian issues make it impossible for a proof-based AI to decide to not immediately commit suicide, because it can’t prove that it won’t do something worse than nothing in the future. (Let’s say it will have the option to blow up Earth in the future. Since it can’t prove that its own proof system is consistent, it can’t prove that it won’t prove that blowing up Earth maximizes utility at that future time.) To me this problem looks more like a problem with making decisions based purely on proofs, and not much related to self-modification.
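To spell out my reading of the suicide argument a bit more formally (this is my own compressed sketch, with my own notation, not anything taken from the paper): suppose the agent acts only when its proof system T proves the action is at least as good as doing nothing. Then to prefer “keep running” over “shut down now,” it has to trust its own future proofs, roughly:

```latex
% My own rough rendering; T is the agent's proof system, Prov_T its provability
% predicate, and phi stands for statements like "the action my future self picks
% at time t is at least as good as doing nothing."
\[
  \text{Needed for self-trust:}\quad
  T \vdash \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi
  \quad \text{for arbitrary such } \varphi .
\]
\[
  \text{L\"ob's theorem:}\quad
  \text{if } T \vdash \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi,
  \ \text{then } T \vdash \varphi .
\]
% So T can trust a future proof of phi only if it can already prove phi outright,
% and (by Godel's second incompleteness theorem) it cannot shortcut this by
% proving Con(T). Hence continuing cannot be proven at least as good as doing
% nothing, while shutting down now trivially can.
```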
Using probabilities instead of proofs seems to eliminate the old obstructions, but it does leave a sequence of challenging problems (hence the work on probabilistic reflection). E.g., we’ve proved that there is an algorithm P using a halting oracle such that:
(Property R): Intuitively, we “almost” have a < P(X | a < P(X) < b) < b. Formally:
For each sentence X, each a, and each b: P(X and a < P(X) < b) < b · P(a ≤ P(X) ≤ b).
For each sentence X, each a, and each b: P(X and a ≤ P(X) ≤ b) > a · P(a < P(X) < b).
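To unpack the connection with the intuitive statement (my arithmetic below, assuming the conditioning events have positive probability): dividing the first inequality by P(a < P(X) < b) and the second by P(a ≤ P(X) ≤ b) gives

```latex
% Why this is "almost" a < P(X | a < P(X) < b) < b, assuming P(a < P(X) < b) > 0.
\[
  P\big(X \mid a < P(X) < b\big)
  \;<\; b \cdot \frac{P\big(a \le P(X) \le b\big)}{P\big(a < P(X) < b\big)},
  \qquad
  P\big(X \mid a \le P(X) \le b\big)
  \;>\; a \cdot \frac{P\big(a < P(X) < b\big)}{P\big(a \le P(X) \le b\big)} .
\]
% If the open- and closed-interval events had equal probability, both ratios
% would be 1 and we would get exactly a < P(X | a < P(X) < b) < b; the gap
% between open and closed intervals is the "almost."
```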
But this took a great deal of work, and we can’t exhibit any algorithm that simultaneously satisfies Property R and has P(Property R) = 1. Do you think this is not an important question? It seems to me that we don’t yet know how many of the Godelian obstructions carry over to the probabilistic setting, and there are still real problems that will require ingenuity to resolve.
Putting the dangers of AI progress aside, we probably ought to first work on understanding logical uncertainty in general, and start with simpler problems. I find it unlikely that we can solve “probabilistic reflection” (or even correctly specify what the problem is) when we don’t yet know what principles allow us to say that P!=NP is more likely to be true than false. Do we even know that using probabilities is the right way to handle logical uncertainty? (People assumed that using probabilities is the right way to handle indexical uncertainty and that turned out to be wrong.)
We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn’t get into priors). MIRI is working much more directly on this problem as well. Can you think of concrete open questions in that space? Basically we are just trying to develop the theory, but having simple concrete problems would surely be good. (I have a bucket of standard toy problems to resolve, and I don’t have a good approach that handles all of them, but it’s pretty easy to hack together a solution to them, so they don’t really count as open problems.)
I agree that AI progress is probably socially costly (highly positive for currently living folks, modestly negative for the average far future person). I think work with a theoretical bias is more likely to be helpful, and I don’t think it is very bad on net. Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.
We don’t know that probabilities are the right way to handle logical uncertainty, nor that our problem statements are correct. I think that the kind of probabilistic reflection we are working on is fairly natural though.
I agree with both you and Nick that the strategic questions are very important, probably more important than the math. I don’t think that is inconsistent with getting the mathematical research program up and going. I would guess that all told the math will help on the strategy front via building the general credibility of AI safety concern (by 1. making it clear that there are concrete policy-relevant questions here, and 2. building status and credibility for safety-concerned communities and individuals), but even neglecting that I think it would still be worth it.
I read that paper before, but it doesn’t say why its proposed way of handling logical uncertainty is the correct one, except that it “seem[s] to have some good properties.” It seems like we’re still at a stage where we don’t understand logical uncertainty at a deep level and can’t offer solutions based on fundamental principles, but are just trying out various ideas to see what sticks.
I’m not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?
Maybe I’m just being dense but I’m still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn’t seem to make a strong attempt to explain it either.
This is basically the same as the situation with respect to indexical probabilities. There are dominance arguments for betting odds etc. that don’t quite go through, but it seems like probabilities are still distinguished as a good best guess, and worth fleshing out. And if you accept probabilities, prior specification is the clear next question.
I think it’s plausible there are net social costs, excluding reputational impacts, and would certainly prefer to think more about it first. But with reputational impacts it seems like the case is relatively clear (of course this is potentially self-serving reasoning), and there are similar gains in terms of making things seem more concrete etc.
Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality), and the second claim was that the epsilons aren’t so bad (e.g. because exact comparisons between floats are kind of silly anyway). Probably those could be more explicit in the writeup, but it would be helpful to know which steps seem shakiest.
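On the float remark: I just mean the familiar fact that exact equality tests on floating point values already misbehave for incidental reasons, so insisting on exact thresholds looks like the wrong idealization anyway. A two-line illustration (mine, and nothing to do with the reflection formalism itself):

```python
# Exact float comparison fails for reasons unrelated to the quantities compared,
# while an epsilon-tolerant comparison behaves as intended.
a = 0.1 + 0.2
print(a == 0.3)             # False: a is actually 0.30000000000000004
print(abs(a - 0.3) < 1e-9)  # True: comparison with epsilon slack
```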
Why do you say “exactly corresponded to reality”? You’d have an inner symbol which corresponded to the outer P, but P must be more like subjective credence than external reality, since in reality each logical statement is presumably either true or false, not a probabilistic mixture of both?
Intuitively, what I’d want is a “math intuition module” which, if it was looking at a mathematical expression denoting the beliefs that a copy of itself would have after running for a longer period of time or having more memory, would assign high probability that those beliefs would better correspond to reality than its own current beliefs. This would in turn license the AI using this MIM to build a more powerful version of itself, or just to believe that “think more” is generally a good idea aside from opportunity costs. I understand that you are not trying to directly build such an MIM, just to do a possibility proof. But your formalism looks very different from my intuitive requirement, and I don’t understand what your intuitive requirement might be.
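One very rough way to write down the property I have in mind (my own notation, and it already presupposes some notion of “truth” and a distance between belief states):

```latex
% Very rough: P_t is the MIM's credence function now, P_{t+k} the credence
% function of a copy run k more steps (or with more memory), and d(Q, truth)
% some measure of how far a credence function is from the actual truth values.
\[
  P_t\Big(\, d\big(P_{t+k},\ \mathrm{truth}\big) \;<\; d\big(P_t,\ \mathrm{truth}\big) \,\Big)
  \;\approx\; 1 .
\]
% An AI with a module satisfying something like this could license "think longer"
% or "build a more powerful successor" as improvements by its own lights.
```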
P is intended to be like objective reality, exactly analogously with the predicate “True.” So we can adjoin P as a symbol and the reflection principle as an axiom schema, and thereby obtain a more expressive language. Depending on architecture, this also may increase the agent’s ability to formulate or reason about hypotheses.
Statements without P in them are indeed either true or false with probability 1. I agree it is a bit odd for statements with P in them to have intermediate probabilities, but I don’t see a strong argument that it shouldn’t happen. In particular, it seems irrelevant to anything meaningful we would like to do with a truth predicate. In subsequent versions of this result, the probabilities have been removed and the core topological considerations exposed directly.
The relationship between a truth predicate and the kind of reasoning you discuss (a MIM that believes its own computations are trustworthy) is that truth is useful or perhaps necessary for defining the kind of correspondence that you want the MIM to accept, about a general relationship between the algorithm it is running and what is “true”. So having a notion of “truth” seems like the first step.
Also, by attracting thinkers who can initially only be attracted by crisp technical problems, but who, as they get involved, will turn their substantial brainpower toward the strategic questions as well.
For three additional reasons for MIRI to focus on math for now, see the bullet points under “strategic research will consume a minority of our research budget in 2013” in MIRI’s Strategy for 2013.
Maybe we use the same principle that allows me to say “I guess I left my wallet at home” after I fail to find the wallet in the most likely places it could be, like my pockets. In other words, maybe we do Bayesian updating about the location of the “true” proof or disproof, as we check some a priori likely locations (attempted proofs and disproofs) and fail to find it there. This idea is still very vague, but it looks promising to me because it doesn’t assume logical omniscience, unlike Abram’s and Benja’s ideas...
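To make the analogy slightly more concrete, here is a toy version (entirely made-up numbers, and it obviously dodges the hard questions about where the “locations” and their likelihoods come from):

```python
# Toy sketch of the wallet analogy: treat "a proof of S exists and lives in
# location L" as hypotheses, and update the credence in S after failed searches.

def update_after_failed_searches(prior_true, locations, searched):
    """
    prior_true: prior probability that statement S is true.
    locations:  dict mapping location name -> P(we would find a proof of S
                there | S is true). Made-up numbers for illustration.
    searched:   names of locations already searched without finding a proof.
    Returns P(S is true | all those searches failed).
    """
    # If S is true, all searches fail only when the proof lives somewhere we
    # have not looked (locations are treated as disjoint).
    p_fail_given_true = 1.0 - sum(locations[name] for name in searched)
    # If S is false, we certainly fail to find a proof of it.
    p_fail_given_false = 1.0
    numerator = p_fail_given_true * prior_true
    return numerator / (numerator + p_fail_given_false * (1.0 - prior_true))

# Example: prior 0.5 on S, with half the proof-mass in "short, easy proofs".
posterior = update_after_failed_searches(
    prior_true=0.5,
    locations={"short, easy proofs": 0.5, "long or exotic proofs": 0.5},
    searched=["short, easy proofs"],
)
print(round(posterior, 3))  # 0.333: failing the easy search lowers credence in S
```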
I think I was implicitly assuming that you wouldn’t have an agent making decisions based purely on proofs.
Layman’s answer: we want to predict what some self-modifying AI will do, so we want a decision theory that can ask about the effect of adopting a new decision theory or related processes. (The paper’s issues could easily come up.) The one alternative I can see involves knowing in advance, as humans, how any modification that a super-intelligence could imagine will affect its goals. This seems like exactly what humans are bad at.
Speaking of, you say we “seem capable of overcoming putative Lobian obstacles to self-modification.” But when I think about CEV, this appears dubious. We can’t express exactly what ‘extrapolation’ means, save by imagining a utility function that may not exist. And without a better language for talking about goal stability, how would we even formalize that question? How could we formally ask if CEV is workable?
Another aspect of where I’m coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development because it seems so hard to predict what will and won’t be relevant for distant future technologies. My feeling is that paragraphs like this one, while relevant, don’t provide strong enough arguments to overcome the prior:
I would greatly appreciate further elaboration on why this is the right problem to be working on right now.
On the other hand, trying to solve many things that each have a significant probability of being important, so that you’re likely to eventually solve something that actually is important, seems like a better idea than not doing anything because you can’t prove that any particular sub-problem is important.
I agree with this principle but think my claims are consistent with it. Doing stuff other than “technical problems in the future of AI” is an alternative worth considering.
I disagree. I would go in the other direction: if it seems relatively plausible that something is relevant, then I’m happy to have someone working on it. This is why I am happy to have MIRI work on this and related problems (to the extent that I even attended one of the Lob workshops), even if I do not personally think they are likely to be relevant.
ETA: My main reason for believing this is that, among scientific fields, willingness to entertain such research programs seems to correlate with the overall health of the field.
Are there other problems that you think it would be better to be working on now?
My question was more of a request for information than a challenge; Eliezer could say some things that would make doing mathematics on the Lob problem look more promising to me. It seems likely to me that I am missing some important aspect of the situation.
If I’m not missing anything major, then I think that, within the realm of AI risk, general strategic work addressing questions like, “Will the world’s elites navigate the creation of AI just fine?” would be preferable. That’s just one example; I do not mean to be claiming that this is the best thing to do. As I said in another comment, it’s very hard to argue that one course is optimal.
Thanks, that, at least for me, provides more context for your questions.
All that said, I do think that where MIRI has technical FAI questions to work on now, it is very reasonable to write up:
what the question is
why answering the question is important for making machine intelligence benefit humanity
why we shouldn’t expect the question to be answered by default by whoever makes AGI
In this particular case, I am asking for more info about the latter two questions.
For convenience: “SuperBenefit” = increasing the probability that advanced machine intelligence has a positive impact.
I agree that MIRI has a lot left to explain with respect to questions #2 and #3, but it’s easier to explain those issues when we’ve explained #1 already, and we’ve only just begun to do that with AI forecasting, IEM, and Tiling Agents.
Presumably the relevance of AI forecasting and IEM to SuperBenefit is clear already?
In contrast, it does seem like the relevance of the Tiling Agents work to SuperBenefit is unclear to many people, and that more explanation is needed there. Now that Tiling Agents has been published, Eliezer has begun to explain its relevance to SuperBenefit in various places on this page, but it will take a lot of trial and error for us to discover what is and isn’t clear to people.
As for question #3, we’ve also only just begun to address that issue in detail.
So, MIRI still has a lot of explaining to do, and we’re working on it. But allow me a brief reminder that this gap isn’t unique to MIRI at all. Arguing for the cost effectiveness of any particular intervention given the overwhelming importance of the far future is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else.
E.g. if somebody accepts the overwhelming importance of the far future and is donating to AMF, they have roughly as much explaining to do as MIRI does, if not more.
Yes.
I basically agree with these comments, with a couple of qualifications.
I think it’s unique to MIRI in the sense that it makes sense for MIRI to be expected to explain how its research is going to accomplish its mission of making machine intelligence benefit humanity, whereas it doesn’t make sense for global health charities to be expected to explain why improving global health makes the far future go better. This means MIRI has an asymmetrically hard job, but I do think it’s a reasonable division of labor.
I think it makes sense for other people who care about the far future to evaluate how the other strategies you mentioned are expected to affect the far future, and try to find the best ones. There is an overwhelming amount of work to do.
Right. Very few charities are even claiming to be good for the far future. So there’s an asymmetry between MIRI and other charities w.r.t. responsibility to explain plausible effects on the far future. But among parties (including MIRI) who care principally about the far future and are trying to do something about it, there seems to be no such asymmetry — except for other reasons, e.g. asymmetry in resource use.
Yes.
I agree with this. Typically people justify their research on other grounds than this, for instance by identifying an obstacle to progress and showing how their approach might overcome it in a way that previously tried approaches were not able to. My impression is that one reason for doing this is that it is typically much easier to communicate along these lines, because it brings the discourse towards much more familiar technical questions while still correlating well with progress more generally.
Note that under this paradigm, the main thing MIRI needs to do to justify this work is to explain why the Lob obstacle is insufficiently addressed by other approaches (for instance, statistical learning theory). I would actually be very interested in understanding the relationship of statistics to the Lob obstacle, so I look forward to any writeup that might exist in the future.