The Inefficiency of Theoretical Discovery
Previously: Why Neglect Big Topics.
Why was there no serious philosophical discussion of normative uncertainty until 1989, given that all the necessary ideas and tools were present at the time of Jeremy Bentham?
Why did no professional philosopher analyze I.J. Good’s important “intelligence explosion” thesis (from 1959)¹ until 2010?
Why was reflectively consistent probabilistic metamathematics not described until 2013, given that the ideas it builds on go back at least to the 1940s?
Why did it take until 2003 for professional philosophers to begin updating causal decision theory for the age of causal Bayes nets, and until 2013 to formulate a reliabilist metatheory of rationality?
By analogy to financial market efficiency, I like to say that “theoretical discovery is fairly inefficient.” That is: there are often large, unnecessary delays in theoretical discovery.
This shouldn’t surprise us. For one thing, there aren’t necessarily large personal rewards for making theoretical progress. But it does mean that those who do care about certain kinds of theoretical progress shouldn’t necessarily think that progress will be hard. There is often low-hanging fruit to be plucked by investigators who know where to look.
Where should we look for low-hanging fruit? I’d guess that theoretical progress may be relatively easy where:
Progress has no obvious, immediately profitable applications.
Relatively few quality-adjusted researcher hours have been devoted to the problem.
New tools or theoretical advances open up promising new angles of attack.
Progress is only valuable to those with unusual views.
These guesses make sense of the abundant low-hanging fruit in much of MIRI’s theoretical research, with the glaring exception of decision theory. Our September decision theory workshop revealed plenty of low-hanging fruit, but why should that be? Decision theory is widely applied in multi-agent systems, and in philosophy it’s clear that visible progress in decision theory is one way to “make a name” for oneself and advance one’s career. Tons of quality-adjusted researcher hours have been devoted to the problem. Yes, new theoretical advances (e.g. causal Bayes nets and program equilibrium) open up promising new angles of attack, but they don’t seem necessary to much of the low-hanging fruit discovered thus far. And progress in decision theory is definitely not valuable only to those with unusual views. What gives?
Anyway, three questions:
Do you agree about the relative inefficiency of theoretical discovery?
What are some other signs of likely low-hanging fruit for theoretical progress?
What’s up with decision theory having so much low-hanging fruit?
1 Good (1959) is the earliest statement of the intelligence explosion: “Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an ‘explosion’ will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.” The term itself, “intelligence explosion,” originates with Good (1965). Technically, artist and philosopher Stefan Themerson wrote a “philosophical analysis” of Good’s intelligence explosion thesis called Special Branch, published in 1972, but by “philosophical analysis” I have in mind a more analytic, argumentative kind of philosophical analysis than is found in Themerson’s literary Special Branch.
I vastly disagree. I will just state my view for now, and hopefully this will serve as a commitment to explain it further when I have the time. Here are my unjustified assertions about the nature of philosophy as it bears on the OP’s topics:
Philosophy has the largest search space known to man: it encompasses everything (a) without a good clear-cut solution and (b) with any hope of being solved (this rules out two extremes: science and religion).
Philosophy, by its very nature, has few systematized methods for efficient search. It seems we only recently discovered logical and clear thinking, and that’s about all we have.
Because it is so difficult, philosophy is wrong 99.9% of the time.
When philosophy is right, major breakthroughs are made: sciences are created, new reasoning tools are developed, higher moral standards are set, and so on.
There’s a massive hindsight bias. Once solved, a problem is no longer in the realm of philosophy, and the solution tends to seem extremely obvious after one or two generations.
Thus, low-hanging fruit in philosophy is nowhere to be found. Most of your examples were already found; they just need to be worked on. I challenge you to present a yet-unknown low-hanging fruit, one that all your peers don’t already know, one which would knock Nick’s socks off.
I will second this. It’s not that the process of theoretical discovery is inefficient due to any fault of its own; it’s that the problem is intractable (e.g., we don’t know how to do better than exhaustive search). So that linear-looking search path from concept A to concept B did not take linear time to find...
I didn’t focus my post on this, but since you object, I’ll attempt to support my claim that “there are often large, unnecessary delays in theoretical discovery.” What I’m saying is that we can do this search in less time than we typically do.
That shouldn’t be surprising: there are all kinds of inefficiencies in theoretical discovery. Young researchers are incentivized to work on things that will get them tenure quickly, they often choose subject areas due to practical career concerns or personal interest rather than likely fruitfulness or social importance, they often pursue research out of momentum rather than due to an EV calculation, etc. This is all quite understandable, but it does introduce inefficiencies.
Do you disagree?
No, these are all spot-on criticisms (but I don’t think they are specific to theoretical research). I certainly agree that many problems of mainstream academia can be solved via the older patronage model, or perhaps even the newer crowdsourcing model.
I guess it is not clear to me that the failures in the OP’s list are due to a structural fault, or due to more excusable issues, like:
(a) “the scholarship coordination problem” (stuff gets forgotten and rediscovered over and over again, people don’t read other disciplines, etc.)
(b) the standard exponential search for insight
Yeah, as I said to joaolkf, I think my title was misleading: I wasn’t trying to contrast theoretical research with empirical research, but merely to look at these inefficiencies in the context of theoretical research, since that’s what MIRI does.
And you’re probably right that for any given example of apparently unnecessary delay in progress, it can be pretty hard to tell which inefficiencies deserve the most blame.
Note that singling this out as a reason for inefficiency feels somewhat contradictory with the OP, where you suggested that there being no obvious profitable applications in the near term was a reason for inefficiency. If people were choosing subject areas based on “likely fruitfulness”, then we should expect areas with useful near-term applications to be prioritized.
I would not disagree. But then the claim seems trivial. Your comment’s second paragraph also applies to scientific research. Worse, in more applied areas graduate students have much less freedom to choose their own research topics, and they seem to have a higher degree of overall social conformity.
We can then reinstate the question of what exactly it is about theoretical or philosophical research that makes it so particularly slow and unproductive. I say it is a huge, unknown search space with no good search process, plus the fact that every time we find something, we lose it to another area. (Plus the fact that in the past our claims had political/practical consequences, as MIRI’s might.)
Oh I see what’s happening. Sorry, I think my title was accidentally misleading.
My post wasn’t trying to contrast the efficiency of theoretical research vs. empirical research. I just wanted to talk about those inefficiencies in the context of theoretical work specifically, since that’s what MIRI does. (E.g. I wanted to focus on examples from theoretical research.)
Anyway, the point about the large search space is an important one, and I hadn’t been thinking of the inefficiencies coming from political consequences until you mentioned it.
A book about Einstein and Gödel claims both of them were able to identify a problem that had suddenly become relevant and tractable due to other developments. I think there are certain “game changers” that reshape the discovery space, producing low-hanging fruit. But I do not think this low-hanging fruit stays there for long. The possibility of AGI and x-risks made some of your examples relevant, and they were addressed shortly after those game changers arose. Otherwise, some of your points seem similar to those in the Einstein-Gödel book I read.
Which book?
http://www.ams.org/notices/200707/tx070700861p.pdf
It is more of a biography of their friendship. I don’t think it is worth reading. I have just about summarized all of his conclusions on the matter, except that he applies them in more detail to the history of science.
Ok. But you did say relative inefficiency. Relative to what? And still, I think many of your low-hanging fruits were only visible in retrospect. I’m not sure that they were really obviously important and easy to obtain before, say, 1995.
One easy fix would be to just invite some young, possibly relevant philosophers to dinner and say: “Do you see these two equally fun abstract problems? This one is more relevant because it impacts the future of humanity!”
Relative to financial markets, to which I was analogizing.
Ok, then the mistaken interpretation was my fault, you weren’t relevantly using the theoretical/applied dimension anywhere.
About decision theory: perhaps utility maximizers were pulled towards game theory and thence economics and more narrow-minded areas, while decision theory ended up being maximized for oddness at times. That is, the people who could attend to the low-hanging fruit were in areas where the background assumptions were unpopular, while the people who could, perhaps, understand the background assumptions couldn’t care less about utility.
It would be strange if all the greatest minds of human history had indeed merely muddled in the swamp while ignoring all that beautiful low-hanging fruit. Even great scientists and mathematicians often produce absurdities when delving into philosophy. Either this is a task that is completely useless and difficult, or one that is useful and difficult. But theoretical discoveries don’t seem easy at all.
Optimistically, I believe there’s a massive hindsight bias. But if this is true, philosophy is indeed a sad craft: gems continuously slip from our hands, while we are left with nothing but mud.
On the other hand, I must say that sometimes I feel we are slow and stubborn independently of the difficult nature of the huge search space. Perhaps my comment below is evidence of that. I remember one time young-Nick said he had convinced a big-time philosopher of some point. I said, “Well, that’s it, you should erase everything on your CV and state just that. Forget your fancy PhD; convincing a philosopher trumps all!” Old-Nick once told me that sometimes he felt philosophers were only restating some long-held “truth” they had held since early on, and that they would build their publications, careers, and lives on it. How can one be so pathologically stubborn?
The weird part is that even though it is difficult, hard to advance in, its few breakthroughs unacknowledged, and relatively low-paying, it is still by far the most competitive academic career. I say all of this, yet I have been pulling all-nighters since January to mildly increase my odds of getting into a good philosophy PhD. Maybe only an anthropological/psychiatric study could find out what the hell is wrong with it. I’m sorry if I’m inadvertently psychologising the question. I have been around our kind from birth and can’t help relying on personal experience.
I can’t help also remembering the meme “It’s a dirty job, but somebody’s gotta do it.” Is it some kind of necessary evil? Or maybe it is just an ol’ boys’ club (2,500 years old), and we’ve got to show we are tough, stubborn, ruthless, and hard to join. I wonder what Robin would say about that.
Finally, although this is no longer true of English-speaking contemporary philosophy, it might be worth noting that philosophy has also historically been a highly influential political activity. Due to its political influence, sometimes you would not be finding the truth, as in science; you would be setting things to be true. That would explain the stubbornness. You are not convincing people of the truth; you are convincing people of what they ought to do, sometimes even through legislation. Today we’ve lost this powerful role, but it would seem we’ve maintained the stubbornness. But wouldn’t MIRI’s “theoretical discoveries” have this moral flavour? You shouldn’t expect people to easily buy all your claims, for if they unrestrictedly do, they might be bound to abandon their lives, move to the Bay Area, and research FAI. You might be in the fruit-selling business instead of fruit-finding. The fruit can hang as low as it wants; you’ve still got to convince people to buy it.
Which breakthroughs did philosophy produce that aren’t acknowledged?
The scientific method, reason, utilitarianism, logic, subjective and objective probability. Although, if asked, some well-educated people would concede these might have come from philosophy, they will often still see philosophy as a failed, diseased, misled, and/or useless enterprise instead of as one of the most fundamental and useful fields.
A common overlapping pattern is to agree with a subset of philosophy’s claims, say there’s nothing more to be discussed and that, hence, philosophy is useless.
I would love to see a justification of ‘reason’, myself. What work(s) would you point to as having made the breakthrough on reason?
taps out for now
A lot of the basic work in probability came from mathematicians like Bernoulli and Laplace.
The same goes for the “scientific method.” Most scientists just do whatever they feel makes sense and lets them use their toys.
Can you point to breakthroughs by academic 20th/21st-century philosophers?
taps out for now
Luke,
I think you are mistaken about the relative efficiency / inefficiency of scientific research. I believe that research is comparably efficient to much of industry, and that many of the things that look like inefficiencies are actually trading off small local gains for large global gains. I’ve come to this conclusion as the result of years of doing scientific research, where almost every promising idea I’ve come up with (including some that I thought quite clever) had already been explored by someone else. In fact, the typical case for when I was able to make progress was when solving the problem required a combination of tools, each of which individually was relatively rare in the field.
For instance, my paper on stochastic verification required: (i) familiarity with sum-of-squares programming; (ii) the application of supermartingale techniques from statistics; and (iii) the ability to produce relatively non-trivial convex relaxations of a difficult optimization problem. In robotics, most people are familiar with convex optimization, and at least some are familiar with sum-of-squares programming and supermartingales. In fact, at least one other person had already published a fairly similar paper but had not formulated the problem in a way that was useful for systems of practical levels of complexity, probably because they had (i) and (ii) but lacked (iii).
Of course, it is true that your point #3 (“new tools open up promising new angles of attack”) often does lead to low-hanging fruit. In machine learning, often when a new tool is discovered there will be a sizable contingent of the community who devotes resources to exploring possible uses of that tool. However, I think the time-scale for this is shorter than you might imagine. I would guess that, in machine learning at least, most of the “easy” applications of a new tool have been exhausted within five years of its introduction. This is not to say that new applications don’t trickle in, but they tend to either be esoteric or else require adapting the tool in some non-trivial way.
My impression is that MIRI believes that most of what drives what they perceive to be low-hanging fruit comes from #4 (“progress is only valuable to those with unusual views”). I think this is probably true, but not, as you probably believe, due to differing views about the intelligence explosion; I believe instead that MIRI’s differing views come from a misunderstanding of the context of their research.
For instance, MIRI repeatedly brings up Paul’s probabilistic metamathematics as an important piece of research progress produced by MIRI. I’ve mostly hedged when people ask me about this, because I do think it is an interesting result, but I feel by now that it has been oversold by MIRI. One could argue that the result is relevant to one or more of logic, philosophy, or AI, but I’ll focus on the last one because it is presumably what MIRI cares about most. The standard argument in favor of this line of research is that Löb’s theorem implies that purely logical systems will have difficulty reasoning about their future behavior, and that therefore it is worth asking whether a probabilistic language of thought can overcome this obstacle. But I believe this is due to a failure to think sufficiently concretely about the problem. To give two examples: first, humans seem perfectly capable of making predictions about their future behavior despite this issue, and I have yet to see an argument for why AIs should be different. Secondly, we manage to routinely prove facts about the behavior of programs (this is the field of program analysis) despite the fact that in theory this should be “undecidable”. This is because the undecidability issues don’t really crop up in practice. If MIRI wanted to make the case that they will crop up for a human-level AI in important ways, they should at least mention and respond to the large literature pointing in the opposite direction, and explain why those tools or their successors would be insufficient.
A full response to the misguidedness of the Löb obstacle as a problem of interest probably necessitates its own post, but hopefully this serves to at least partially illustrate my point. Another example would be decision theory of modal agents. I also won’t take the time to treat this in detail, but will simply note that this work studies a form of decision theory that MIRI itself invented, and that no one else uses or studies. It should therefore perhaps be unsurprising that it is relatively easy to prove novel results about it. It would be the equivalent of an activity I did to amuse myself in high school, which was to invent new mathematical objects and then study their properties. But I think it would be mistaken to point to this as an example of inefficiency in research. (And yes, I do think the idea of formalizing decision theory in terms of code is an interesting one. I just don’t think you get to point to this as an example of why research is inefficient.)
I’m not sure this is the best description of my objections to this post, but I need to start work now and it seems best to avoid keeping this around as a draft for too long, so I’m going to go ahead and post it and wait for everyone to tear it to shreds.
I agree that Luke here overstates the significance of my result, but I do think you miss the point a bit or are uncharitable. Regardless of whether making predictions about your own behavior is fundamentally difficult, we don’t yet understand any formal framework that can capture reasoning of the form “my decisions are good because my beliefs correspond to reality.” Assuming there is a natural formal framework capturing human reasoning (I think the record so far suggests optimism), there is something interesting that we don’t yet understand. It seems like you are applying the argument: “We know that humans can do X, so why do you think that X is an important problem?” The comment about undecidability issues not applying in practice also seems a bit unfair; for programs that do proof search we know that we cannot prove claims of the desired type, based on simple Gödelian arguments, and almost all interesting frameworks for reasoning are harder to prove things about than a simple proof search. (Of course the game is that we don’t want to prove things about the algorithms in question; we are happy to form justified beliefs about them in whatever way we can, including inductive inference. But the point is that there are things we don’t understand.)
There are further questions about whether any work at MIRI is a meaningful contribution to this problem or any other. I think that the stuff I’ve worked on is plausibly but not obviously a significant contribution (basically the same status as the other work I’m doing).
Regarding the modal agents stuff, I agree that it’s a toy problem where you should expect progress to be fast (if there was a nice theorem at the end of it, then it wouldn’t be too unusual as a paper in theoretical CS, except for the unfashionable use of mathematical logic). Regarding updateless/timeless/ambient decision theory, it’s a clear step forward for a very idiosyncratic problem, but one for which I think you can make a reasonable case that it’s worthwhile.
I think you shouldn’t be too surprised to make meaningful headway on theoretically interesting questions, even those which will plausibly be important. It seems like in theoretical research today things are still developing reasonably rapidly, and the ratio between plausibly important problems and human capital is very large. I expect that given effort and success at recruiting human capital MIRI can make good headway, in the same sort of way that other theorists do. Optimistically they would be distinguished primarily by working on a class of problems which is unusually important given their values and model of the world (a judgment with which you might disagree).
And the question is: who cares? The mechanism by which human beings predict their future behavior is not logical inference. Similar ad-hoc Bayesian extrapolation techniques can be used in any general AI without worry about Löbian obstacles. So why is it such a pressing issue?
I don’t wish to take away from the magnitude of your accomplishment. It is an important achievement. But in the long run I don’t think it’s going to be a very useful result in the construction of superhuman AGIs, specifically. And it’s reasonable to ask why MIRI is assigning strategic importance to these issues.
I agree with this. Luke seems to be making a much stronger claim than the above, though.
I agree that that would be a bad argument. That was not the argument I intended to make, though I can see why it has been interpreted that way, and I should have put more effort into explaining myself. I am rather saying that human reasoning looks so far away from even getting close to running into issues with Gödel/Löb that it seems like a rather abstruse starting point for investigation.
The rest of your comment seems most easily discussed in person, so I’ll do that and hopefully we’ll remember to update the thread with our conclusion.
What makes you say that? Did you see what I said about this here?
As someone with a reasonable acquaintance with program analysis, synthesis, and semantics… YIKES.
Rice’s Theorem is, so to speak, the biggest, nastiest rock we semantics folks have to crawl around on a regular basis. The way we generally do it is by constructing algorithms, semantic frameworks, and even entire programming languages in which undecidability cannot happen in the first place, thus restricting ourselves to analyzing something less than the set of all possible programs.
Now, admittedly, in practice we’ve made a lot of progress this way, because in practice there are really four kinds of programs: ones that provably terminate by design, ones that provably don’t terminate by design, provably undecidable programs (usually programs that rely on the halting behavior of some other program or logic for their own halting behavior), and just plain messed-up what-the-fuck programs.
The last kind are mostly created only by mistake. The third kind come up in program analysis and semantics, but we can usually construct a proof that we’re dealing with a formally undecidable problem there and set reasonable hard bounds on length of iteration or depth of recursion (or even find decidable subclasses of these problems that are decently useful to real people). The second kind is handled by writing coterminating programs over codata. The first kind is handled by writing terminating programs over data.
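To make the first class concrete: a program that loops only over its (finite) input data terminates on every input by construction, so its halting behavior needs no analysis at all. A minimal hypothetical sketch (the function is made up for illustration):

```c
/* Terminating by construction: the loop runs at most n times,
   where n is the size of the input data, so the program provably
   halts without any general halting analysis. */
long sum_array(const int *xs, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)   /* bounded by the data's size */
        total += xs[i];
    return total;
}
```

The same idea scales up: total languages and termination checkers accept exactly the programs whose recursion is structurally bounded by their input in this way.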
Undecidability issues do come up in practice, and the current research frontier (MIRI’s Löbian paper, Gödel Machines, AIXI) certainly indicates that these issues definitely come up in the study of Universal Artificial Intelligence. However, for most problems below the level of program analysis or universal induction, undecidability issues can be handled or contained productively by research effort.
Using heuristic methods rather than formal proofs. These already often fail (humans even fail to take the effect of being in a different emotional state properly into account), and that’s without having to deal with the effects of radical self-modifications which might impact your whole reasoning and motivational system.
I don’t think this is sufficient to dismiss my example. Whether or not we prove things, we certainly have some way of reasoning at least somewhat reliably about how we and others will behave. It seems important to ask why we expect AI to be fundamentally different; I don’t think that drawing a distinction between heuristics and logical proofs is sufficient to do so, since many of the logical obstacles carry over to the heuristic case, and to the extent they don’t this seems important and worth grappling with.
Also note that, even if you did think it was sufficient, I gave you another example that was based purely in the realm of formal logic.
Jacob, have you seen Luke’s interview with me, where I’ve tried to reply to some arguments of the sort you’ve given in this thread and elsewhere?
Perhaps here is a way to get a handle on where we disagree: Suppose we make a whole-brain emulation of Jacob Steinhardt, and you start modifying yourself in an attempt to achieve superintelligence while preserving your values, so that you can save the world. You try to go through billions of (mostly small) changes. In this process, you use careful but imperfect human (well, eventually transhuman) reasoning to figure out which changes are sufficiently safe to make. My expectation is that one of two things happens: Either you fail, ending up with very different values than you started with or stopping functioning completely; or you think very hard about how much confidence you need to have in each self-modification, and how much confidence you can achieve by ordinary human reasoning, and end up not doing a billion of these because you can’t achieve the necessary level of confidence. The only way I know for a human to reach the necessary level of confidence in the majority of the self-modifications would be to use formally verified proofs.
Presumably you disagree. If you could make a convincing case that a whole-brain emulation could safely go through many self-modifications using ordinary human reasoning, that would certainly change my position in the direction that the Löbian obstacle and other diagonalization issues won’t be that important in practice. If you can’t convince me that it’s probably possible and I can’t convince you that it probably isn’t, this might still help understanding where the disagreement is coming from.
I thought the example was pretty terrible. Everybody with more than passing familiarity with the halting problem, and more generally Rice’s theorem, understands that the result that you can’t decide for every program whether it’s in a given class doesn’t imply that there are no useful classes of programs for which you can do so. MIRI’s argument for the importance of Löb’s theorem is: There’s an obvious way you can try to get stable self-modification, which is to require that if the AI self-modifies, it has to prove that the successor will not destroy the world. But if the AI tries to argue “doing the following trivial self-modification is safe because the modified me will only do things that it proves won’t destroy the world, thus it won’t destroy the world”, that requires the AI to understand the soundness of its own proof system, which is impossible by Löb’s theorem. This seems to me like a straight-up application of what Löb’s theorem actually says, rather than the kind of half-informed misunderstanding that would suggest that program analysis is impossible because of Rice’s theorem.
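For reference, the theorem being applied here can be stated compactly (standard formulation; T is any consistent theory extending Peano Arithmetic, and the box is its provability predicate):

```latex
% Löb's theorem: if T proves that provability of P implies P,
% then T already proves P outright.
T \vdash (\Box P \to P) \;\Longrightarrow\; T \vdash P

% The self-trust obstacle: for the AI to accept its successor's
% proofs wholesale, it would need  T \vdash \Box P \to P  for every
% sentence P; by Löb's theorem this yields  T \vdash P  for every P,
% i.e., T is inconsistent.
```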
Glad to see you’re doing well, Benja :)
Here’s a concrete way you could try to get stable self-modification:
Suppose for concreteness that we have a C program, call it X, and that within the C program there is an array called “world_state” of length M and a floating point number called “utility”. A simple instantiation of X would look something like:
while (true) {
    action = chooseAction(world_state);
    world_state = propagateWorldState(world_state, action);
    utility = calculateUtility(world_state);
}
We would like to consider modifications to X where we replace chooseAction with some new method chooseAction2 to get a program X2. Suppose we want to ensure some condition such as: from the current world state, if we use X2 instead of X, then after some finite period of time the sequence of utilities we get from using chooseAction2 will always be larger than the corresponding sequence if we had used chooseAction. Abusing notation a bit, this is the same as verifying the statement:
“there exists N such that for all n > N, utility2(n) > utility(n)”
[although note that utility2 and utility have fairly complicated descriptions if you actually try to write them out]. Now I agree that reasoning about this for arbitrary choices of chooseAction and chooseAction2 will be quite difficult (probably undecidable although I haven’t proved that). But the key point is that I get to choose chooseAction2, and there are many decision procedures that can prove such a statement in special cases. For instance, I could partition the space of world states into finitely many pieces, write down a transition function that over-approximates the possible transitions (for instance, by having a transition from Piece1 to Piece2 if any element of Piece1 can transition to any element of Piece2). Then I only need to reason about finite automata and those are trivially decidable.
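The check described above can be sketched in a few lines (a hypothetical illustration; the function names, the 4-piece partition, and the transition matrix are all made-up assumptions, not anyone’s actual formalism). Once the world states are lumped into finitely many pieces and the transitions are over-approximated, verifying that no “bad” piece is reachable is just a finite graph search:

```c
/* Abstract the world states into N "pieces"; trans[i][j] == 1
   over-approximates "some concrete state in piece i can step to
   some concrete state in piece j". */
#define N 4

/* Returns 1 if no piece marked bad is reachable from `start`
   under the over-approximate transition relation. */
int safe(int trans[N][N], const int bad[N], int start) {
    int reached[N] = {0};
    int stack[N];
    int top = 0;
    reached[start] = 1;
    stack[top++] = start;
    while (top > 0) {               /* depth-first reachability */
        int i = stack[--top];
        if (bad[i])
            return 0;               /* a bad piece is reachable */
        for (int j = 0; j < N; j++)
            if (trans[i][j] && !reached[j]) {
                reached[j] = 1;
                stack[top++] = j;
            }
    }
    return 1;   /* safe in the abstraction implies safe concretely */
}
```

Because the abstraction over-approximates the real transitions, `safe` returning 1 soundly implies the concrete system avoids the bad states; returning 0 may be a false alarm caused by too coarse a partition.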
You could argue that this proof system is fairly weak, but again, the AI gets to tailor its choices of chooseAction2 to be easy to reason about. You could also argue that an AI that could only create instances of chooseAction2 that could be reasoned about as above would be severely hampered, but I think this overlooks the fact that such techniques have been extraordinarily successful at reasoning about fairly complex systems.
In addition, there are more general proof strategies than the above if that one does not satisfy you. For instance, I could just require that any proposed modification to chooseAction2 come paired with a proof that that modification will be safe. Now I agree that there exist choices of chooseAction2 that are safe but not provably safe and this strategy disallows all such modifications. But that doesn’t seem so restrictive to me.
Finally, I agree that such a naive proof strategy as “doing the following trivial self-modification is safe because the modified me will only do things that it proves won’t destroy the world, thus it won’t destroy the world” does not work. I’m not proposing that. The proof system clearly has to do some actual work. But your argument seems to me to be saying “this extremely naive strategy doesn’t work, let’s develop an entire subfield of logic to try to make it work”, whereas I advocate “this extremely naive strategy doesn’t work, let’s use a non-naive approach instead, and look at the relevant literature on how to solve similar problems”.
I also agree that the idea of “logical uncertainty” is very interesting. I spend much of my time as a grad student working on problems that could be construed as versions of logical uncertainty. But it seems like a mistake to me to tackle such problems without at least one, and preferably both, of: an understanding of the surrounding literature; experience with concrete instantiations of the problems at hand.
Sorry for being curmudgeonly there—I did afterwards wish that I had tempered that. The thing is that when you write something like
that sounds to me like you’re painting MIRI as working on these topics just because it’s fun, and as supporting its work with arguments that are obviously naive to someone who knows the field; and it sounds like you’re supporting this picture with arguments that miss the point of what MIRI is trying to say. That’s why I found the example of program analysis so annoying: people who think that the halting problem (really Rice’s theorem, but someone with this misconception wouldn’t be aware of that) means that program analysis is impossible really are misinformed, both about the state of the field and about why these theorems say what they say. E.g., yes, of course your condition is undecidable as long as there is any choice f(s) of chooseAction2(s) that satisfies it; proof: let chooseAction2(s) be the program that checks whether chooseAction2 satisfies your criterion, and if yes returns chooseAction(s), otherwise f(s). That’s how these proofs always go, and of course it doesn’t mean that there are no programs able to verify the condition for an interesting subclass of chooseAction2’s; the obvious interesting example is searching for a proof of the condition in ZFC, and the obvious boring example is a giant look-up table which decides the condition for all choices of chooseAction2(s) of length less than L.
One possibility is that MIRI’s arguments actually do look that terrible to you, but that this is because MIRI hasn’t managed to communicate them clearly enough. I’m thinking this may be the case because you write:
First, that’s precisely the “obvious” strategy that’s the starting point for MIRI’s work.
Second, yes, Eliezer’s arguing that this isn’t good enough, but the reason isn’t that there are some safe modifications which aren’t provably safe. The work around the Löbian obstacle has nothing to do with trying to work around undecidability. (I will admit that for a short period at the April workshop I thought this might be a good research direction, because I had my intuitions shaken by the existence of Paul’s system and got overly optimistic, but Paul quickly convinced me that this was an unfounded hope, and in any case the main work around the Löbian obstacle was never really related to this.) MIRI’s argument definitely isn’t that “the above algorithm can’t decide for all chooseAction2 whether they’re safe, therefore it probably can’t decide it for the kind of chooseAction2 we’re interested in, therefore it’s unacceptable”. If that’s how you’ve understood the argument, then I see why you would think that the program analysis example is relevant. (The argument is indeed that it seems to be unable to decide safety for the chooseAction2 we’re interested in, but not because it’s unable to decide this for any generic chooseAction2.)
Third, you seem to imply that your proposal will only take safe actions. You haven’t given an argument for why we should think so, but the implication seems clear: You’re using a chooseAction that is obviously safe as long as it doesn’t rewrite itself, and it will only accept a proposed modification if it comes with a proof that it is safe, so if it does choose to rewrite itself then its successor will in fact be safe as well. Now I think this is fine reasoning, but you don’t seem to agree:
You seem to be quite willing to use that reasoning yourself to show that the initial AI is safe, but you don’t think the AI should be able to use the same sort of reasoning. Eliezer’s argument is that this is in fact reasoning you want to use when building a self-improving AI: yes, you can reason in more detail about how the AI you are building works, but this AI_1 will build an AI_2 and so on, and when proving that it’s ok to build AI_1 you don’t want to reason in detail about how AI_1,000,000 is going to work (which is built using design principles you don’t understand, by AI_999,999, which is much smarter than you); rather, you want to use general principles to reason that, because of the way AI_1,000,000 came to be, it has to be safe (because AI_999,999 only builds safe AIs, because it was built by AI_999,998, which only builds safe AIs...). But it’s not only you who needs to reason like that, because you don’t know and aren’t smart enough to comprehend AI_1,000,000’s exact design; AI_1, which also isn’t that smart, will need to be able to use the same sort of reasoning. Hence the interest in the Löbian obstacle.
There are caveats to add to this and parts of your comment I haven’t replied to, but I’m running into the same problem as you with your original comment in this thread, having already spent too much time on this. I’d be happy to elaborate if useful. For my part, I’d be interested in your reply to the other part of my comment: do you think I have localized our disagreement correctly?
Oh, one last point that I shouldn’t skip over: I assume the point about MIRI lacking “an understanding of the surrounding literature” refers to the thing about being tripped up at the July workshop by not knowing Gaifman’s work on logical uncertainty well enough. If so, I agree that that was an avoidable failure, but I don’t think it’s indicative of always ignoring the relevant literature or anything like that. I’ll also admit that I still haven’t familiarized myself with Gaifman’s work, but that’s because I’m not currently focusing on logical uncertainty; I intend to do so in the future.
Don’t worry, I wasn’t offended :)
I don’t think that MIRI is working on these topics just because they are fun, and I apologize for implying that. I should note here that I respect the work that you and Paul have done, and as I said at the beginning I was somewhat hesitant to start this discussion at all, because I was worried that it would have a negative impact on either you / Paul’s reputation (regardless of whether my criticisms ended up being justified) or on our relationship. But in the end I decided that it was better to raise my objections in fairly raw form and deal with any damage later.
What I would say is that the arguments start to look really fishy when one thinks about concrete instantiations of the problem.
I’m not sure I understand what you’re saying here, but I’m not convinced that this is the sort of reasoning I’d use. It seems like Paul’s argument is similar to yours, though, and I’m going to talk to him in person in a few days, so perhaps the most efficient thing will be for me to talk to him and then report back.
I don’t think that “whole brain emulations can safely self-modify” is a good description of our disagreements. I think that this comment (the one you just made) does a better job of it. But I should also add that my real objection is something more like: “The argument in favor of studying Löb’s theorem is very abstract, and it is fairly unintuitive that human reasoning should run into that obstacle. Standard epistemic hygiene calls for trying to produce concrete examples to motivate this line of work. I have not seen this done by MIRI, and all of the examples I can think of, both from having done AI and verification work myself and from looking at what my colleagues do in program analysis, point squarely in the opposite direction.”
When I say “failure to understand the surrounding literature”, I am referring more to a common MIRI failure mode of failing to sanity-check their ideas / theories with concrete examples / evidence. I doubt that this comment is the best place to go into that, but perhaps I will make a top-level post about this in the near future.
Sorry for ducking most of the technical points, as I said, I hope that talking to Paul will resolve most of them.
Good to hear, and thanks for the reassurance :-) And yeah, I do too well know the problem of having too little time to write something polished, and I do certainly prefer having the discussion in fairly raw form to not having it at all.
I’m not really sure what you mean by a “concrete instantiation”. I can think of concrete toy models, of AIs using logical reasoning which know an exact description of their environment as a logical formula, which can’t reason in the way I believe is what we want to achieve, because of the Löbian obstacle. I can’t write down a self-rewriting AGI living in the real world that runs into the Löbian obstacle, but that’s because I can’t write down any AGI that lives in the real world.
My reason for thinking that the Löbian obstacle may be relevant is that, as mentioned in the interview, I think that a real-world seed FAI will probably use (something very much like) formal proofs to achieve the high level of confidence it needs in most of its self-modifications. I feel that formally specified toy models + this informal picture of a real-world FAI are as close to thinking about concrete instantiations as I can get at this point.
I may be wrong about this, but it seems to me that when you think about concrete instantiations, you look towards solutions that reason about the precise behavior of the program they’re trying to verify—reasoning like “this variable gets decremented in each iteration of this loop, and when it reaches zero we exit the loop, so we won’t loop forever”. But heuristically, while it seems possible to reason about the program you’re creating in this way, our task is to ensure that we’re creating a program which creates a program which creates a program which goes out to learn about the world and looks for the most efficient way to use transistors it finds in the external environment to achieve its goals, and we want to verify that those transistors won’t decide to blow up the world; it seems clear to me that this is going to require reasoning of the type “the program I’m creating is going to reason correctly about the program it is creating”, which is the kind of reasoning that runs into the Löbian obstacle, rather than the kind of reasoning applied by today’s automated verification techniques.
Writing this, I’m not too confident that this will be helpful in getting the idea across. Hope the face-to-face with Paul will help, perhaps also with translating your intuitions into a language that better matches the way I think about things.
I think that the point above would be really helpful to clarify, though. This seems to be a recurring theme in my reactions to your comments on MIRI’s arguments—e.g. there was that LW conversation you had with Eliezer where you pointed out that it’s possible to verify properties probabilistically in more interesting ways than running a lot of independent trials, and I go, yeah, but how is that going to help with verifying whether the far-future descendant of an AI we build now, when it has entire solar systems of computronium to run on, is going to avoid running simulations which by accident contain suffering sentient beings? It seems that to achieve confidence that this far-future descendant will behave in a sensible way, without unduly restricting the details of how it is going to work, is going to need fairly abstract reasoning, and the sort of tools you point to don’t seem to be capable of this or to extend in some obvious way to dealing with this.
I’m fairly sure that the reason your brain goes “it would be safe if we only allow self-modifications when there’s a proof that they’re safe” is that you believe that if there’s a proof that a self-modification is safe, then it is safe—I think this is probably a communication problem between us rather than you actually wanting to use different reasoning. But again, hopefully the face-to-face with Paul can help with that.
Thanks for the reply! Thing is, I don’t think that ordinary human reasoning should run into that obstacle, and the “ordinary” is just to exclude humans reasoning by writing out formal proofs in a fixed proof system and having these proofs checked by a computer. But I don’t think that ordinary human reasoning can achieve the level of confidence an FAI needs to achieve in its self-rewrites, and the only way I currently know how an FAI could plausibly reach that confidence is through logical reasoning. I thought that “whole brain emulations can safely self-modify” might describe our disagreement because that would explain why you think that human reasoning not being subject to Löb’s theorem would be relevant.
My next best guess is that you think that even though human reasoning can’t safely self-modify, its existence suggests that it’s likely that there is some form of reasoning which is more like human reasoning than logical reasoning and therefore not subject to Löb’s theorem, but which is sufficiently safe for a self-modifying FAI. Request for reply: Would that be right?
I can imagine that that might be the case, but I don’t think it’s terribly likely. I can more easily imagine that there would be something completely different from both human reasoning and logical reasoning, or something quite similar to normal logical reasoning but not subject to Löb’s theorem. But if so, how will we find it? Unless essentially every kind of reasoning except human reasoning can easily be made safe, it doesn’t seem likely that AGI research will hit on a safe solution automatically. MIRI’s current research seems to me like a relatively promising way of trying to search for a solution that’s close to logical reasoning.
Ok, I think I probably don’t understand this yet, and making a post about it sounds like a good plan!
No problem, and hope so as well.
I don’t have time to reply to all of this right now, but since you explicitly requested a reply to:
The answer is yes, I think this is essentially right although I would probably want to add some hedges to my version of the statement (and of course the usual hedge that our intuitions probably conflict at multiple points but that this is probably the major one and I’m happy to focus in on it).
Thanks!
Thought experiment:
We have an AI which controls a robotic arm inside a box. The box also contains various cooking tools and ingredients to make cake, and a big red button that kills a puppy.
We prefer cake to no cake, and we prefer our cake to be delicious and moist, but above all we prefer the companion cu-… puppy to stay alive.
Therefore, we implement in the AI a “puppy safety module” (PSM) which vetoes any course of action proposed by the planning module if it determines that there is any non-negligible probability of the red button being depressed.
When the PSM can’t make an accurate prediction, it will always err on the safe side: things like throwing tools at the box walls, where they can bounce unpredictably and hit the red button, don’t get approved.
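The conservative veto described here can be sketched in a few lines (the threshold and function names are made up for illustration): the PSM approves a plan only when it can produce an estimate at all and that estimate is negligible.

```python
# Toy sketch of the "puppy safety module" (PSM): veto any plan whose
# estimated probability of pressing the red button is non-negligible,
# and treat "can't estimate" as a veto, erring on the safe side.
# The threshold is an arbitrary illustrative choice.

NEGLIGIBLE = 1e-9

def psm_approves(plan, estimate_button_press_probability):
    """Conservative veto: approve only plans we can analyze AND whose
    button-press probability is below the negligibility threshold."""
    p = estimate_button_press_probability(plan)  # may return None: "unsure"
    if p is None:
        return False  # unpredictable (e.g. bouncing tools): veto
    return p < NEGLIGIBLE
```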
Athena, the Greek goddess of wisdom, justice and math, came down from Mt. Olympus and debugged the code for the PSM. She was also kind enough to check all the compilers, OS and hardware for bugs and glitches.
Do you agree that with this AI the puppy is most likely safe?
Now, let’s assume that the cakes made by this AI aren’t very good. We don’t really know how to program it to make better cakes; we try our best reinforcement learning algorithms but it doesn’t improve much. Then somebody suggests we try allowing the AI to modify its own code:
In addition to sending motor commands to the arm, the planning module can also edit a copy of the AI code and eventually replace the whole program with this successor. Of course, these actions must still be approved by the PSM, but they are otherwise unrestricted: in principle they can modify the PSM, or add a bypass, or get rid of it entirely.
Do you think that this AI is unsafe for the puppy?
Do you think it is safe but unable to improve itself in effective ways?
Since the PSM was designed without self-modification in mind, “safe but unable to improve itself in effective ways”.
(Not sure how this thought experiment helps the discussion along.)
Can you please explain your reasoning?
Suppose that in the recesses of the code there is an instantiation of the bubble sort algorithm. The planner proposes to replace it with, say, merge sort. Do you think that the PSM would generally disapprove of such a change?
Do you think it would approve it, but it would still be unable to approve modifications that would be needed for significant improvement?
MIRI claims to have had an important insight about AI design (the so-called “Löbian obstacle”) that experts in relevant fields (AI, model checking, automated theorem proving, etc.) didn’t have. MIRI attempted to communicate this insight, but so far the experts have mostly ignored MIRI’s claims or denied that they are likely to be important and relevant.
What is the most likely explanation for that? It seems that we can narrow it to two hypotheses:
A) MIRI’s insight is really relevant and important to AI design, but communication with the experts failed because of some problem on MIRI’s side, or on the experts’ side (e.g. stubbornness, stupidity) or both (e.g. excessively different backgrounds).
B) MIRI is mistaken about the value of their insight (possible psychological causes may include confirmation bias, Dunning–Kruger effect, groupthink, overconfident personalities, etc.).
I would say that, barring evidence to the contrary, hypothesis B is the most likely explanation.
I don’t believe these options are exhaustive. “Relevant and valuable” are subjective and time-varying. The published work might be interesting and useful down the line, but not help the problems that AI researchers are working on right now.
It usually takes a few years for the research community to assimilate strange new ideas—sometimes much more than a few years. This isn’t due to a failure on anybody’s part—it’s due to the fact that scientists pick problems where they have a reasonable prospect of success within a few years.
I would give MIRI at least a decade or two before evaluating whether their work had any mainstream traction.
MIRI’s stated goals are similar to those of mainstream AI research, and MIRI’s approach in particular includes as subgoals the goals of research fields such as model checking and automated theorem proving.
Do you claim that MIRI is one or two decades ahead of mainstream researchers?
If the answer is no, then how does MIRI (or MIRI donors) evaluate now whether these lines of work are valuable towards advancing their stated goals?
Research has both ultimate goals (“machines that think”) and short-term goals (“machines that can parse spoken English”). My impression is that the MIRI agenda is relevant to the ultimate goal of AI research, but has only limited overlap with the things people are really working on in the short term. I haven’t seen MIRI work that looked directly relevant to existing work on theorem proving or model checking. (I don’t know much about automated theorem proving, but do know a bit about model checking.)
It’s not a matter of “ahead”. Any research area is typically a bunch of separate tracks that proceed separately and eventually merge together or have interconnections. It might be several decades before the MIRI/self-modifying-AI track merges with the main line of AI or CS research. That’s not necessarily a sign anything is wrong. It took decades of improvement before formal verification or theorem proving became part of the computer science toolkit. I would consider MIRI a success if it follows a similar trajectory.
I can’t imagine any really credible assurance that “this basic research is definitely useful,” for any basic research. The ultimate goal “safe self modifying AI” is too remote to have any idea if we’re on the right track. But if MIRI, motivated by that goal, does neat stuff, I think it’s a safe bet that (A) the people doing the work are clueful, and (B) their work was at least potentially useful in dealing with AI risks. And potentially useful is the best assurance anybody can ever give.
I’m a computer systems guy, not a theorist or AI researcher, but my opinion of MIRI has gradually shifted from “slightly crankish” to “there are some interesting questions here and MIRI might be doing useful work on them that nobody else is currently doing.” My impression is a number of mainstream computer scientists have similar views.
Eliezer recently gave a talk at MIT. If the audience threw food at the stage, I would consider that evidence for MIRI being crankish. If knowledgeable CS types showed up and were receptive or interested, I would consider that a strong vote of confidence. Anybody able to comment?
It’s definitely not a goal of mainstream AI, and not even a goal of most AGI researchers, to create self-modifying AI that provably preserves its goals. MIRI’s work on this topic doesn’t seem relevant to what mainstream AI researchers want to achieve.
Zooming out from MIRI’s technical work to MIRI’s general mission, it’s certainly true that MIRI’s failure to convince the AI world of the importance of preventing unFriendly AI is Bayesian evidence against MIRI’s perspective being correct. Personally, I don’t find this evidence strong enough to make me think that preventing unFriendly AI isn’t worth working on.
Also, two more points on why MIRI isn’t that likely to produce research AI researchers will see as a direct boon to their field: One, stuff that’s close to something people are already trying to do is more often already worked on; the stuff that people aren’t working on seems more important for MIRI to work on. And two, AGI researchers in particular are particularly interested in results that get us closer to AGI, and MIRI is trying to work on topics that can be published about without bringing the world closer to AGI.
I wouldn’t say MIRI has tried very hard yet to communicate about the Lobian obstacle to people in the relevant fields. E.g. we haven’t published about the Lobian obstacle in a journal or conference proceedings.
Part of the reason for that is that we don’t expect experts in these fields to be very interested in it. The Lobian obstacle is aiming at better understanding of strongly self-modifying systems, which won’t exist for at least 15 years, and probably longer than that.
I agree the AI community won’t be very interested. But it might be worth sending it to some theoretical computer science venue—STOC, say—instead. If nothing else, it would give useful information about how receptive academics are to the topic.
I look forward to a clear, detailed explanation of MIRI’s thinking on this subject. In particular this counter-intuitive result:
deserves some technical elaboration.
Mark, have you read Eliezer’s article about the Löbian obstacle, and what was your reaction to it?
I’m in the early stages of writing up my own work on the Löbian obstacle for publication, which will need to include its own (more condensed, rather than expanded) exposition of the Löbian obstacle; but I liked Eliezer’s article, so it would be helpful to know why you didn’t think it argued the point well enough.
I have, although formal logic is not my field so please excuse me if I have misunderstood it.
Eliezer does not demonstrate that overcoming the Löbian obstacle is necessary for the construction of tiling agents; rather, he assumes it. No form of program verification is actually required if you do not use the structure of a logical agent. Consider, for example, the GOLEM architecture [1], which is a form of tiling agent that proceeds by direct experimentation (simulation). It does not require the ability to prove logical facts about the soundness and behavior of its offspring, just the ability to run them in simulation. Of course logical program analysis helps in focusing in on the situations which give rise to differing behavior between the two programs, but there are no Gödelian difficulties there (even if there were, you could fall back on probabilistic sampling of environments, searching for setups which trigger different results).
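The probabilistic-sampling fallback mentioned here can be sketched very simply (everything below is a hypothetical illustration, not the GOLEM architecture itself): sample environments and search for one on which the current program and a proposed successor act differently.

```python
# Hypothetical sketch of simulation-based checking: rather than proving
# facts about a successor program, sample environments and compare its
# behavior against the current program's.

import random

def behaviors_match(current, candidate, sample_env, trials=1000, seed=0):
    """Probabilistic check: search sampled environments for one where the
    two programs act differently. Absence of a counterexample is only
    statistical evidence, not a proof."""
    rng = random.Random(seed)
    for _ in range(trials):
        env = sample_env(rng)
        if current(env) != candidate(env):
            return False, env  # found a distinguishing environment
    return True, None
```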
The MIRI argument, as I understand it, is: “a program which tried to predict the result of modifying itself runs into a Löbian obstacle; we need to overcome the Löbian obstacle to create self-modifying programs with steadfast goal systems.” (I hope I am not constructing a strawman in simplifying it as such.) The problem comes from the implicit assumption that the self-modifying agent will use methods of formal logic to reason about the future actions of its modified self. This need not be the case! There are other methods which work well in practice, converge on stable solutions under the right circumstances, and have been well explored in theory and in practice.
I’m reminded of the apocryphal story of two space-age engineers that meet after the fall of the Soviet Union. The American, who oversaw a $1.5 million programme to develop the “Astronaut Pen” which would write in hard vacuum and microgravity environments, was curious to know how his Russian counterpart solved the same problem. “Simple,” he replied, “we used a pencil.”
You could expend significant time, energy, and donations to solve the problem of Löbian obstacles in the context of tiling agents for self-modifying AI. Or you could use an existing off-the-shelf solution that solves the problem in a different way.
[1] http://goertzel.org/GOLEM.pdf
Doesn’t the non-apocryphal version of that story have some relevance?
http://en.wikipedia.org/wiki/Space_Pen
http://www.snopes.com/business/genius/spacepen.asp
Using a space pencil could cause your spaceship to catch fire. Sometimes it pays to be careful.
I don’t think the argument is that AI would be fundamentally different, but rather that “we can reason at least somewhat reliably when making predictions of agents who don’t drastically self-modify, and of whom we have thousands of years of data to help build our predictions on” isn’t good enough to deal with the case of a drastically self-modifying agent that could exhibit entirely novel behavior and cognitive dynamics even if it wasn’t capable of self-modifying. “Somewhat reliably” is fine only as long as a single failure isn’t enough to throw all the rest of your predictions to the trash bin.
I don’t know enough about your second example to feel confident commenting on it.
Humans seem pretty good at making correct predictions even if they have made incorrect predictions in the past. More generally, any agent for whom a single wrong prediction throws everything into disarray will probably not continue to function for very long.
Fair enough. This is an admirable habit that is all too rare, so have an upvote :).
That’s basically my point. A human has to predict the answer to questions of the type “what would I do in situation X”, and their overall behavior is the sum of their actions over all situations, so they can still get the overall result roughly correct as long as they are correct on average. An AI that’s capable of self-modification also has to predict the answer to questions of the type “how would my behavior be affected if I modified my decision-making algorithm in this way”, where the answer doesn’t just influence the behavior in one situation but all the ones that follow. The effects of individual decisions become global rather than local. It needs to be able to make much more reliable predictions if it wants to have a chance of even remaining basically operational over the long term.
Thanks. :)
And more important, its creators want to be sure that it will be very reliable before they switch it on.
These examples involve predictions generated by processes which are not purely logical systems, and which we don’t understand enough to code into an AI. So it seems like Paul’s idea could be progress towards having a process that makes such predictions about itself that we can understand at that level.
Can you clarify in what sense you think a computer program is not a purely logical system? Or am I misunderstanding you?
ETA: In particular, I’m referring to the point where I said:
I didn’t mean that the computer program is not a purely logical system, but that the people proving facts about its behavior aren’t purely logical systems.
Program analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.
Ok, then I did not understand exactly what you meant, but I still don’t think this is a counterexample to the problem Paul’s idea tries to get around.
The problem is that logical systems have trouble reasoning about their own behavior, not a claim that there is no other logical system that can reason about them. In particular, we are interested in whether an optimization process can model itself as an optimization process: accurately predicting that its future decisions are likely to achieve outcomes that score well on its optimization criteria, that the score will be better if it has more resources, and that it will become much worse if its representation of its criteria gets corrupted (all using abstract reasoning, in much less time than fully simulating its future decisions). Can program analysis do that?
ETA: Also, I should note that this is a good question and I’m glad you asked it!
If your question is whether a program analyzer can, when given itself as input, produce sensible results, the answer is yes. Program analyzers are meant to run on arbitrary code, so in particular they can be run on themselves as a special instance. (Actually, nothing particularly special happens in this case as far as I can tell.)
Now, a key point is the formalism we are working in: a program analyzer takes in a program P and some specification S, and checks whether P obeys specification S (for instance, checking that it never accesses memory before it allocates it). Importantly, the analyzer is allowed to do any one of the following three things:
report that P satisfies S (with a proof)
report that P does not satisfy S (with a counterexample)
give up (i.e. report “unsure”)
So a key question is how often it gives up! But while there are many instances where modern program analysis tools do give up, there are also many where they don’t. Furthermore if you are an AI and you propose a modification to your code, and your program analysis subroutine reports “unsure”, you are free (and would be wise) to try a different modification instead. Researchers in the field of program analysis are extremely cognizant of the halting problem (which is closely related to Lob’s theorem) and typically deal with it by over-approximating, e.g. by identifying conditions that would be sufficient for halting, although not necessarily necessary. As a result one obtains solubility at the cost of precision (although note that the approximation is always sound: if we report that P satisfies S, then P does indeed always satisfy S).
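This three-outcome interface can be sketched with a toy analyzer (the property, program encoding, and "give up" condition are all made up for illustration): given a program encoded as a list of integer deltas applied to a counter starting at 0, decide whether the counter ever goes negative, and report "unsure" for inputs outside the fragment we handle.

```python
# Toy three-outcome analyzer: satisfies / violates / unsure.
# Programs are lists of integer deltas applied to a counter starting at 0;
# the specification is "the counter never goes negative". The "unsure"
# branch mirrors real analyzers bailing out on inputs they can't handle.

SATISFIES, VIOLATES, UNSURE = "satisfies", "violates", "unsure"

def analyze_nonnegative(deltas):
    if any(not isinstance(d, int) for d in deltas):
        return UNSURE  # outside the fragment this analyzer handles
    total, trace = 0, []
    for d in deltas:
        total += d
        trace.append(total)
        if total < 0:
            return VIOLATES  # `trace` is the counterexample prefix
    return SATISFIES  # sound: every prefix was checked
```

As in the real tools, a "satisfies" verdict is sound, and an AI whose proposed modification comes back "unsure" is free to try a different, easier-to-analyze modification instead.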
This is a good approach for dealing with the halting problem, but I think that Lob’s theorem is not so closely related that getting around the halting problem means you get around Lob’s theorem.
The theoretical AI that would run into Lob’s theorem would need more general proof producing capability than these relatively simple program analyzers.
It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing a proof structure that can be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .Net assemblies, with byte code that targets a stack-based virtual machine, and I want to verify that instructions that read elements off the stack get elements of the type they are expecting. So I wrote my library to keep track of the types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack state, and to determine whether the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure, but I did not, and don't know how to, write my program to come up with the proof structure itself. It is therefore easy to be confident that the proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.
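A toy version of this stack-of-types tracking can be sketched in a few lines of Python. The instruction set and type names below are hypothetical stand-ins, not actual .Net opcodes, and branching is omitted; the sketch only shows the "compute possible stack state after each instruction, reject illegal reads" structure described above.

```python
def verify(instructions):
    """Track the abstract stack of types; reject ill-typed stack reads.

    Each instruction is an (opcode, argument) pair in a made-up,
    straight-line instruction set.
    """
    stack = []  # abstract stack: a list of type names, not values
    for op, arg in instructions:
        if op == "push_int":
            stack.append("int")
        elif op == "push_str":
            stack.append("str")
        elif op == "add_int":
            # 'add_int' expects two ints on top of the stack.
            if stack[-2:] != ["int", "int"]:
                return False  # this read would see the wrong type
            stack.pop()
            stack.pop()
            stack.append("int")
    return True
```

For example, `verify([("push_int", None), ("push_int", None), ("add_int", None)])` passes, while pushing a string before `add_int` fails.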
The theoretical AI that would run into Lob’s theorem would be able to come up with proof structures, as an AI that could only use proof structures prepackaged by human programmers would have huge gaps in its capabilities. This may introduce difficulties not seen in simple program analyzers.
In the example I asked about earlier, the optimizer that needs to prove things about its own future decisions runs into an issue that Lob's theorem applies to. In order to prove that its own future decisions are good, it would rely on the fact that its future decisions are based on its own sound proof system, so it needs to assert that its own proof system is sound: that if its proof system says "X", then X. Lob's theorem says that, for all statements X, if a system P (at least as powerful as Peano arithmetic) says "P says 'X' implies X", then P says X. So, if the system asserts its own soundness, it asserts anything.
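In symbols, using the standard provability operator (read $\Box X$ as "the system proves X"), the statement above is:

```latex
% Löb's theorem, for any theory P extending Peano arithmetic:
% if P proves (Box X -> X), then P proves X.
P \vdash \Box X \rightarrow X \quad \Longrightarrow \quad P \vdash X
% Internalized (formalized) version:
P \vdash \Box(\Box X \rightarrow X) \rightarrow \Box X
```

Asserting soundness means asserting $\Box X \rightarrow X$ for every X, so by the theorem the system then proves every X.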
So, in summary:
Lob’s theorem is a problem for generally powerful formal logic based proof systems that assert their own soundness.
Program analyzers are formal logic based proof systems that do not run into Lob’s theorem, because they are not generally powerful, and do not assert their own soundness.
Humans are generally powerful proof generators that can have confidence in their own soundness, but are not purely based on formal logic (we might intuitively dismiss a formal logical proof as "spurious"), and so can get around Lob's theorem; but we don't understand how humans do this well enough to write a program to do it.
Paul's idea of a probabilistic proof system could lead to a generally powerful proof generator with confidence in its own soundness that is not based on formal logic, so that it gets around Lob's theorem, and that we understand well enough to write a program for.
I’m not sure I’ve understood what you have in mind here, but in the general case complete type checking in .NET (that is, proving that an assembly not only is syntactically well-formed but also never throws type-related exceptions at runtime) is undecidable because of Rice’s theorem.
In the general case, complete type checking is as difficult as proving arbitrary claims in first-order logic.
I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement) with a particular proof strategy that covers a huge space of useful, nicely structured programs.
See also jsteinhardt’s comment I was responding to, which discussed getting around the halting problem by allowing the checker to say “I don’t know”.
I’m not an expert on .NET, but is there anything that can throw a type exception other than an explicit cast or an explicit throw (or the standard library, I suppose)?
There are sequences of .Net instructions that result in the runtime throwing type exceptions, because it tries to read a value of a certain type off the stack, and it gets an incompatible value. This is the situation that my verifier guards against.
The standard .Net runtime also includes a verifier that checks the same thing, and it will not run code that fails this validation unless it is explicitly trusted. So a verifiable .Net assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so. The compilers for the standard languages such as C# will generally only produce verifiable assemblies unless you explicitly mark parts of the source code as “unsafe”, and unsafe, or unverifiable assemblies need special permissions to run at all.
(There are other properties that both my verifier and the standard verifier check for. The reason I wrote my own is that it produces much more informative descriptions of problems it finds, and it is integrated into my assembly emitting libraries, so it detects problems as the assembly to be emitted is defined, and when run in the debugger, will easily show the compiler code and execution state that caused the problem.)
Thanks for the clarification.
IIUC, unverifiable code does not, or at least is not guaranteed to, politely throw an exception should a type error occur. It may crash the runtime or fail silently leaving the application in an incorrect state.
Ok. I thought that you were considering assemblies that passed the standard .NET verification and you were trying to check for some stronger property (such as absence of runtime exceptions caused by downcasts). That would have been equivalent to arbitrary first-order logic inference.
Since you are instead checking for decidable properties, your system is indeed not equivalent to arbitrary first-order logic inference.
But as jsteinhardt says, it is actually possible to write verifiers that attempt to check for undecidable properties, provided that they have the option to give up.
My mathematical logic is a bit rusty, but my impression is that the following are true:
Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem. A reflectively consistent theory may still be incomplete, but any complete theory is necessarily reflectively consistent.
The undecidability of the halting problem is basically Godel's theorem stated in computational terms. If we could identify a subset L of Turing machines for which the halting problem can be decided, as long as it was closed under operations such as inserting a (non-self-referential) subroutine, then we would be able to verify any (non-self-referential) property of a program that was also expressible in L. This is a sketch of a claim rather than a claim I've actually proved, though.
Finally, I think it’s worth pointing out an actual example of a program analysis tool since I think they are more powerful than you have in mind. The following slides are a good example of such a tool. At a high level, it gets around the problems you are worried about by constructing an over-approximation of the halting problem that is expressible in propositional logic (and thus decidable, in fact it is even in NP). More generally we can construct a sequence of approximations, each expressible in propositional logic, whose conjunction is no longer an approximation but in fact exactly the original statement.
Why do you say that? My understanding is that Godel’s theorem says that a (sufficiently powerful) logical system has true statements it can’t prove, but these statements are excessively complicated and probably not important. Is there some way you envision an AGI being limited in its capacity to achieve its goals by Godel’s theorem, as we envision Lob’s theorem blocking an AGI from trusting its future self to make effective decisions? (Besides where the goals are tailored to be blocked by the theorem: “Prove all true statements in my formal system”)
As near as I can tell, this is more powerful than other static analysis tools that I have seen (though maybe not; the Google Web Toolkit optimizer is pretty impressive), but it is well within what I expect to be possible, and doesn't get around Lob's theorem. (I can't be too confident in this assessment of its power, because I don't see a clear claim of what sort of statements it can prove or of how it works, except that it seems to detect when inputs to a statement may have invalid values, and that it uses brute-force techniques to analyze functions and then associates a summary of the analysis with each function (constraints on valid inputs and guarantees on outputs?) so that its analysis of call sites can use the summary.) (This sort of program analyzer is interesting in its own right, and I would like to see a more complete explanation of it, but I don't think its existence says anything about the problems posed by Lob's theorem.)
Do you agree or disagree that complete implies reflectively consistent? If you agree, then do you agree or disagree that this means avoidance of Godelian obstacles implies avoidance of Lobian obstacles? If you agree with both of those statements, I’m confused as to why:
is a controversial statement.
Ah, so by “Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem” you mean if you overcome Godelian obstacles you also overcome Lobian obstacles? I think I agree, but I am not sure that it is relevant, because the program analyzer examples don’t overcome Godelian obstacles, they just cope with the Godelian obstacles, which does not similarly imply coping with or overcoming Lobian obstacles.
I'm guessing that he is using the standard AI terminology of a "logical program" as one which reasons by means of logical inference over a knowledge base. This is not how human minds work, nor do many AI researchers view it as a viable path forward.
The types of programs which have seen success in the AI revolution currently going on are often not amenable to program analysis. They very often fall into the "undecidable" category, at least with respect to their anticipated behavior.
This depends which properties you care about. I suspect there isn’t a small model of the AI program that duplicates its output behavior, but you can do a lot with off-the-shelf analysis.
It’s relatively easy to prove things like “this program will only ever use the following subset of available system calls,” or “this program is well typed”, or “the program cannot modify itself”, or “can only modify these fixed specific parts of its state.”
I can also imagine a useful AI program where you can prove a bound on the run-time, of the form “this program will always terminate in at most X steps”, for some actual fixed constant X. (Obviously you cannot do this for programs in general, but if the program only loops over its inputs for a fixed number of iterations or somesuch, you can do it.)
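A bound of this kind can be computed mechanically for suitably restricted programs. Here is a minimal sketch, using a made-up toy program format (plain instructions plus loops with fixed, known iteration counts), of a checker that derives the constant X by summing instruction counts and multiplying loop bodies by their iteration bounds:

```python
def step_bound(program):
    """Conservative upper bound on execution steps.

    Only works because the toy format forbids unbounded loops:
    a program is a list of either opcode strings or
    ("loop", n, body) tuples where n is a fixed iteration count.
    """
    total = 0
    for instr in program:
        if isinstance(instr, tuple):
            _, n, body = instr
            # A loop contributes its body's bound, n times over.
            total += n * step_bound(body)
        else:
            total += 1  # an ordinary instruction is one step
    return total
```

For instance, `step_bound(["a", ("loop", 10, ["b", "c"]), "d"])` returns 22. Programs with data-dependent loop counts simply don't fit this format, which is exactly the "cannot do this for programs in general" caveat above.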
These sorts of properties are far from a general proof “this program is Safe”, but they are non-trivial and useful properties to verify.
The FAI-related things you would want to prove are of the form: “when given a world state characterized by X, input percepts characterized by Y, the program always generates outputs characterized by Z.” For many existing and popular AI architectures, we haven’t the foggiest idea how to do this. (It’s no surprise that Eliezer Yudkowsky favors Pearl’s causal probabilistic graph models, where such analysis is at least known to be possible.)
To the extent that X, Y, and Z can be written formally within the programming language, program analysis at least in principle is fully capable of proving such a statement. I apologize if this comes off as rude, but your statement that “we haven’t the foggiest idea how to do this” is flat-out false. While there are certainly unique challenges to reasoning about the sort of code that gets written in machine learning, it seems to me that the main reason we don’t have well-developed analysis tools is that most code doesn’t look like machine learning code, and so there has been little pressure to develop such tools.
Program analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.
Edit: oops, replied to wrong comment.
Can you give an example? The main reason they are not amenable to program analysis is because the sorts of guarantees they are supposed to satisfy are probabilistic / statistical in nature, and we don’t yet have good techniques for verifying such properties. I am pretty sure that the issue is not undecidability.
I should also clarify that whether or not JGWeissman thought that I meant “logical program” instead of “computer program”, my comment was intended to mean “computer program” and the field of program analysis studies “computer programs”, not “logical programs”.
Ok, I'll throw it back at you: how do you prove the behavior of a deep belief network, or any other type of neural network currently in vogue, short of actually running it? If you do have some way of doing it, can I be coauthor? I didn't mean to imply that it was proved impossible, just that no one has the faintest idea how to do it—and not for lack of trying.
What does "prove the behavior" mean? If you mean "prove that it will classify this next image correctly" then of course the answer is no, although that is because it's not even necessarily true. If you mean "prove that the program's behavior always falls within some well-defined regime", then yes we can absolutely prove that sort of statement. Things that are at the boundary of what we can prove (i.e. I don't know of any existing formal tools that do it, but I'm pretty sure we could make one) would be something like "the weights of the network don't change too much if we change the input a small amount" or "the prediction error on the training set will decrease monotonically" or "the generalization error is bounded, assuming that the test data ends up being drawn from the same distribution as the training data".
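As a tiny illustration of the first kind of property ("outputs don't change too much if we change the input a small amount"): for a single linear layer the claim reduces to an operator-norm bound, which can be stated and checked directly. The weights below are arbitrary illustrative values, and a real network would need this composed across nonlinear layers; this only shows the single-layer case.

```python
import numpy as np

# For a linear layer y = Wx, the output perturbation is bounded by
# ||W||_2 (the largest singular value of W) times the input perturbation:
#   ||W(x + dx) - Wx|| <= ||W||_2 * ||dx||
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))          # illustrative weight matrix
lipschitz = np.linalg.norm(W, 2)     # operator norm = max singular value

x = rng.normal(size=3)
dx = 0.01 * rng.normal(size=3)       # small input perturbation
change = np.linalg.norm(W @ (x + dx) - W @ x)
assert change <= lipschitz * np.linalg.norm(dx) + 1e-9
```

The bound holds for every x and dx, which is what makes it a provable property rather than an empirical observation about particular inputs.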
Presumably for friendliness you want something like “P and P’ satisfy some contract relative to each other” (such as sufficiently similar behavior) which is the sort of thing we are already relatively good at proving. Here I’m imagining that P is the “old” version of a program and P’ is a “modified” version of the program that we are considering using.
I understand “trade small local gains for large global gains” as a prescriptive principle, but does it work as a descriptive hypothesis? Why expect academics to be so much better than philanthropists at cause neutrality? When I speak to academics who aren’t also EAs, they are basically never cause neutral, and they even joke about how ridiculously non-cause-neutral everybody in academia is, and how accidental everyone’s choice of focus is, including their own.
I’m not talking about cause neutrality. My point is that even once the general problem has been decided, there are many possible approaches, and academics often do things that seem inefficient but are actually exploring the space of possible approaches (possibly by trying to better understand the objects they are studying).
What level of “general problem” do you have in mind? To a large degree I’m thinking about things like “Gosh, it took (unnecessary) centuries or decades for researchers to launch subfields to study normative uncertainty and intelligence explosion”, and that could be a “lack of cause neutrality” problem. And maybe you’re thinking instead on a smaller scale, and want to say something like “Given that people decide to work on X, they’re relatively efficient in working on X, and exploring the space within X, even if they’re completely missing normative uncertainty and intelligence explosion.”
This is an interesting hypothesis, and one I wasn’t thinking of. But hard to measure!
Out of curiosity, what gives you that impression? I tend to cite it because it is (along with the Lobian cooperation stuff) among the most important results to come out of MIRI’s first couple workshops, not because I can already tell whether it’s an important breakthrough in mathematical logic in general.
As for the purpose and relevance of the Lobian obstacle work, it seems like there might still be a failure of communication there. Since you and Eliezer and I discussed this at length and there still seems to be an unbridged gap, I’m not sure which thing I can say to bridge the gap. Maybe this quote from Paul?
In the OP I actually gave program equilibrium as an example of new theoretical progress that opens up new lines of inquiry, e.g. the modal agents work (though of course there are other pieces contributing to modal agents, too). So yeah, I don’t think the modal agents work is an example of inefficiency.
The examples I gave in the OP for apparent inefficiency in decision theory research were philosophy's failure to formulate a reliabilist metatheory of instrumental rationality until 2013, even though reliabilist theories of epistemic rationality have been popular since the late 1960s, and also the apparently slow uptake of causal Bayes nets in the causal decision theory world.
In this very post you placed it in a list next to normative uncertainty and the intelligence explosion. The implication seemed obvious to me but perhaps it was unintended.
I seem to remember other comments / posts where similar sentiments were either expressed or implied, although a quick search doesn’t turn them up, so perhaps I was wrong.
Yeah, unintended, but I can see why one might infer that.
Does my “philosophical edge” comment imply importance to you? I was merely trying to say that it’s philosophical even though I’m thinking of it in terms of AI, and it’s not obvious to me, like your first example, why one would read the comment as assigning particular importance to the result.
I think that the comment that I quoted is not by itself objectionable to me. If that’s actually the only example I can come up with, then I think it would be unfair to criticize it, so I will update the parent to remove it.
Regardless of why the opportunity has presented itself, can we hope that the MIRI research team and associated researchers will use (or are using) the fact that “visible progress in decision theory is one way to “make a name” for oneself” and proceed to do so? Seems like pretty low-hanging status-fruit given the team’s progress so far.
For MIRI, the hard part is writing up the results in a way that appeals to philosophers. That’s a highly specialized skill, and not one we’ve focused on hiring for (at our current budget). We tried to pay Rachael Briggs $20k to do it, since she had two decision theory papers selected for the Philosopher’s Annual, but it was too work-intensive even for her. I think it would drive Eliezer mad to write in that style. I suspect I could do it, but it would take a lot of my time. I might be able to persuade Preston Greene to do it some day. Or maybe Kenny Easwaran, who attended our September 2013 decision theory workshop.
If possible, I’d be curious to hear more details about why Briggs found it too work-intensive. Her giving up on it was definitely not an outcome I would have predicted.
Seconded, I am also curious about why this is hard/how the style needed differs from how lukeprog and Eliezer write papers.
Some of the comments on Eliezer’s “Intelligence Explosion Microeconomics” suggest that his style might be too digressive & chatty by typical journal standards.
Maybe the difficulties that you face are part of the answer to the question of why theoretical progress doesn’t happen faster.
After a paper published in 2011: “[Original draft was available in 2003. Hurrah for academic publishing. One journal reviewed the manuscript for nearly two years before determining that it was too long. No wonder philosophy has not advanced farther in the past 2,500 years.] ”
Just wanted to mention that physics is not immune to this. Bell's theorem requires only first-year college math skills, yet it took thirty-odd years after EPR to formulate it. Not even Einstein himself was able to do it. Event horizons and the inescapability of the singularity required virtually no new math beyond 1917, yet it took some 50 years for physicists to understand the picture. There are clearly some mental blocks people have which take decades and new generations to overcome.
The idea of an event horizon goes back to John Michell in 1783. The derivation of a black hole and event horizon from the equations of General Relativity were done by Schwarzschild mere months after the publication of GR in 1915 (get your dates right!).
(Cool tidbit: Schwarzschild was an artillery gunner during the Great War, and spent his time looking for solutions to the Einstein field equations when he wasn’t calculating trajectories. He published the math behind GR black holes in a letter to Einstein from the front.)
What shminux means is that, even though the Schwarzschild metric was derived early as you say, its physical interpretation as a black hole was not understood till much later. According to Wikipedia, it was not until 1958 that the Schwarzschild radius was identified as an event horizon, a surface which causal influences could only cross in one direction. It was also in the 1950s that the maximal analytic extension of the Schwarzschild metric was constructed, and it was not till the 1960s that these results became widely known and progress in black hole physics really took off (along with Wheeler coining the term "black hole" sometime around 1964).
Here’s a test of hindsight bias:
QM violates Bell inequalities, but obeys Tsirelson inequalities (http://arxiv.org/pdf/1303.2849v1.pdf).
What does that mean? (Not a rhetorical question, I really don’t know!)
In graphical model terms, Bell inequality violations mean there is no “hidden variable DAG model” underlying what we see. But maybe Tsirelson inequality points to some correct generalization of the “hidden variable DAG” concept to the quantum setting (???). To my knowledge, nobody knows what to make of this, although it doesn’t take much math background to understand Tsirelson inequalities.
To be a little more precise, I can imagine an object that does not posit anything ontologically beyond the four variables related by the graph:
A → B <-> C ← D
The distributions that live in this object will, in general, violate both Bell and Tsirelson inequalities. So this object is “not physical.” I can also posit a hidden variable DAG (in other words I posit in addition to A,B,C,D another variable H):
A → B ← H → C ← D
This will obey the Bell inequality. So this is "classically physical, but not physical."
The question is, what should I posit beyond A,B,C,D to violate Bell, but obey Tsirelson? Whatever it is, it cannot be a hidden variable H. But maybe I can posit something more complicated, or “weird”? But obvious in hindsight?
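The classical half of this can be checked by brute force: any hidden-variable model is a probabilistic mixture of deterministic strategies, so it suffices to enumerate the 16 deterministic assignments of outcomes and confirm the CHSH combination never exceeds 2 (quantum mechanics instead reaches Tsirelson's bound, 2*sqrt(2) ≈ 2.83). This is a standard textbook computation, written out here only as a sketch:

```python
from itertools import product

def chsh(a0, a1, b0, b1):
    """CHSH combination S = E(A0B0) + E(A0B1) + E(A1B0) - E(A1B1)
    for deterministic outcomes a_i, b_j in {-1, +1}."""
    return a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1

# Enumerate all deterministic hidden-variable strategies.
classical_max = max(abs(chsh(*s)) for s in product([-1, 1], repeat=4))
print(classical_max)  # 2
```

Since mixtures are convex combinations, randomizing over a hidden variable H can never push |S| above the deterministic maximum of 2, which is why any hidden-variable DAG obeys the Bell inequality while quantum correlations can reach up to 2*sqrt(2).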
I am not familiar with using DAG in QM, sorry.
Just wanted to mention that you can trade the EPR-style non-locality for macroscopic many worlds. For all its failings, this approach pushes the strangeness of QM into a local event where the branches interact. In the EPR example, it is where you compare the measurement results from the two detectors. Thus it might be more productive to base any DAG model on an MWI picture, or at least on a setup where there are only a finite and small number of branches, not uncountably many of them, like in Schrodinger’s cat or EPR, maybe something like this quantum bomb tester.
The “non-DAG jargon” question is: “what are the ontological implications of Tsirelson inequalities?”
My point is that this has the feel of one of those questions with an answer that will be very obvious (but only in hindsight).
Charles Fort, Lo!: “If human thought is a growth, like all other growths, its logic is without foundation of its own, and is only the adjusting constructiveness of all other growing things. A tree can not find out, as it were, how to blossom, until comes blossom-time. A social growth cannot find out the use of steam engines, until comes steam-engine-time. For whatever is supposed to be meant by progress, there is no need in human minds for standards of their own: this is in the sense that no part of a growing plant needs guidance of its own devising, nor special knowledge of its own as to how to become a leaf or a root. It needs no base of its own, because the relative wholeness of the plant is relative baseness to its parts.”
The means to make steam engines existed centuries before the age of steam. Ground glass lenses existed long before anyone put one in front of the other to make a telescope. The first people of the Americas had wheels—on toys, not carts. Great music could happen sooner (it usually doesn’t rely on new inventions).
I am convinced we are drowning in miraculous solutions to weighty problems and we are simply overlooking them. If I knew how to prepend ‘steam engine time,’ I assure you I would do so.
Some examples of a different kind of inefficiency, from AntiFragile:
For the perfectly valid reason that tiny wheels are useless unless you have a really even and smooth surface to roll them on.
Imagine the usefulness of a wheeled suitcase on a farm. Or even on cobblestones—how long do you think those tiny wheels would survive?
I actually found that out the hard way.
We didn’t have even and smooth surfaces to roll them on until 1970?
Taleb complains about the gap of 6,000 years.
I think conformity effects play a huge role in this area. The large majority of modern philosophers all have similar educational and cultural backgrounds. They go to elite universities. They read the standard Western philosophical canon. They work very hard to publish a lot of papers in prestigious journals. They are friends with other academics and with other high achievers in the “standard” fields like finance, law, and medicine. Their parents were probably academics or from the upper middle class. They have spent most of their lives in a university setting.
If I had to take an honest guess? Theoretical discovery will behave “inefficiently” when it requires a breadth-first (or at least, breadth-focused) search through the idea space before you can find things that “fit together”. Only once you have a bunch of things which “fit together” can you look at the shape of the “hole in idea-space” they all border, dive to the bottom of that lake, and bring up an entirely new idea which links them or unifies them.
So:
1) Mostly agreed, as described above.
2) As described above. My reasoning is sociological: our current reward system for researchers optimizes our research process for depth rather than breadth. Looking where others are paid not to look would usually be a decent way to find things others haven’t seen.
3) I don’t know more than the bare minimum about decision theory, so I can’t say.
Now, as to analysis of an intelligence explosion taking such a long time, I have an Opinion (beware the capital letter): there may not be an intelligence explosion. Current research into AI indicates that coming up with formal models of utility-optimal agents in unknown active environments is the easy part, making them conscious and (stably) self-modifying is the “next part” currently under research, and scaling them down to fit inside the real universe is the hard part.
Schmidhuber (and when it comes to UAI I definitely root for the Schmidhubristic team ;-)) has claimed that Goedel Machine self-rewrites would dramatically speed up a mere AIXI paired with a mere HSearch until they became effective within the real world, but that's putting his faith in the Goedel Machine's proof searcher rather than in his (and his students') own proofs that their algorithms genuinely are optimal and that the problems they're tackling genuinely do have these nasty time-and-space bounds. If the first team to build an AI has to either keep it in an exponentially small environment (existing AIXI models) or wait for astronomical periods of time for even the first self-rewrite, then the human race will die of an asteroid strike long before Friend Clippy can take us out.
This is the same nasty problem that AI has faced since the GOFAI days: easy to find an algorithm that locates the optimal move by searching the entire solution space, hard as hell to prove it will take any usefully small period of time to do so.
It strikes me that there’s a second set of reasons that read something like:
There is significant motivation to get it wrong, or wrong answers are very obvious.
See: the free will problem.
Normative uncertainty does not seem particularly interesting. You can just use set multiplication over probable futures and evaluations of them, and wind up with a regular problem with certain evaluations. It even agrees with normal methods if you assume a single, certain evaluation.
Now, as a concept it’s important—realizing that you might not know what you want is a tremendous source of uncertainty that is easy to overlook. I just don’t think that any new concepts or mathematical tools are needed to tackle it.
In presence, yes. In degree, no. Even the efficient market hypothesis presumes (indeed, requires) some delay inefficiencies, and even restricting to fields where economic incentives are very, very high we see some pretty significant delay inefficiencies.
The modern containerization revolution is dependent on certain types of math and metallurgy being available, but the underlying tools were probably available before the First World War (and, indeed, would have been even more useful if developed then, due to varying railway gauges); yet containerization didn't take off until the 1950s, and was controversial for nearly a decade after that. Supply chain management innovations since, including ones as recent as WalMart's (and to a lesser extent, Amazon's) shipping models, have trailed their underlying technologies by what I'd estimate to be at least a decade. Even purely digital financial markets have seen remarkable innovations that seem obvious in retrospect yet weren't around for the longest time.
Philosophy seems to have further delays on top of that, but that they’re present in other fields suggests that we can overestimate the impact of some individual causes.
I've been looking at things from a more basic model. Innovation seems to be the ability to (1) non-randomly put together (2) component concepts into an (3) implementable result. Some of the meaningful attributes will be:
Where the typical connections of thoughts are non-complete, where they are based on untested or (especially) false assumptions.
Where we have unusual component concepts available, or (more commonly) where the average researcher is discouraged from outside component concepts. Interdisciplinary discoveries are major soft spots, here.
Where the costs of testing hypotheses are very small, have recently become smaller, or are unusually small for our skillset but not for the average researcher.
These seem to cover decision theory, although it should be noted that not all of the side effects of these attributes are beneficial.