The LW post may address some of your concerns. The idea here is that we need a tiling decision criterion. The paper isn’t supposed to be an AI design; it’s supposed to get us a little conceptually closer to a tiling decision criterion. If you don’t understand why a tiling decision criterion is a good thing in a self-improving AI which is supposed to have a stable goal system, then I’m not quite sure what issue needs addressing.
Thanks for your courtesy, and again, sorry for not being more specific in my original comment.
Yes, I’m questioning why a self-improving AI which is intended to have a stable goal system needs a tiling decision criterion. In your publication, you wrote:
In a self-modifying AI, most self-modifications should not change most aspects of the AI; it would be odd to consider agents that could only make large, drastic self-modifications. To reflect this desideratum within the viewpoint from agents constructing other agents, we will examine agents which construct successor agents of highly similar design...
I don’t see why the model of the sequence of agents is a good operationalization. My intuition is that
A self-modifying AI would modify itself by modifying its modules one by one.
It would reconstruct a given module whole-cloth, rather than doing so by incrementally changing the module in small steps.
To elaborate, and for concreteness, I’ll comment on
If you wanted a road to a certain city to exist, you might try attaching more powerful arms to yourself so that you could lift paving stones into place. This can be viewed as a special case of constructing a new creature with similar goals and more powerful arms, and then replacing yourself with that creature.
I haven’t read the technical portions of the paper, but my surface impression is that the operationalization in the paper is analogous to modifying your arms by successively shaving slivers of tissue off of them, and grafting slivers of tissue onto them, with a view toward making them really long. Another way to go would be to grow the long arms in a lab, chop off your current arms, and then graft the newly created long arms onto yourself. In the context of self-modifying AIs, the latter possibility seems to me to be significantly more likely than the former possibility.
Is my surface impression of the operationalization right? If so, what do you think about the points that I raise in the previous paragraphs?
Jonah, some self-modifications will potentially be large, but others might be smaller. More importantly we don’t want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals. Most of the core idea in this paper is to prevent those kinds of drastic or deleterious changes from being forced by a self-modification.
But it’s also possible that there’ll be many gains from small self-modifications, and it would be nicer not to need a special case for those, and for this it is good to have (in theoretical principle) a highly regular bit of cognition/verification that needs to be done for the change (e.g. for logical agents the proof of a certain theorem) so that small local changes only call for small bits of the verification to be reconsidered.
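To make the idea of a highly regular verification step concrete, here is a rough, hypothetical sketch (not anything from the paper) of why regularity pays off: if every module carries its own proof obligation, a small local change only forces that one obligation to be re-checked. The assumptions are loud ones: modules are plain strings, "proof obligations" are stand-in predicates rather than real proofs, and the class and module names are invented for illustration.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical stand-in for a proof obligation: a predicate on a module's source.
Obligation = Callable[[str], bool]


class IncrementalVerifier:
    """Re-checks only the modules whose source changed since the last run."""

    def __init__(self, obligations: Dict[str, Obligation]):
        self.obligations = obligations
        self.verified: Dict[str, str] = {}  # module name -> hash of last verified source
        self.checks_run = 0  # how many obligations were actually re-checked

    def verify(self, modules: Dict[str, str]) -> bool:
        ok = True
        for name, source in modules.items():
            digest = hashlib.sha256(source.encode()).hexdigest()
            if self.verified.get(name) == digest:
                continue  # unchanged since its last successful check: reuse that result
            self.checks_run += 1
            if self.obligations[name](source):
                self.verified[name] = digest
            else:
                self.verified.pop(name, None)  # a failed check invalidates the cache entry
                ok = False
        return ok


# Toy obligations (purely illustrative, not real correctness proofs).
obligations = {
    "planner": lambda src: "goal" in src,
    "memory": lambda src: len(src) > 0,
}
v = IncrementalVerifier(obligations)
modules = {"planner": "pursue goal", "memory": "store facts"}
first_ok = v.verify(modules)   # both modules checked
modules["memory"] = "store facts faster"
second_ok = v.verify(modules)  # only "memory" is re-checked
```

The design choice being illustrated is the caching: the cost of re-verification scales with the size of the change rather than the size of the whole system, which is the property that makes many small self-modifications affordable.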
Another way of looking at it is that we’re trying to have the AI be as free as possible to self-modify while still knowing that it’s sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.
More importantly we don’t want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals.
I’m very sympathetic to this in principle, but don’t see why there would be danger of these things in practice.
But it’s also possible that there’ll be many gains from small self-modifications,
Humans constantly perform small self-modifications, and this doesn’t cause serious problems. People’s goals do change, but not drastically, and people who are determined can generally keep their goals pretty close to their original goals. Why do you think that AI would be different?
Another way of looking at it is that we’re trying to have the AI be as free as possible to self-modify while still knowing that it’s sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.
To ensure that one gets a Friendly AI, it suffices to start with a good goal system, and to ensure that the goal system remains pretty stable over time. It’s not necessary that the AI be as free as possible.
You might argue that a limited AI wouldn’t be able to realize as good a future as one without limitations.
But if this is the concern, why not work to build a limited AI that can itself solve the problems about having a stable goal system under small modifications? Or, if it’s not possible to get a superhuman AI subject to such limitations, why not build a subhuman AI and then work in conjunction with it to build Friendly AI that’s as free as possible?
Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them, and we can make a start on exposing some of these gotchas by figuring out how to do things using unbounded computing power (albeit this is not a reliable way of exposing all gotchas, especially in the hands of somebody who prefers to hide difficulties, or even someone who makes a mistake about how a mathematical object behaves, but it sure beats leaving everything up to verbal arguments).
Human beings don’t make billions of sequential self-modifications, so they’re not existence proofs that human-quality reasoning is good enough for that.
I’m not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations. If this is a widespread reaction beyond yourself then it might not be too hard to get a quote from Peter Norvig or a similar mainstream authority that, “No, actually, you can’t take that sort of thing for granted, and while what MIRI’s doing is incredibly preliminary, just leaving this in a state of verbal argument is probably not a good idea.”
Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that’s more effort than you’d want to put into this exact point.
Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them
I don’t disagree (though I think that I’m less confident on this point than you are).
Human beings don’t make billions of sequential self-modifications, so they’re not existence proofs that human-quality reasoning is good enough for that.
Why do you think that an AI would need to make billions of sequential self-modifications when humans don’t need to?
I’m not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations.
I agree that it can’t be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?
I agree that it can’t be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?
The paper is meant to be interpreted within an agenda of “Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty”; not as “We think this Godelian difficulty will block AI”, nor “This formalism would be good for an actual AI”, nor “A bounded probabilistic self-modifying agent would be like this, only scaled up and with some probabilistic and bounded parts tacked on”. If that’s not what you meant, please clarify.
Ok, that is what I meant, so your comment has helped me better understand your position.
Why do you think that
Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty
is cost-effective relative to other options on the table?
Personally, I feel like that kind of metawork is very important, but that somebody should also be doing something that isn’t just metawork. If there’s nobody making concrete progress on the actual problem that we’re supposed to be solving, there’s a major risk of the whole thing becoming a lost purpose, as well as of potentially-interested people wandering off to somewhere where they can actually do something that feels more real.
as well as of potentially-interested people wandering off to somewhere where they can actually do something that feels more real
From inside MIRI, I’ve been able to feel this one viscerally as genius-level people come to me and say “Wow, this has really opened my eyes. Where do I get started?” and (until now) I’ve had to reply “Sorry, we haven’t written down our technical research agenda anywhere” and so they go back to machine learning or finance or whatever because no, they aren’t going to learn 10 different fields and become hyper-interdisciplinary philosophers working on important but slippery meta stuff like Bostrom and Shulman.
I think that in addition to this being true, it is also how it looks from the outside—at least, it’s looked that way to me, and I imagine many others who have been concerned about SI focusing on rationality and fanfiction are coming from a similar perspective. It may be the case that without the object-level benefits, the boost to MIRI’s credibility from being seen to work on the actual technical problem wouldn’t justify the expense of doing so, but whether or not it would be enough to justify the investment by itself, I think it’s a really significant consideration.
[ETA: Of course, in the counterfactual where working on the object problem actually isn’t that important, you could try to explain this to people and maybe that would work. But since I think that it is actually important, I don’t particularly expect that option to be available.]
Yes. I’ve had plenty of conversations with people who were unimpressed with MIRI, in part because the organization looked like it was doing nothing but idle philosophy. (Of course, whether that was the true rejection of the skeptics in question is another matter.)
If I gave you a list of people who in fact expressed interest but then, when there were no technical problems for them to work on, “wandered off to somewhere where they can actually do something that feels more real,” would you change your mind? (I may not be able to produce such a list, because I wasn’t writing down people’s names as they wandered away, but I might be able to reconstruct it.)
Well, I’m not sure the “oops” is justified, given that two years ago, I really couldn’t help you contribute to a MIRI technical research program, since it did not exist.
No, the oops is on me for not realizing how shallow “working on something that feels more real” would feel after the novelty of being able to explain what I work on to laypeople wore off.
I don’t doubt you; I just have different reasons for believing Kaj’s concerns to be unwarranted:
It’s not clear to me that offering people problems in mathematical logic is a good way to get people to work on Friendly AI problems. I think that the mathematical logic work is pretty far removed from the sort of work that will be needed for friendliness.
I believe that people who are interested in AI safety will not forget about AI safety entirely, independently of whether they have good problems to work on now.
I believe that people outside of MIRI will organically begin to work on AI safety without MIRI’s advocacy when AI is temporally closer.
offering people problems in mathematical logic is a good way to get people to work on Friendly AI problems
Mathematical logic problems are FAI problems. How are we going to build something self-improving that can reason correctly without having a theory of what “reasoning correctly” (i.e., logic) even looks like?
I think that the mathematical logic work is pretty far removed from the sort of work that will be needed for friendliness.
Based on what?
I’ll admit I don’t know that I would settle on mathematical logic as an important area of work, but EY being quite smart, working on this for ~10 years, and being able to convince quite a few people who are in a position to judge on this is good confirmation of the plausible idea that work in reflectivity of formal systems is a good place to be.
If you do have some domain knowledge that I don’t have that makes stable reflectivity seem less important and puts you in a position to disagree with an expert (EY), please share.
I believe that people who are interested in AI safety will not forget about AI safety entirely, independently of whether they have good problems to work on now.
People can get caught in other things. Maybe without something to work on now, they get deep into something else and build their skills in that, and then the switching costs are too high to justify it. Mind you, there is a steady stream of smart people, but there are opportunity costs.
Also, MIRI may be burning reputation capital by postponing actual work such that there may be less interested folks in the future. This could go either way, but it’s a risk that should be accounted for.
(I for one (as a donor and wannabe contributor) appreciate that MIRI is getting these (important-looking) problems available to the rest of us now)
I believe that people outside of MIRI will organically begin to work on AI safety without MIRI’s advocacy when AI is temporally closer.
How will they tell? What if it happens too fast? What if the AI designs that are furthest along are incompatible with stable reflection? Hence MIRI is working on strategic questions like “how close are we, how much warning can we expect” (Intelligence Explosion Microeconomics), and “what fundamental architectures are even compatible with friendliness” (this Lob stuff).
Mathematical logic problems are FAI problems. How are we going to build something self-improving that can reason correctly without having a theory of what “reasoning correctly” (i.e., logic) even looks like?
See my responses to paper-machine on this thread for (some reasons) why I’m questioning the relevance of mathematical logic.
I’ll admit I don’t know that I would settle on mathematical logic as an important area of work, but EY being quite smart, working on this for ~10 years, and being able to convince quite a few people who are in a position to judge on this is good confirmation
I don’t see this as any more relevant than Penrose’s views on consciousness, which I recently discussed. Yes, there are multiple people who are convinced, but there may be spurious correlations which are collectively driving their interests. Some that come to mind are:
Subject-level impressiveness of Eliezer.
Working on these problems offering people a sense of community.
Being interested in existential risk reduction and not seeing any other good options on the table for reducing existential risk.
Intellectual interestingness of the problems.
Also, I find Penrose more impressive than all of the involved people combined. (This is not intended as a slight – rather, the situation is that Penrose’s accomplishments are amazing.)
of the plausible idea that work in reflectivity of formal systems is a good place to be.
The idea isn’t plausible to me, again, for reasons that I give in my responses to paper-machine (among others).
If you do have some domain knowledge that I don’t have that makes stable reflectivity seem less important and puts you in a position to disagree with an expert (EY), please share.
No, my reasons are at a meta-level rather than an object level, just as most members of the Less Wrong community (rightly) believe that Penrose’s views on consciousness are very likely wrong without having read his arguments in detail.
People can get caught in other things. Maybe without something to work on now, they get deep into something else and build their skills in that, and then the switching costs are too high to justify it. Mind you, there is a steady stream of smart people, but there are opportunity costs.
This is possible, but I don’t think that it’s a major concern.
(I for one (as a donor and wannabe contributor) appreciate that MIRI is getting these (important-looking) problems available to the rest of us now)
Note this as a potential source of status quo bias.
How will they tell? What if it happens too fast?
Place yourself in the shoes of the creators of the early search engines, online bookstores, and social networking websites. If you were in their positions, would you feel justified in concluding “if we don’t do it then no one else will”? If not, why do you think that AI safety will be any different?
I agree that it’s conceivable that it could happen too fast, but I believe that there’s strong evidence that it won’t happen within the next 20 years, and 20 years is a long time for people to become interested in AI safety.
The question to my mind is why is mathematical logic the right domain? Why not game theory, or solid state physics, or neural networks? I don’t see any reason to privilege mathematical logic – a priori it seems like a non sequitur to me. The only reason that I give some weight to the possibility that it’s relevant is that other people believe that it is.
AI’s do Reasoning. If you can’t see the relevance of logic to reasoning, I can’t help.
Further, do you have some other domain of inquiry that has higher expected return? I’ve seen a lot of stated meta-level skepticism, but no strong arguments either on the meta level (why should MIRI be as uncertain as you) or the object level (are there arguments against studying logic, or arguments for doing something else).
Now I imagine it seems to you that MIRI is privileging the mathematical logic hypothesis, but as above, it looks to me rather obviously relevant such that it would take some evidence against it to put me in your epistemic position.
(Though strictly speaking given strong enough evidence against MIRI’s strategy I would go more towards “I don’t know what’s going on here, everything is confusing” rather than your (I assume) “There’s no good reason one way or the other”)
You seem to be taking a position of normative ignorance (I don’t know and neither can you), in what looks like the face of plenty of information. I would expect rational updating exposed to such information to yield a strong position one way or the other or epistemic panic, not calm (normative!) ignorance.
Note that to take a position of normative uncertainty you have to believe that not only have you seen no evidence, there is no evidence. I’m seeing normative uncertainty and no strong reason to occupy a position of normative uncertainty, so I’m confused.
AI’s do Reasoning. If you can’t see the relevance of logic to reasoning, I can’t help.
Humans do reasoning without mathematical logic. I don’t know why anyone would think that you need mathematical logic to do reasoning.
Further, do you have some other domain of inquiry that has higher expected return? I’ve seen a lot of stated meta-level skepticism, but no strong arguments either on the meta level (why should MIRI be as uncertain as you) or the object level (are there arguments against studying logic, or arguments for doing something else).
See each part of my comment here as well as my response to Kawoomba here.
You seem to be taking a position of normative ignorance (I don’t know and neither can you), in what looks like the face of plenty of information. I would expect rational updating exposed to such information to yield a strong position one way or the other or epistemic panic, not calm (normative!) ignorance.
I want to hedge because I find some of the people involved in MIRI’s Friendly AI research to be impressive, but putting that aside, I think that the likelihood of the research being useful for AI safety is vanishingly small, at the level of the probability of a random conjunctive statement of similar length being true.
Humans do reasoning without mathematical logic. I don’t know why anyone would think that you need mathematical logic to do reasoning.
Right. Humans do reasoning, but don’t really understand reasoning. Since ancient times, when people try to understand something they try to formalize it, hence the study of logic.
If we want to build something that can reason we have to understand reasoning or we basically won’t know what we are getting. We can’t just say “humans reason based on some ad-hoc kludgy nonformal system” and then magically extract an AI design from that. We need to build something we can understand or it won’t work, and right now, understanding reasoning in the abstract means logic and its extensions.
It’s a double need, though, because not only do we need to understand reasoning, self-improvement means the created thing needs to understand reasoning. Right now we don’t have a formal theory of reasoning that can handle understanding its own reasoning without losing power. So that’s what we need to solve.
If we want to build something that can reason we have to understand reasoning or we basically won’t know what we are getting. We can’t just say “humans reason based on some ad-hoc kludgy nonformal system” and then magically extract an AI design from that. We need to build something we can understand or it won’t work, and right now, understanding reasoning in the abstract means logic and its extensions.
Note that this is different from what you were saying before, and that commenting along the lines of “AI’s do Reasoning. If you can’t see the relevance of logic to reasoning, I can’t help” without further explanation doesn’t adhere to the principle of charity.
I’m very familiar with the argument that you’re making, and have discussed it with dozens of people. The reason why I didn’t respond to the argument before you made it is because I wanted to isolate our core point(s) of disagreement, rather than making presumptions. The same holds for my points below.
If we want to build something that can reason we have to understand reasoning or we basically won’t know what we are getting.
This argument has the form “If we want to build something that does X, we have to understand X, or we won’t know what we’re getting.” But this isn’t true in full generality. For example, we can build a window shade without knowing how the window shade blocks light, and still know that we’ll be getting something that blocks light. Why do you think that AI will be different?
We can’t just say “humans reason based on some ad-hoc kludgy nonformal system” and then magically extract an AI design from that.
Why do you think that it’s at all viable to create an AI based on a formal system? (For the moment putting aside safety considerations.)
As to the rest of your comment — returning to my “Chinese economy” remarks — the Chinese economy is a recursively self-improving system with the “goal” of maximizing GDP. It could be that there’s goal drift, and that the Chinese economy starts optimizing for something random. But I think that the Chinese economy does a pretty good job of keeping this “goal” intact, and that it’s been doing a better and better job over time. Why do you think that it’s harder to ensure that an AI keeps its goal intact than it is to ensure that the Chinese economy keeps its “goal” intact?
AI have to come to conclusions about the state of the world, where “world” also includes their own being. Model theory is the field that deals with such things formally.
Why not game theory, or solid state physics, or neural networks?
These could be relevant, but it seems to me that “mind of an AI” is an emergent phenomenon of the underlying solid state physics, where “emergent” here means “technically explained by, but intractable to study as such.” Game theory and model theory are intrinsically linked, and I have no comment on neural networks.
AI have to come to conclusions about the state of the world, where “world” also includes their own being. Model theory is the field that deals with such things formally.
But the most intelligent beings that we know of are humans, and they don’t use mathematical logic.
But the most intelligent beings that we know of are humans, and they don’t use mathematical logic.
Did humans have another choice in inventing the integers? (No. Under full second-order semantics, the Peano axioms have only one model, up to isomorphism.) In general, the ontology a mind creates is still under the aegis of mathematical logic, even if that mind didn’t use mathematical logic to invent it.
Sure, but that’s only one perspective. You can say that it’s under the aegis of particle physics, or chemistry, or neurobiology, or evolutionary psychology, or other things that I’m not thinking of. Why single out mathematical logic?
You can say that it’s under the aegis of particle physics, or chemistry, or neurobiology,
Going back to humans, getting an explanation of minds out of any of these areas requires computational resources that don’t currently exist. (In the case of particle physics, one might rather say “cannot exist.”)
Why single out mathematical logic?
Because we can prove theorems that will apply to whatever ontology AIs end up dreaming up. Unreasonable effectiveness of mathematics, and all that. But now I’m just repeating myself.
Because we can prove theorems that will apply to whatever ontology AIs end up dreaming up. Unreasonable effectiveness of mathematics, and all that. But now I’m just repeating myself.
I’m puzzled by your remark. It sounds like a fully general argument. One could equally well say that one should use mathematical logic to build a successful marriage, or fly an airplane, or create a political speech. Would you say this? If not, why do you think that studying mathematical logic is the best way to approach AI safety in particular?
I’m puzzled by your remark. It sounds like a fully general argument.
No, a fully general argument is something like “well, that’s just one perspective.” Mathematical logic will not tell you anything about marriage, other than the fact that it is a relation of variable arity (being kind to the polyamorists for the moment).
One could equally well say that one should use mathematical logic to build a successful marriage, or fly an airplane, or create a political speech. Would you say this?
I have no idea why a reasonable person would say any of these things.
If not, why do you think that studying mathematical logic is the best way to approach AI safety in particular?
I’d call it the best currently believed way with a chance of developing something actionable without probably requiring more computational power than a matryoshka brain. That’s because it’s the formal study of models and theories in general. Unless you’re willing to argue that AIs will have neither cognitive feature? That’s kind of rhetorical, though—I’m growing tired.
Given that the current Lob paper is non-constructive (invoking the axiom of choice) and hence is about as uncomputable as possible, I don’t understand why you think mathematical logic will help with computational concerns.
The paper on probabilistic reflection in logic is non-constructive, but that’s only sec. 4.3 of the Lob paper. Nothing non-constructive about T-n or TK.
I believe one of the goals of this particular avenue of research is to make this result constructive. Also, he was talking about the study of mathematical logic in general, not just this paper.
That was rather rude. I certainly don’t claim that proofs involving choice are useless, merely that they don’t address the particular criterion of computational feasibility.
That sounds like a very long conversation if we’re supposed to be giving quantitative estimates on everything. The qualitative version is just that this sort of thing can take a long time, may not parallelize easily, and can potentially be partially factored out to academia, and so it is wise to start work on it as soon as you’ve got enough revenue to support even a small team, so long as you can continue to scale your funding while that’s happening.
This reply takes for granted that all astronomical benefits bottleneck through a self-improving AI at some point.
My understanding based on what you say is that the research in your paper is intended to spearhead a field of research, rather than to create something that will be directly used for friendliness in the first AI. Is this right?
If so, our differences are about the sociology of the scientific, technological and political infrastructure rather than about object level considerations having to do with AI.
Sounds about right. You might mean a different thing from “spearhead a field of research” than I do, my phrasing would’ve been “Start working on the goddamned problem.”
From your other comments I suspect that you have a rather different visualization of object-level considerations to do with AI and this is relevant to your disagreement.
Ok. I think that MIRI could communicate more clearly by highlighting this. My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Lob problem in order to build a Friendly AI. Is there anything in the public domain that would have suggested otherwise to me? If not, I’d suggest writing this up and highlighting it.
AFAIK, the position is still “need to ‘solve’ Lob to get FAI”, where ‘solve’ means find a way to build something that doesn’t have that problem, given that all the obvious formalisms do have such problems. Did EY suggest otherwise?
By default, if you can build a Friendly AI you can solve the Lob problem. That working on the Lob Problem gets you closer to being able to build FAI is neither obvious nor certain, but everything has to start somewhere...
EDIT: Moved the rest of this reply to a new top-level comment because it seemed important and I didn’t want it buried.
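For readers unfamiliar with the "Lob problem" referred to here: it is named after Löb's theorem. A standard statement is sketched below (this is the textbook form, not a quotation from the paper), for a theory T extending Peano Arithmetic, writing \Box_T \varphi for "\varphi is provable in T":

```latex
% Löb's theorem
\text{If } T \vdash \Box_T \varphi \rightarrow \varphi, \text{ then } T \vdash \varphi.

% Special case \varphi = \bot: a consistent T cannot prove its own soundness for \bot
T \text{ consistent} \implies T \nvdash \Box_T \bot \rightarrow \bot
```

So a consistent T cannot prove the reflection schema \Box_T \varphi \rightarrow \varphi for every sentence \varphi. Roughly, this is the obstacle under discussion: an agent that acts only on what it can prove in T cannot straightforwardly conclude that a successor's T-proofs are trustworthy.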
cost-effective relative to other options on the table
BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI’s 2013 strategy (which focuses heavily on FAI research). So it’s not as though I think FAI research is obviously the superior path, and it’s also not as though we haven’t thought through all these different options, and gotten feedback from dozens of people about those options, and so on.
Also note that MIRI did, in fact, decide to focus on (1) spreading rationality, and (2) building a community of people who care about rationality, the far future, and x-risk, before turning its head to FAI research: see (in chronological order) the Singularity Summit, Less Wrong and CFAR.
But the question of which interventions are most cost effective (given astronomical waste) is a huge and difficult topic, one that will require thousands of hours to examine properly. Building on Beckstead and Bostrom, I’ve tried to begin that examination here. Before jumping over to that topic, I wonder: do you now largely accept the case Eliezer made for this latest paper as an important first step on an important sub-problem of the Friendly AI problem? And if not, why not?
BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI’s 2013 strategy (which focuses heavily on FAI research). So it’s not as though I think FAI research is obviously the superior path, and it’s also not as though we haven’t thought through all these different options, and gotten feedback from dozens of people about those options, and so on.
My comments were addressed at Eliezer’s paper specifically, rather than MIRI’s general strategy, or your own views.
Also note that MIRI did, in fact, decide to focus on (1) spreading rationality, and (2) building a community of people who care about rationality, the far future, and x-risk, before turning its head to FAI research: see (in chronological order) the Singularity Summit, Less Wrong and CFAR.
Sure – what I’m thinking about is cost-effectiveness at the margin.
Before jumping over to that topic, I wonder: do you now largely accept the case Eliezer made for this latest paper as an important first step on an important sub-problem of the Friendly AI problem? And if not, why not?
Based on Eliezer’s recent comments, my impression is that Eliezer is not making such a case, and is rather making a case for the paper being of sociological/motivational value. Is your understanding different?
Based on Eliezer’s recent comments, my impression is that Eliezer is not making such a case, and is rather making a case for the paper being of sociological/motivational value.
No, that’s not what I’ve been saying at all.
I’m sorry if this seems rude in some sense, but I need to inquire after your domain knowledge at this point. What is your level of mathematical literacy and do you have any previous acquaintance with AI problems? It may be that, if we’re to proceed on this disagreement, MIRI should try to get an eminent authority in the field to briefly confirm basic, widespread, and correct ideas about the relevance of doing math to AI, rather than us trying to convince you of that via object-level arguments that might not be making any sense to you.
By ‘the relevance of math to AI’ I don’t mean mathematical logic, I mean the relevance of trying to reduce an intuitive concept to a crisp form. In this case, like it says in the paper and like it says in the LW post, FOL is being used not because it’s an appropriate representational fit to the environment… though as I write this, I realize that may sound like random jargon on your end… but because FOL has a lot of standard machinery for self-reflection of which we could then take advantage, like the notion of Godel numbering or ZF proving that every model entails every tautology… which probably doesn’t mean anything to you either. But then I’m not sure how to proceed; if something can’t be settled by object-level arguments then we probably have to find an authority trusted by you, who knows about the (straightforward, common) idea of ‘crispness is relevant to AI’ and can quickly skim the paper and confirm ‘this work crispifies something about self-modification that wasn’t as crisp before’ and testify that to you. This sounds like a fair bit of work, but I expect we’ll be trying to get some large names to skim the paper anyway, albeit possibly not the Early Draft for that.
I need to inquire after your domain knowledge at this point. What is your level of mathematical literacy and do you have any previous acquaintance with AI problems?
Quick Googling suggests someone named “Jonah Sinick” is a mathematician in number theory. It appears to be the same person.
I really wish Jonah had mentioned that some number of comments ago; there are a lot of arguments I don’t even try to use unless I know I’m talking to one of the mathematical literati.
What is your level of mathematical literacy and do you have any previous acquaintance with AI problems?
I have a PhD in pure math, I know the basic theory of computation and of computational complexity, but I don’t have deep knowledge of these domains, and I have no acquaintance with AI problems.
It may be that, if we’re to proceed on this disagreement, MIRI should try to get an eminent authority in the field to briefly confirm basic, widespread, and correct ideas about the relevance of doing math to AI, rather than us trying to convince you of that via object-level arguments that might not be making any sense to you.
Yes, this could be what’s most efficient. But my sense is that our disagreement is at a non-technical level rather than at a technical level.
My interpretation of
The paper is meant to be interpreted within an agenda of “Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty”; not as “We think this Godelian difficulty will block AI”, nor “This formalism would be good for an actual AI”, nor “A bounded probabilistic self-modifying agent would be like this, only scaled up and with some probabilistic and bounded parts tacked on”.
was that you were asserting only very weak confidence in the relevance of the paper to AI safety, and that you were saying “Our purpose in writing this was to do something that could conceivably have something to do with AI safety, so that people take notice and start doing more work on AI safety.” Thinking it over, I realize that you might have meant “We believe that this paper is an important first step on a technical level.” Can you clarify here?
If the latter interpretation is right, I’d return to my question about why the operationalization is a good one, which I feel that you still haven’t addressed, and which I see as crucial.
Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty
is cost-effective relative to other options on the table?
...
BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI’s 2013 strategy (which focuses heavily on FAI research). So it’s not as though I think FAI research is obviously the superior path, and it’s also not as though we haven’t thought through all these different options, and gotten feedback from dozens of people about those options, and so on.
My comments were addressed at Eliezer’s paper specifically, rather than MIRI’s general strategy, or your own views.
Do you not see that what Luke wrote was a direct response to your question?
There are really two parts to the justification for working on this paper: 1) Direct FAI research is a good thing to do now. 2) This is a good problem to work on within FAI research. Luke’s comment gives context explaining why MIRI is focusing on direct FAI research, in support of 1. And it’s clear from what you list as other options that you weren’t asking about 2.
It sounds like what you want is for this problem to be compared on its own to every other possible intervention. In theory that would be the rational thing to do to ensure you were always doing the most cost-effective work on the margin. But that only makes sense if it’s computationally practical to do that evaluation at every step.
What MIRI has chosen to do instead is to invest some time up front coming up with a strategic plan, and then follow through on that. This seems entirely reasonable to me.
If the probability is too small, then it isn’t worth it. The activities that I mention plausibly reduce astronomical waste to a nontrivial degree. Arguing that you can do better than them requires an argument that establishes the expected impact of MIRI’s Friendly AI research on AI safety above a nontrivial threshold.
Do you not see that what Luke wrote was a direct response to your question?
Which question?
Luke’s comment gives context explaining why MIRI is focusing on direct FAI research, in support of 1.
Sure, I acknowledge this.
It sounds like what you want is for this problem to be compared on its own to every other possible intervention. In theory that would be the rational thing to do to ensure you were always doing the most cost-effective work on the margin. But that only makes sense if it’s computationally practical to do that evaluation at every step.
I don’t think that it’s computationally intractable to come up with better alternatives. Indeed, I think that there are a number of concrete alternatives that are better.
What MIRI has chosen to do instead is to invest some time up front coming up with a strategic plan, and then follow through on that. This seems entirely reasonable to me.
I wasn’t disputing this. I was questioning the relevance of MIRI’s current research to AI safety, not saying that MIRI’s decision process is unreasonable.
The one I quoted: “Why do you think that … is cost-effective relative to other options on the table?”
Yes, you have a valid question about whether this Lob problem is relevant to AI safety.
What I found frustrating as a reader was that you asked why Eliezer was focusing on this problem as opposed to other options such as spreading rationality, building human capital, etc. Then when Luke responded with an explanation that MIRI had chosen to focus on FAI research, rather than those other types of work, you say, no I’m not asking about MIRI’s strategy or Luke’s views, I’m asking about this paper. But the reason Eliezer is working on this paper is because of MIRI’s strategy!
So that just struck me as sort of rude and/or missing the point of what Luke was trying to tell you. My apologies if I’ve been unnecessarily uncharitable in interpreting your comments.
What I found frustrating as a reader was that you asked why Eliezer was focusing on this problem as opposed to other options such as spreading rationality, building human capital, etc. Then when Luke responded with an explanation that MIRI had chosen to focus on FAI research, rather than those other types of work, you say, no I’m not asking about MIRI’s strategy or Luke’s views, I’m asking about this paper. But the reason Eliezer is working on this paper is because of MIRI’s strategy!
I read Luke’s comment differently, based on the preliminary “BTW.” My interpretation was that his purpose in making the comment was to give a tangentially related contextual remark rather than to answer my question. (I wasn’t at all bothered by this – I’m just explaining why I didn’t respond to it as if it were intended to address my question.)
The way I’m using these words, my “this latest paper as an important first step on an important sub-problem of the Friendly AI problem” is equivalent to Eliezer’s “begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty.”
Ok. I disagree that the paper is an important first step.
Because Eliezer is making an appeal based on psychological and sociological considerations, spelling out my reasoning requires discussion of what sorts of efforts are likely to impact the scientific community, and whether one can expect such research to occur by default. Discussing these requires discussion of psychology, sociology and economics, partly as related to whether the world’s elites will navigate the creation of AI just fine.
I look forward to it! Our models of how the scientific community works may be substantially different. To consider just one particularly relevant example, consider what the field of machine ethics looks like without the Yudkowskian line.
I agree that Eliezer has substantially altered the field of machine ethics. My view here is very much contingent on the belief that elites will navigate the creation of AI just fine, which, if true, is highly nonobvious.
Other options on the table are not mutually exclusive. There is a lot of wealth and intellectual brain power in the world, and a lot of things to work on. We can’t and shouldn’t all work on one most important problem. We can’t all work on the thousand most important problems. We can’t even agree on what those problems are.
I suspect Eliezer has a comparative advantage in working on this type of AI research, and he’s interested in it, so it makes sense for him to work on this. It especially makes sense to the extent that this is an area no one else is addressing. We’re only talking about an expenditure of several careers and a few million dollars. Compared to the world economy, or even compared to the non-profit sector, this is a drop in the bucket.
Now if instead Eliezer was the 10,000th smart person working on string theory, or if there was an Apollo-style government-funded initiative to develop an FAI by 2019, then my estimate of the comparative advantage of MIRI would shift. But given the facts as they are, MIRI seems like a plausible use of the limited resources it consumes.
I suspect Eliezer has a comparative advantage in working on this type of AI research, and he’s interested in it, so it makes sense for him to work on this.
If Eliezer feels that this is his comparative advantage then it’s fine for him to work on this sort of research — I’m not advocating that such research be stopped. My own impression is that Eliezer has comparative advantage in spreading rationality and that he could have a bigger impact by focusing on doing so.
It especially makes sense to the extent that this is an area no one else is addressing. We’re only talking about an expenditure of several careers and a few million dollars. Compared to the world economy, or even compared to the non-profit sector, this is a drop in the bucket.
I’m not arguing that such research shouldn’t be funded. The human capital question is genuinely more dicey, insofar as I think that Eliezer has contributed substantial value through his work on spreading rationality, and my best guess is that the opportunity cost of not doing more is large.
Why do you think that an AI would need to make billions of sequential self-modifications when humans don’t need to?
For starters, humans aren’t able to make changes as easily as an AI can. We don’t have direct access to our source code that we can change effortlessly; any change we make costs time, money, or both.
That doesn’t address the question. It says that an AI could more easily make self-modifications. It doesn’t suggest that an AI needs to make such self-modifications. Human intelligence is an existence proof that human-level intelligence does not require “billions of sequential self-modifications”. Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.
So I reiterate, “Why do you think that an AI would need to make billions of sequential self-modifications when humans don’t need to?”
Human intelligence is an existence proof that human-level intelligence does not require “billions of sequential self-modifications”. Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.
Why do you think that an AI would need to make billions of sequential self-modifications when humans don’t need to?
Human intelligence required billions of sequential modifications (though not self-modifications). An AI in general would not need self-modifications, but for an AGI it seems that it would be necessary. I don’t doubt a formal argument for the latter statement has been written by someone smarter than me before, but a very informal argument would be something like this:
If an AGI doesn’t need to self-modify, then that AGI is already perfect (or close enough that it couldn’t possibly matter). Since practically no software humans ever built was ever perfect in all respects, that seems exceedingly unlikely. Therefore, the first AGI would (very likely) need to be modified. Of course, at the beginning it might be modified by humans (thus, not self-modified), but the point of building AGI is to make it smarter than us. Thus, once it is smarter than us by a certain amount, it wouldn’t make sense for us (stupider intellects) to improve it (smarter intellect). Thus, it would need to self-modify, and do it a lot, unless by some ridiculously fortuitous accident of math (a) human intelligence is very close to the ideal, or (b) human intelligence will build something very close to the ideal on the first try.
It would be nice if those modifications would be things that are good for us, even if we can’t understand them.
“...need to make billions of sequential self-modifications when humans don’t need to” to do what? Exist, maximize utility, complete an assignment, fulfill a desire...? Some of those might be better termed “wants” than “needs,” but that info is just as important in predicting behavior.
Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that’s more effort than you’d want to put into this exact point.
FWIW, Jonah has a PhD in math and has probably read Pearl or a similar graphical models book.
(Not directly relevant to the conversation, but just trying to lower your probability estimate that Jonah’s objections are naive.)
I don’t see it as a decisive point, just one of “many weak arguments,” but I think the analogy with human self-modification is relevant. I would like to see more detailed discussion of the issue.
Aspects of this that seem relevant to me:
Genetic and cultural modifications to human thinking patterns have been extremely numerous. If you take humanity as a whole as an entity doing self-modification on itself, there have been an extremely large number of successful self-modifications.
Genetic and cultural evolution have built humans individually capable of self-modification without stumbling over Lobian obstacles. Evolution and culture likely used relatively simple and easy search processes to do this, rather than ones that rely on very sophisticated mathematical insights. Analogously, one might expect that people will develop AGI in a way that overcomes these problems as well.
Self-modification is to be interpreted to include ‘directly editing one’s own low-level algorithms using high-level deliberative process’ but not include ‘changing one’s diet to change one’s thought processes’. If you are uncomfortable using the word ‘self-modification’ for this please substitute a new word ‘fzoom’ which means only that and consider everything I said about self-modification to be about fzoom.
Humans wouldn’t look at their own source code and say, “Oh dear, a Lobian obstacle”, on this I agree, but this is because humans would look at their own source code and say “What?”. Humans have no idea under what exact circumstances they will believe something, which comes with its own set of problems. The Lobian obstacle shows up when you approach things from the end we can handle, namely weak but well-defined systems which can well-define what they will believe, whereas human mathematicians are stronger than ZF plus large cardinals but we don’t know how they work or what might go wrong or what might change if we started editing neural circuit #12,730,889,136.
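For readers who have not met the term, the “Lobian obstacle” rests on Löb’s theorem. A standard statement is given below, assuming T is a recursively axiomatized theory extending Peano Arithmetic and reading \Box_T P as “P is provable in T”; the final comment lines are an informal gloss on how this blocks an agent from trusting a successor that reasons in the same system, not a reproduction of the paper’s own formalism.

```latex
% Löb's theorem (T recursively axiomatized, T extends PA):
\text{If } T \vdash \Box_T \ulcorner P \urcorner \rightarrow P, \text{ then } T \vdash P.

% Corollary (take P = \bot): a consistent T cannot prove the full
% reflection schema \Box_T \ulcorner P \urcorner \rightarrow P for every sentence P.

% Informal tiling reading: a parent reasoning in T would like
%   T \vdash \Box_T \ulcorner \mathrm{Safe}(a) \urcorner \rightarrow \mathrm{Safe}(a)
% so that a successor's T-proof of Safe(a) suffices to license action a;
% by Löb's theorem, T proves that only when it already proves Safe(a) directly.
```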
As Christiano’s work shows, allowing for tiny finite variances of probability might well dissipate the Lobian obstacle, but that’s the sort of thing you find out by knowing what a Lobian obstacle is.
Self-modification is to be interpreted to include ‘directly editing one’s own low-level algorithms using high-level deliberative process’ but not include ‘changing one’s diet to change one’s thought processes’. If you are uncomfortable using the word ‘self-modification’ for this please substitute a new word ‘fzoom’ which means only that and consider everything I said about self-modification to be about fzoom.
Very helpful. This seems like something that could lead to a satisfying answer to my question. And don’t worry, I won’t engage in a terminological dispute about “self-modification.”
Can you clarify a bit what you mean by “low-level algorithms”? I’ll give you a couple of examples related to what I’m wondering about.
Suppose I am working with a computer to make predictions about the weather, and we consider the operations of the computer along with my brain as a single entity for the purposes of testing whether the Lobian obstacles you are thinking of arise in practice. Now suppose I make basic modifications to the computer, expecting that the joint operation of my brain with the computer will yield improved output. This will not cause me to trip over Lobian obstacles. Why does whatever concern you have about the Lob problem predict that it would not, but also predict that future AIs might stumble over the Lob problem?
Another example. Humans learn different mental habits without stumbling over Lobian obstacles, and they can convince themselves that adopting the new mental habits is an improvement. Some of these are more derivative (“Don’t do X when I have emotion Y”) and others are perhaps more basic (“Try to update through explicit reasoning via Bayes’ Rule in circumstances C”). Why does whatever concern you have about the Lob problem predict that humans can make these modifications without stumbling, but also predict that future AIs might stumble over the Lob problem?
If the answer to both examples is “those are not cases of directly editing one’s low-level algorithms using high-level deliberative processes,” can you explain why your concern about Lobian issues only arises in that type of case? This is not me questioning your definition of “fzoom,” it is my asking why Lobian issues only arise when you are worrying about fzoom.
The first example is related to what I had in mind when I talked about fundamental epistemic standards in a previous comment:
Part of where I’m coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another, not for proving that other types of software and hardware alterations (such as building better arms, building faster computers, finding more efficient ways to compress your data, finding more efficient search algorithms, or even finding better mid-level statistical techniques) would result in more expected utility. But I would guess that once you have an agent operating with minimally decent fundamental epistemic standards, you just can’t prove that altering the agent’s fundamental epistemic standards would result in an improvement. My intuition is that you can only do that when you have an inconsistent agent, and in that situation it’s unclear to me how Lobian issues apply.
Genetic and cultural evolution have built humans individually capable of self-modification without stumbling over Lobian obstacles.
Well, part of this is because modern humans are monstrous in the eyes of many pre-modern humans. To them, the future has been lost because they weren’t using a self-modification procedure that provably preserved their values.
The LW post may address some of your concerns. The idea here is that we need a tiling decision criterion, and the paper isn’t supposed to be an AI design, it’s supposed to get us a little conceptually closer to a tiling decision criterion. If you don’t understand why a tiling decision criterion is a good thing in a self-improving AI which is supposed to have a stable goal system, then I’m not quite sure what issue needs addressing.
Thanks for your courtesy, and again, sorry for not being more specific in my original comment.
Yes, I’m questioning why a self-improving AI which is intended to have a stable goal system needs a tiling decision criterion. In your publication, you wrote
I don’t see why the model of the sequence of agents is a good operationalization. My intuition is that
A self-modifying AI would modify itself by modifying its modules one by one.
It would reconstruct a given module whole-cloth, rather than doing so by incrementally changing the module in small steps.
To elaborate, and for concreteness, I’ll comment on
I haven’t read the technical portions of the paper, but my surface impression is that the operationalization in the paper is analogous to modifying your arms by successively shaving slivers of tissue off of them, and grafting slivers of tissue onto them, with a view toward making them really long. Another way to go would be to grow the long arms in a lab, chop off your current arms, and then graft the newly created long arms onto yourself. In the context of self-modifying AIs, the latter possibility seems to me to be significantly more likely than the former possibility.
Is my surface impression of the operationalization right? If so, what do you think about the points that I raise in the previous paragraphs?
Jonah, some self-modifications will potentially be large, but others might be smaller. More importantly we don’t want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals. Most of the core idea in this paper is to prevent those kinds of drastic or deleterious changes from being forced by a self-modification.
But it’s also possible that there’ll be many gains from small self-modifications, and it would be nicer not to need a special case for those, and for this it is good to have (in theoretical principle) a highly regular bit of cognition/verification that needs to be done for the change (e.g. for logical agents the proof of a certain theorem) so that small local changes only call for small bits of the verification to be reconsidered.
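To make the point that “small local changes only call for small bits of the verification to be reconsidered” a little more concrete, here is a toy Python sketch of the bookkeeping only. It is not the paper’s formalism: the per-module `check` function, the module names, and the hash-based cache are all hypothetical, and the sketch assumes each module’s check depends on that module alone, which a real goal-stability proof need not satisfy.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fingerprint(source: str) -> str:
    """Hash a module's source so unchanged modules can be recognized."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class IncrementalVerifier:
    """Toy model: re-run a per-module check only when that module changed.

    `check` is a hypothetical stand-in for whatever proof obligation a
    module must discharge; it is assumed to depend on that module alone.
    """

    def __init__(self, check: Callable[[str, str], bool]) -> None:
        self.check = check
        self.cache: Dict[str, Tuple[str, bool]] = {}  # name -> (fingerprint, result)
        self.checks_run = 0  # how many times `check` has actually been called

    def verify(self, modules: Dict[str, str]) -> bool:
        all_ok = True
        for name, source in modules.items():
            fp = fingerprint(source)
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fp:
                result = cached[1]  # unchanged module: reuse the earlier verdict
            else:
                result = self.check(name, source)  # new or edited module: re-check
                self.checks_run += 1
                self.cache[name] = (fp, result)
            all_ok = all_ok and result
        return all_ok


def toy_check(name: str, source: str) -> bool:
    # Hypothetical rule: a module passes unless it touches the goal system.
    return "alter_goals" not in source


if __name__ == "__main__":
    verifier = IncrementalVerifier(toy_check)
    modules = {"planner": "search()", "perception": "parse()", "goals": "keep_goals()"}
    print(verifier.verify(modules), verifier.checks_run)  # True 3
    modules["planner"] = "faster_search()"  # a small, local self-modification
    print(verifier.verify(modules), verifier.checks_run)  # True 4
```

The only design choice illustrated is that, when verification decomposes regularly, editing one module triggers one re-check rather than a full re-verification; an edit touching the goal module is still caught.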
Another way of looking at it is that we’re trying to have the AI be as free as possible to self-modify while still knowing that it’s sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.
Thanks for engaging.
I’m very sympathetic to this in principle, but don’t see why there would be danger of these things in practice.
Humans constantly perform small self-modifications, and this doesn’t cause serious problems. People’s goals do change, but not drastically, and people who are determined can generally keep their goals pretty close to their original goals. Why do you think that AI would be different?
To ensure that one gets a Friendly AI, it suffices to start with good goal system, and to ensure that the goal system remains pretty stable over time. It’s not necessary that the AI be as free as possible.
You might argue that a limited AI wouldn’t be able to realize as good a future as one without limitations.
But if this is the concern, why not work to build a limited AI that can itself solve the problems about having a stable goal system under small modifications? Or, if it’s not possible to get a superhuman AI subject to such limitations, why not build a subhuman AI and then work in conjunction with it to build Friendly AI that’s as free as possible?
Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them, and we can make a start on exposing some of these gotchas by figuring out how to do things using unbounded computing power (albeit this is not a reliable way of exposing all gotchas, especially in the hands of somebody who prefers to hide difficulties, or even someone who makes a mistake about how a mathematical object behaves, but it sure beats leaving everything up to verbal arguments).
Human beings don’t make billions of sequential self-modifications, so they’re not existence proofs that human-quality reasoning is good enough for that.
I’m not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations. If this is a widespread reaction beyond yourself then it might not be too hard to get a quote from Peter Norvig or a similar mainstream authority that, “No, actually, you can’t take that sort of thing for granted, and while what MIRI’s doing is incredibly preliminary, just leaving this in a state of verbal argument is probably not a good idea.”
Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that’s more effort than you’d want to put into this exact point.
I don’t disagree (though I think that I’m less confident on this point than you are).
Why do you think that an AI would need to make billions of sequential self-modifications when humans don’t need to?
I agree that it can’t be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?
The paper is meant to be interpreted within an agenda of “Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty”; not as “We think this Godelian difficulty will block AI”, nor “This formalism would be good for an actual AI”, nor “A bounded probabilistic self-modifying agent would be like this, only scaled up and with some probabilistic and bounded parts tacked on”. If that’s not what you meant, please clarify.
Ok, that is what I meant, so your comment has helped me better understand your position.
Why do you think that
is cost-effective relative to other options on the table?
For “other options on the table,” I have in mind things such as spreading rationality, building the human capital of people who care about global welfare, increasing the uptake of important information into the scientific community, and building transferable skills and connections for later use.
Personally, I feel like that kind of metawork is very important, but that somebody should also be doing something that isn’t just metawork. If there’s nobody making concrete progress on the actual problem that we’re supposed to be solving, there’s a major risk of the whole thing becoming a lost purpose, as well as of potentially-interested people wandering off to somewhere where they can actually do something that feels more real.
From inside MIRI, I’ve been able to feel this one viscerally as genius-level people come to me and say “Wow, this has really opened my eyes. Where do I get started?” and (until now) I’ve had to reply “Sorry, we haven’t written down our technical research agenda anywhere” and so they go back to machine learning or finance or whatever because no, they aren’t going to learn 10 different fields and become hyper-interdisciplinary philosophers working on important but slippery meta stuff like Bostrom and Shulman.
Yes, that’s a large de-facto part of my reasoning.
I think that in addition to this being true, it is also how it looks from the outside—at least, it’s looked that way to me, and I imagine many others who have been concerned about SI focusing on rationality and fanfiction are coming from a similar perspective. It may be the case that without the object-level benefits, the boost to MIRI’s credibility from being seen to work on the actual technical problem wouldn’t justify the expense of doing so, but whether or not it would be enough to justify the investment by itself, I think it’s a really significant consideration.
[ETA: Of course, in the counterfactual where working on the object problem actually isn’t that important, you could try to explain this to people and maybe that would work. But since I think that it is actually important, I don’t particularly expect that option to be available.]
Yes. I’ve had plenty of conversations with people who were unimpressed with MIRI, in part because the organization looked like it was doing nothing but idle philosophy. (Of course, whether that was the true rejection of the skeptics in question is another matter.)
I understand your position, but believe that your concerns are unwarranted, though I don’t think that this is obvious.
If I gave you a list of people who in fact expressed interest but then, when there were no technical problems for them to work on, “wandered off to somewhere where they can actually do something that feels more real,” would you change your mind? (I may not be able to produce such a list, because I wasn’t writing down people’s names as they wandered away, but I might be able to reconstruct it.)
Sounds like me two years ago, before I committed to finishing my doctorate. Oops.
Well, I’m not sure the “oops” is justified, given that two years ago, I really couldn’t help you contribute to a MIRI technical research program, since it did not exist.
No, the oops is on me for not realizing how shallow “working on something that feels more real” would feel after the novelty of being able to explain what I work on to laypeople wore off.
Ah, I see.
I don’t doubt you: I have different reasons for believing Kaj’s concerns to be unwarranted:
It’s not clear to me that offering people problems in mathematical logic is a good way to get people to work on Friendly AI problems. I think that the mathematical logic work is pretty far removed from the sort of work that will be needed for friendliness.
I believe that people who are interested in AI safety will not forget about AI safety entirely, independently of whether they have good problems to work on now.
I believe that people outside of MIRI will organically begin to work on AI safety without MIRI’s advocacy when AI is temporally closer.
Mathematical logic problems are FAI problems. How are we going to build something self-improving that can reason correctly without having a theory of what “reasoning correctly” (i.e., logic) even looks like?
Based on what?
I’ll admit I don’t know that I would settle on mathematical logic as an important area of work, but EY being quite smart, working on this for ~10 years, and being able to convince quite a few people who are in a position to judge on this is good confirmation of the plausible idea that work in reflectivity of formal systems is a good place to be.
If you do have some domain knowledge that I don’t have that makes stable reflectivity seem less important and puts you in a position to disagree with an expert (EY), please share.
People can get caught up in other things. Maybe without something to work on now, they get deep into something else and build their skills in that, and then the switching costs are too high to justify it. Mind you, there is a steady stream of smart people, but there are still opportunity costs.
Also, MIRI may be burning reputation capital by postponing actual work, such that there may be fewer interested folks in the future. This could go either way, but it’s a risk that should be accounted for.
(I for one (as a donor and wannabe contributor) appreciate that MIRI is getting these (important-looking) problems available to the rest of us now)
How will they tell? What if it happens too fast? What if the AI designs that are furthest along are incompatible with stable reflection? Hence MIRI working on strategic questions like “how close are we, how much warning can we expect” (Intelligence Explosion Microecon), and “What fundamental architectures are even compatible with friendliness” (this Lob stuff).
See my responses to paper-machine on this thread for (some reasons) why I’m questioning the relevance of mathematical logic.
I don’t see this as any more relevant than Penrose’s views on consciousness, which I recently discussed. Yes, there are multiple people who are convinced, but there may be spurious correlations which are collectively driving their interests. Some that come to mind are:
Subject-level impressiveness of Eliezer.
Working on these problems offering people a sense of community.
Being interested in existential risk reduction and not seeing any other good options on the table for reducing existential risk.
Intellectual interestingness of the problems.
Also, I find Penrose more impressive than all of the involved people combined. (This is not intended as a slight – rather, the situation is that Penrose’s accomplishments are amazing.)
The idea isn’t plausible to me, again, for reasons that I give in my responses to paper-machine (among others).
No, my reasons are at a meta-level rather than an object level, just as most members of the Less Wrong community (rightly) believe that Penrose’s views on consciousness are very likely wrong without having read his arguments in detail.
This is possible, but I don’t think that it’s a major concern.
Note this as a potential source of status quo bias.
Place yourself in the shoes of the creators of the early search engines, online bookstores, and social networking websites. If you were in their positions, would you feel justified in concluding “if we don’t do it then no one else will”? If not, why do you think that AI safety will be any different?
I agree that it’s conceivable that it could happen too fast, but I believe that there’s strong evidence that it won’t happen within the next 20 years, and 20 years is a long time for people to become interested in AI safety.
People keep saying that. I don’t understand why “planning fallacy” is not a sufficient reply. See also my view on why we’re still alive.
I agree that my view is not a priori true and requires further argumentation.
Why?
I say something about this here.
Okay; why specifically isn’t mathematical logic the right domain?
EDIT: Or, to put it another way, there’s nothing in the linked comment about mathematical logic.
The question to my mind is why is mathematical logic the right domain? Why not game theory, or solid state physics, or neural networks? I don’t see any reason to privilege mathematical logic – a priori it seems like a non sequitur to me. The only reason that I give some weight to the possibility that it’s relevant is that other people believe that it is.
AI’s do Reasoning. If you can’t see the relevance of logic to reasoning, I can’t help.
Further, do you have some other domain of inquiry that has higher expected return? I’ve seen a lot of stated meta-level skepticism, but no strong arguments either on the meta level (why should MIRI be as uncertain as you) or the object level (are there arguments against studying logic, or arguments for doing something else).
Now I imagine it seems to you that MIRI is privileging the mathematical logic hypothesis, but as above, it looks to me rather obviously relevant such that it would take some evidence against it to put me in your epistemic position.
(Though strictly speaking given strong enough evidence against MIRI’s strategy I would go more towards “I don’t know what’s going on here, everything is confusing” rather than your (I assume) “There’s no good reason one way or the other”)
You seem to be taking a position of normative ignorance (I don’t know and neither can you), in what looks like the face of plenty of information. I would expect rational updating exposed to such information to yield a strong position one way or the other or epistemic panic, not calm (normative!) ignorance.
Note that to take a position of normative uncertainty you have to believe that not only have you seen no evidence, there is no evidence. I’m seeing normative uncertainty and no strong reason to occupy a position of normative uncertainty, so I’m confused.
Humans do reasoning without mathematical logic. I don’t know why anyone would think that you need mathematical logic to do reasoning.
See each part of my comment here as well as my response to Kawoobma here.
I want to hedge because I find some of the people involved in MIRI’s Friendly AI research to be impressive, but putting that aside, I think that the likelihood of the research being useful for AI safety is vanishingly small, at the level of the probability of a random conjunctive statement of similar length being true.
Right. Humans do reasoning, but don’t really understand reasoning. Since ancient times, when people try to understand something they try to formalize it, hence the study of logic.
If we want to build something that can reason we have to understand reasoning or we basically won’t know what we are getting. We can’t just say “humans reason based on some ad-hoc kludgy nonformal system” and then magically extract an AI design from that. We need to build something we can understand or it won’t work, and right now, understanding reasoning in the abstract means logic and its extensions.
It’s a double need, though, because not only do we need to understand reasoning, self-improvement means the created thing needs to understand reasoning. Right now we don’t have a formal theory of reasoning that can handle understanding its own reasoning without losing power. So that’s what we need to solve.
There is no viable alternate path.
Note that this is different from what you were saying before, and that commenting along the lines of “AI’s do Reasoning. If you can’t see the relevance of logic to reasoning, I can’t help” without further explanation doesn’t adhere to the principle of charity.
I’m very familiar with the argument that you’re making, and have discussed it with dozens of people. The reason why I didn’t respond to the argument before you made it is because I wanted to isolate our core point(s) of disagreement, rather than making presumptions. The same holds for my points below.
This argument has the form “If we want to build something that does X, we have to understand X, or we won’t know what we’re getting.” But this isn’t true in full generality. For example, we can build a window shade without knowing how the window shade blocks light, and still know that we’ll be getting something that blocks light. Why do you think that AI will be different?
Why do you think that it’s at all viable to create an AI based on a formal system? (For the moment putting aside safety considerations.)
As to the rest of your comment (returning to my “Chinese economy” remarks), the Chinese economy is a recursively self-improving system with the “goal” of maximizing GDP. It could be that there’s goal drift, and that the Chinese economy starts optimizing for something random. But I think that the Chinese economy does a pretty good job of keeping this “goal” intact, and that it’s been doing a better and better job over time. Why do you think that it’s harder to ensure that an AI keeps its goal intact than it is to ensure that the Chinese economy keeps its “goal” intact?
AI have to come to conclusions about the state of the world, where “world” also includes their own being. Model theory is the field that deals with such things formally.
These could be relevant, but it seems to me that “mind of an AI” is an emergent phenomenon of the underlying solid state physics, where “emergent” here means “technically explained by, but intractable to study as such.” Game theory and model theory are joined at the hip, and I have no comment on neural networks.
But the most intelligent beings that we know of are humans, and they don’t use mathematical logic.
Did humans have another choice in inventing the integers? (No. The theory of integers has only one model, up to isomorphism and cardinality.) In general, the ontology a mind creates is still under the aegis of mathematical logic, even if that mind didn’t use mathematical logic to invent it.
Sure, but that’s only one perspective. You can say that it’s under the aegis of particle physics, or chemistry, or neurobiology, or evolutionary psychology, or other things that I’m not thinking of. Why single out mathematical logic?
Going back to humans, getting an explanation of minds out of any of these areas requires computational resources that don’t currently exist. (In the case of particle physics, one might rather say “cannot exist.”)
Because we can prove theorems that will apply to whatever ontology AIs end up dreaming up. Unreasonable effectiveness of mathematics, and all that. But now I’m just repeating myself.
I’m puzzled by your remark. It sounds like a fully general argument. One could equally well say that one should use mathematical logic to build a successful marriage, or fly an airplane, or create a political speech. Would you say this? If not, why do you think that studying mathematical logic is the best way to approach AI safety in particular?
No, a fully general argument is something like “well, that’s just one perspective.” Mathematical logic will not tell you anything about marriage, other than the fact that it is a relation of variable arity (being kind to the polyamorists for the moment).
I have no idea why a reasonable person would say any of these things.
I’d call it the best currently believed way with a chance of developing something actionable without probably requiring more computational power than a matryoshka brain. That’s because it’s the formal study of models and theories in general. Unless you’re willing to argue that AIs will have neither cognitive feature? That’s kind of rhetorical, though—I’m growing tired.
Given that the current Lob paper is non-constructive (invoking the axiom of choice) and hence is about as uncomputable as possible, I don’t understand why you think mathematical logic will help with computational concerns.
The paper on probabilistic reflection in logic is non-constructive, but that’s only sec. 4.3 of the Lob paper. Nothing non-constructive about T-n or TK.
I believe one of the goals of this particular avenue of research is to make this result constructive. Also, he was talking about the study of mathematical logic in general, not just this paper.
I have little patience for people who believe invoking the axiom of choice in a proof makes the resulting theorem useless.
That was rather rude. I certainly don’t claim that proofs involving choice are useless, merely that they don’t address the particular criterion of computational feasibility.
What do you mean by “something actionable” ?
That sounds like a very long conversation if we’re supposed to be giving quantitative estimates on everything. The qualitative version is just that this sort of thing can take a long time, may not parallelize easily, and can potentially be partially factored out to academia, and so it is wise to start work on it as soon as you’ve got enough revenue to support even a small team, so long as you can continue to scale your funding while that’s happening.
This reply takes for granted that all astronomical benefits bottleneck through a self-improving AI at some point.
Thanks for clarifying your position.
My understanding based on what you say is that the research in your paper is intended to spearhead a field of research, rather than to create something that will be directly used for friendliness in the first AI. Is this right?
If so, our differences are about the sociology of the scientific, technological and political infrastructure rather than about object level considerations having to do with AI.
Sounds about right. You might mean a different thing from “spearhead a field of research” than I do, my phrasing would’ve been “Start working on the goddamned problem.”
From your other comments I suspect that you have a rather different visualization of object-level considerations to do with AI and this is relevant to your disagreement.
Ok. I think that MIRI could communicate more clearly by highlighting this. My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Lob problem in order to build a Friendly AI. Is there anything in the public domain that would have suggested otherwise to me? If not, I’d suggest writing this up and highlighting it.
AFAIK, the position is still “need to ‘solve’ Lob to get FAI”, where ‘solve’ means find a way to build something that doesn’t have that problem, given that all the obvious formalisms do have such problems. Did EY suggest otherwise?
See my response to EY here.
By default, if you can build a Friendly AI you can solve the Lob problem. That working on the Lob Problem gets you closer to being able to build FAI is neither obvious nor certain, but everything has to start somewhere...
EDIT: Moved the rest of this reply to a new top-level comment because it seemed important and I didn’t want it buried.
http://lesswrong.com/lw/hmt/tiling_agents_for_selfmodifying_ai_opfai_2/#943i
For readers who want to read more about this point, see FAI Research as Effective Altruism.
BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI’s 2013 strategy (which focuses heavily on FAI research). So it’s not as though I think FAI research is obviously the superior path, and it’s also not as though we haven’t thought through all these different options, and gotten feedback from dozens of people about those options, and so on.
Also note that MIRI did, in fact, decide to focus on (1) spreading rationality, and (2) building a community of people who care about rationality, the far future, and x-risk, before turning its head to FAI research: see (in chronological order) the Singularity Summit, Less Wrong and CFAR.
But the question of which interventions are most cost effective (given astronomical waste) is a huge and difficult topic, one that will require thousands of hours to examine properly. Building on Beckstead and Bostrom, I’ve tried to begin that examination here. Before jumping over to that topic, I wonder: do you now largely accept the case Eliezer made for this latest paper as an important first step on an important sub-problem of the Friendly AI problem? And if not, why not?
My comments were addressed at Eliezer’s paper specifically, rather than MIRI’s general strategy, or your own views.
Sure – what I’m thinking about is cost-effectiveness at the margin.
Based on Eliezer’s recent comments, my impression is that Eliezer is not making such a case, and is rather making a case for the paper being of sociological/motivational value. Is your understanding different?
No, that’s not what I’ve been saying at all.
I’m sorry if this seems rude in some sense, but I need to inquire after your domain knowledge at this point. What is your level of mathematical literacy and do you have any previous acquaintance with AI problems? It may be that, if we’re to proceed on this disagreement, MIRI should try to get an eminent authority in the field to briefly confirm basic, widespread, and correct ideas about the relevance of doing math to AI, rather than us trying to convince you of that via object-level arguments that might not be making any sense to you.
By ‘the relevance of math to AI’ I don’t mean mathematical logic, I mean the relevance of trying to reduce an intuitive concept to a crisp form. In this case, like it says in the paper and like it says in the LW post, FOL is being used not because it’s an appropriate representational fit to the environment… though as I write this, I realize that may sound like random jargon on your end… but because FOL has a lot of standard machinery for self-reflection of which we could then take advantage, like the notion of Godel numbering or ZF proving that every model entails every tautology… which probably doesn’t mean anything to you either. But then I’m not sure how to proceed; if something can’t be settled by object-level arguments then we probably have to find an authority trusted by you, who knows about the (straightforward, common) idea of ‘crispness is relevant to AI’ and can quickly skim the paper and confirm ‘this work crispifies something about self-modification that wasn’t as crisp before’ and testify that to you. This sounds like a fair bit of work, but I expect we’ll be trying to get some large names to skim the paper anyway, albeit possibly not the Early Draft for that.
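For readers who haven’t met Gödel numbering, here is a minimal sketch of the textbook prime-power encoding, assuming a made-up toy alphabet (the symbol table and function names below are hypothetical and not taken from the paper). The point it illustrates is the one mentioned above: once every formula is a number, a system that talks about numbers can also talk about its own formulas, which is the basic machinery for self-reflection.

```python
import math
from typing import Dict, List

# Hypothetical toy alphabet for arithmetic formulas. Every code is >= 1,
# so each symbol position contributes at least one factor of its prime.
SYMBOLS: Dict[str, int] = {"0": 1, "S": 2, "=": 3, "+": 4, "(": 5, ")": 6, "x": 7}
DECODE: Dict[int, str] = {code: sym for sym, code in SYMBOLS.items()}


def is_prime(k: int) -> bool:
    """Trial-division primality test (adequate for toy-sized inputs)."""
    return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))


def next_prime(p: int) -> int:
    """Return the smallest prime strictly greater than p."""
    q = p + 1
    while not is_prime(q):
        q += 1
    return q


def godel_number(formula: str) -> int:
    """Encode the i-th symbol (with code c_i) as the product of p_i ** c_i."""
    result, p = 1, 2
    for ch in formula:
        result *= p ** SYMBOLS[ch]
        p = next_prime(p)
    return result


def decode(n: int) -> str:
    """Invert godel_number by reading prime exponents in increasing order."""
    symbols: List[str] = []
    p = 2
    while n > 1:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        symbols.append(DECODE[exponent])
        p = next_prime(p)
    return "".join(symbols)


print(godel_number("0=0"))               # 2**1 * 3**3 * 5**1 = 270
print(decode(godel_number("S(x)+0=x")))  # round-trips to S(x)+0=x
```

Unique prime factorization is what makes the encoding invertible; real treatments use the same idea with a fixed alphabet for the whole first-order language.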
Quick Googling suggests that someone named “Jonah Sinick” is a mathematician in number theory. It appears to be the same person.
I really wish Jonah had mentioned that some number of comments ago; there are a lot of arguments I don’t even try to use unless I know I’m talking to a member of the mathematical literati.
It’s mentioned explicitly at the beginning of his post Mathematicians and the Prevention of Recessions, strongly implied in The Paucity of Elites Online, and the website listed under his username and karma score is http://www.mathisbeauty.org.
Ok, I look forward to better understanding :-)
I have a PhD in pure math, I know the basic theory of computation and of computational complexity, but I don’t have deep knowledge of these domains, and I have no acquaintance with AI problems.
Yes, this could be what’s most efficient. But my sense is that our disagreement is at a non-technical level rather than at a technical level.
My interpretation of
was that you were asserting only very weak confidence in the relevance of the paper to AI safety, and that you were saying “Our purpose in writing this was to do something that could conceivably have something to do with AI safety, so that people take notice and start doing more work on AI safety.” Thinking it over, I realize that you might have meant “We believe that this paper is an important first step on a technical level.” Can you clarify here?
If the latter interpretation is right, I’d recur to my question about why the operationalization is a good one, which I feel that you still haven’t addressed, and which I see as crucial.
...
Do you not see that what Luke wrote was a direct response to your question?
There are really two parts to the justification for working on this paper: 1) Direct FAI research is a good thing to do now. 2) This is a good problem to work on within FAI research. Luke’s comment gives context explaining why MIRI is focusing on direct FAI research, in support of 1. And it’s clear from what you list as other options that you weren’t asking about 2.
It sounds like what you want is for this problem to be compared on its own to every other possible intervention. In theory that would be the rational thing to do to ensure you were always doing the most cost-effective work on the margin. But that only makes sense if it’s computationally practical to do that evaluation at every step.
What MIRI has chosen to do instead is to invest some time up front coming up with a strategic plan, and then follow through on that. This seems entirely reasonable to me.
If the probability is too small, then it isn’t worth it. The activities that I mention plausibly reduce astronomical waste to a nontrivial degree. Arguing that you can do better than them requires an argument that establishes the expected impact of MIRI Friendly AI research on AI safety above a nontrivial threshold.
Which question?
Sure, I acknowledge this.
I don’t think that it’s computationally intractable to come up with better alternatives. Indeed, I think that there are a number of concrete alternatives that are better.
I wasn’t disputing this. I was questioning the relevance of MIRI’s current research to AI safety, not saying that MIRI’s decision process is unreasonable.
The one I quoted: “Why do you think that … is cost-effective relative to other options on the table?”
Yes, you have a valid question about whether this Lob problem is relevant to AI safety.
What I found frustrating as a reader was that you asked why Eliezer was focusing on this problem as opposed to other options such as spreading rationality, building human capital, etc. Then when Luke responded with an explanation that MIRI had chosen to focus on FAI research, rather than those other types of work, you say, no I’m not asking about MIRI’s strategy or Luke’s views, I’m asking about this paper. But the reason Eliezer is working on this paper is because of MIRI’s strategy!
So that just struck me as sort of rude and/or missing the point of what Luke was trying to tell you. My apologies if I’ve been unnecessarily uncharitable in interpreting your comments.
I read Luke’s comment differently, based on the preliminary “BTW.” My interpretation was that his purpose in making the comment was to give a tangentially related contextual remark rather than to answer my question. (I wasn’t at all bothered by this – I’m just explaining why I didn’t respond to it as if it were intended to address my question.)
Ah, thanks for the clarification.
The way I’m using these words, my “this latest paper as an important first step on an important sub-problem of the Friendly AI problem” is equivalent to Eliezer’s “begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty.”
Ok. I disagree that the paper is an important first step.
Because Eliezer is making an appeal based on psychological and sociological considerations, spelling out my reasoning requires discussion of what sorts of efforts are likely to impact the scientific community, and whether one can expect such research to occur by default. Discussing these requires discussion of psychology, sociology and economics, partly as related to whether the world’s elites will navigate the creation of AI just fine.
I’ve described a little bit of my reasoning, and will be elaborating on it in detail in future posts.
I look forward to it! Our models of how the scientific community works may be substantially different. To consider just one particularly relevant example, consider what the field of machine ethics looks like without the Yudkowskian line.
I agree that Eliezer has substantially altered the field of machine ethics. My view here is very much contingent on the belief that elites will navigate the creation of AI just fine, which, if true, is highly nonobvious.
Other options on the table are not mutually exclusive. There is a lot of wealth and intellectual brain power in the world, and a lot of things to work on. We can’t and shouldn’t all work on one most important problem. We can’t all work on the thousand most important problems. We can’t even agree on what those problems are.
I suspect Eliezer has a comparative advantage in working on this type of AI research, and he’s interested in it, so it makes sense for him to work on this. It especially makes sense to the extent that this is an area no one else is addressing. We’re only talking about an expenditure of several careers and a few million dollars. Compared to the world economy, or even compared to the non-profit sector, this is a drop in the bucket.
Now if instead Eliezer was the 10,000th smart person working on string theory, or if there was an Apollo-style government-funded initiative to develop an FAI by 2019, then my estimate of the comparative advantage of MIRI would shift. But given the facts as they are, MIRI seems like a plausible use of the limited resources it consumes.
If Eliezer feels that this is his comparative advantage then it’s fine for him to work on this sort of research — I’m not advocating that such research be stopped. My own impression is that Eliezer has comparative advantage in spreading rationality and that he could have a bigger impact by focusing on doing so.
I’m not arguing that such research shouldn’t be funded. The human capital question is genuinely more dicey, insofar as I think that Eliezer has contributed substantial value through his work on spreading rationality, and my best guess is that the opportunity cost of not doing more is large.
For starters, humans aren’t able to make changes as easily as an AI can. We don’t have direct access to our source code that we can change effortlessly, any change we make costs either time, money, or both.
That doesn’t address the question. It says that an AI could more easily make self-modifications. It doesn’t suggest that an AI needs to make such self-modifications. Human intelligence is an existence proof that human-level intelligence does not require “billions of sequential self-modifications”. Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.
So I reiterate, “Why do you think that an AI would need to make billions of sequential self-modifications when humans don’t need to?”
Human intelligence required billions of sequential modifications (though not self-modifications). An AI in general would not need self-modifications, but for an AGI it seems that they would be necessary. I don’t doubt that a formal argument for the latter statement has been written by someone smarter than me before, but a very informal argument would be something like this:
If an AGI doesn’t need to self-modify, then that AGI is already perfect (or close enough that it couldn’t possibly matter). Since practically no software humans ever built was ever perfect in all respects, that seems exceedingly unlikely. Therefore, the first AGI would (very likely) need to be modified. Of course, at the beginning it might be modified by humans (thus, not self-modified), but the point of building AGI is to make it smarter than us. Thus, once it is smarter than us by a certain amount, it wouldn’t make sense for us (stupider intellects) to improve it (smarter intellect). Thus, it would need to self-modify, and do it a lot, unless by some ridiculously fortuitous accident of math (a) human intelligence is very close to the ideal, or (b) human intelligence will build something very close to the ideal on the first try.
It would be nice if those modifications would be things that are good for us, even if we can’t understand them.
“...need to make billions of sequential self-modifications when humans don’t need to” to do what? Exist, maximize utility, complete an assignment, fulfill a desire...? Some of those might be better termed “wants” than “needs,” but that info is just as important in predicting behavior.
FWIW, Jonah has a PhD in math and has probably read Pearl or a similar graphical models book.
(Not directly relevant to the conversation, but just trying to lower your probability estimate that Jonah’s objections are naive.)
I don’t see it as a decisive point, but as one of “many weak arguments”; still, I think the analogy with human self-modification is relevant. I would like to see more detailed discussion of the issue.
Aspects of this that seem relevant to me:
Genetic and cultural modifications to human thinking patterns have been extremely numerous. If you take humanity as a whole as an entity doing self-modification on itself, there have been an extremely large number of successful self-modifications.
Genetic and cultural evolution have built humans individually capable of self-modification without stumbling over Lobian obstacles. Evolution and culture likely used relatively simple and easy search processes to do this, rather than ones that rely on very sophisticated mathematical insights. Analogously, one might expect that people will develop AGI in a way that overcomes these problems as well.
Self-modification is to be interpreted to include ‘directly editing one’s own low-level algorithms using high-level deliberative process’ but not include ‘changing one’s diet to change one’s thought processes’. If you are uncomfortable using the word ‘self-modification’ for this please substitute a new word ‘fzoom’ which means only that and consider everything I said about self-modification to be about fzoom.
Humans wouldn’t look at their own source code and say, “Oh dear, a Lobian obstacle”, on this I agree, but this is because humans would look at their own source code and say “What?”. Humans have no idea under what exact circumstances they will believe something, which comes with its own set of problems. The Lobian obstacle shows up when you approach things from the end we can handle, namely weak but well-defined systems which can well-define what they will believe, whereas human mathematicians are stronger than ZF plus large cardinals but we don’t know how they work or what might go wrong or what might change if we started editing neural circuit #12,730,889,136.
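For readers who want the formal statement behind the term, the “Lobian obstacle” is named after Löb’s theorem. Below is the standard textbook form, assuming T is a recursively axiomatized theory extending Peano arithmetic, with its usual provability predicate and Gödel quotes (the notation is the conventional one, not quoted from the paper). The second line is the consequence that matters for self-trust: a consistent T cannot prove “anything I prove is true” about itself for every sentence.

```latex
% Löb's theorem (T recursively axiomatized, T extends PA,
% Prov_T the standard provability predicate, \ulcorner\cdot\urcorner Gödel quoting):
\text{If } T \vdash \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi,
\ \text{then } T \vdash \varphi.

% Instance \varphi = \bot: proving the soundness schema for \bot already yields \bot,
% so a consistent T cannot prove \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi for all \varphi.
T \vdash \mathrm{Prov}_T(\ulcorner \bot \urcorner) \rightarrow \bot
\;\Longrightarrow\; T \vdash \bot
```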
As Christiano’s work shows, allowing for tiny finite variances of probability might well dissipate the Lobian obstacle, but that’s the sort of thing you find out by knowing what a Lobian obstacle is.
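A sketch of what the probabilistic approach changes, assuming the reflection principle from the draft paper on probabilistic reflection mentioned earlier in the thread (the notation below is an assumption, paraphrased rather than quoted). A probability assignment over sentences is only required to know its own probabilities up to open intervals, and those strict inequalities are what let it escape the liar-style diagonal contradiction that Löb’s theorem turns into an obstacle for ordinary provability.

```latex
% Reflection principle for a probability assignment \mathbb{P} over sentences
% (notation assumed): self-knowledge holds only on open intervals with rational endpoints.
\forall \varphi\ \forall a, b \in \mathbb{Q}:\quad
a < \mathbb{P}(\varphi) < b \;\Longrightarrow\;
\mathbb{P}\big(a < \mathbb{P}(\ulcorner \varphi \urcorner) < b\big) = 1
```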
Very helpful. This seems like something that could lead to a satisfying answer to my question. And don’t worry, I won’t engage in a terminological dispute about “self-modification.”
Can you clarify a bit what you mean by “low-level algorithms”? I’ll give you a couple of examples related to what I’m wondering about.
Suppose I am working with a computer to make predictions about the weather, and we consider the operations of the computer along with my brain as a single entity for the purposes of testing whether the Lobian obstacles you are thinking of arise in practice. Now suppose I make basic modifications to the computer, expecting that the joint operation of my brain with the computer will yield improved output. This will not cause me to trip over Lobian obstacles. Why does whatever concern you have about the Lob problem predict that it would not, but also predict that future AIs might stumble over the Lob problem?
Another example. Humans learn different mental habits without stumbling over Lobian obstacles, and they can convince themselves that adopting the new mental habits is an improvement. Some of these are more derivative (“Don’t do X when I have emotion Y”) and others are perhaps more basic (“Try to update through explicit reasoning via Bayes’ Rule in circumstances C”). Why does whatever concern you have about the Lob problem predict that humans can make these modifications without stumbling, but also predict that future AIs might stumble over the Lob problem?
If the answer to both examples is “those are not cases of directly editing one’s low-level algorithms using high-level deliberative processes,” can you explain why your concern about Lobian issues only arises in that type of case? This is not me questioning your definition of “fzoom,” it is my asking why Lobian issues only arise when you are worrying about fzoom.
The first example is related to what I had in mind when I talked about fundamental epistemic standards in a previous comment:
Well, part of this is because modern humans are monstrous in the eyes of many pre-modern humans. To them, the future has been lost because they weren’t using a self-modification procedure that provably preserved their values.