I feel like I’m not clear on what question you’re asking. Can you give an example of what a good answer would look like, maybe using Xs and Ys since I can hardly ask you to come up with an actual good argument?
There are many possible operationalizations of a self-modifying AI. For example,
One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process).
One could model a self-improving AI as a chess playing computer program which uses a positional weighting system to choose which moves to make, and which analyzes which weighting heuristics statistically lead to more winning games, in order to improve its positional weighting system.
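To make the second operationalization concrete, here is a minimal toy sketch (mine, not Jonah's, and not real chess: the "positions" are random feature vectors, the true positional value is a hidden linear function, and the program keeps whichever weight perturbation statistically wins more simulated games):

```python
# Toy stand-in for a chess program that improves its own positional weighting system.
# All names and the "game" itself are illustrative inventions, not a real engine.
import random

N_FEATURES = 6
HIDDEN_TRUE_WEIGHTS = [random.uniform(-1, 1) for _ in range(N_FEATURES)]  # unknown to the program

def pick_position(weights):
    """Choose the better of two random 'positions' according to the given evaluation weights."""
    candidates = [[random.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(2)]
    return max(candidates, key=lambda p: sum(w * f for w, f in zip(weights, p)))

def play_game(weights_a, weights_b):
    """Side A wins iff its chosen position is truly better (per the hidden weights)."""
    true_value = lambda p: sum(w * f for w, f in zip(HIDDEN_TRUE_WEIGHTS, p))
    return true_value(pick_position(weights_a)) > true_value(pick_position(weights_b))

def win_rate(challenger, incumbent, games=400):
    return sum(play_game(challenger, incumbent) for _ in range(games)) / games

# "Self-modification" loop: propose a perturbed weighting heuristic, keep it if it
# statistically wins more games against the current one.
weights = [0.0] * N_FEATURES
for _ in range(50):
    candidate = [w + random.gauss(0, 0.3) for w in weights]
    if win_rate(candidate, weights) > 0.5:
        weights = candidate

print("win rate of improved weights vs. the initial all-zero weights:",
      win_rate(weights, [0.0] * N_FEATURES))
```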
My reaction to your paper is similar to what my reaction would be to a paper that studies ways to make sure that the Chinese economy doesn’t change in such a way that GDP starts dropping, or ways to make sure that the chess program doesn’t self-modify to get worse and worse at winning chess games rather than better and better.
It’s conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as
Working to increase rationality
Spreading concern for global welfare
Building human capital of people who are concerned about global welfare
are more cost-effective ways of reducing AI risk than doing such research.
I’m looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.
One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process)...
I’m looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.
If somebody wrote a paper showing how an economy could naturally build another economy while being guaranteed to have all prices derived from a constant set of prices on intrinsic goods, even as all prices were set by market mechanisms as the next economy was being built, I’d think, “Hm. Interesting. A completely different angle on self-modification with natural goal preservation.”
I’m surprised at the size of the apparent communications gap around the notion of “How to get started for the first time on a difficult basic question”—surely you can think of mathematical analogies to research areas where it would be significant progress just to throw out an attempted formalization as a base point?
There are all sorts of disclaimers plastered onto the paper about how this only works because logic is monotonic, probabilistic reasoning is not monotonic, etcetera. The point is to have a way, any way, of just getting started on stable self-modification even though we know the particular exact formalism doesn’t directly work for probabilistic agents. Once you do that you can at least state what it is you can’t do. A paper on a self-replicating economy with a stable set of prices on intrinsic goods would likewise be something you could look at and say, “But this formally can’t do X, because Y” and then you would know more about X and Y than you did previously. Being able to say, “But the verifier-suggester separation won’t work for expected utility agents because probabilistic reasoning is not monotonic” means you’ve gotten substantially further into FAI work than when you’re staring dumbly at the problem.
AIXI was conceptual progress on AGI, and especially public discussion of AGI, because it helped people like me say much more formally all the things that we didn’t like about AIXI, like the anvil problem or AIXI seizing control of its reward channel or AIXI only being able to represent utility functions of sensory data rather than environmental ontologies. Someone coming up with a list of 5 key properties the tiling architecture does not have would be significant progress, and I would like to specifically claim that as an intended, worthwhile, fully-pays-back-the-effort positive consequence if it happens—and this is not me covering all the bases in case of disappointment, the paper was presented in a way consonant with that goal and not in a way consonant with claiming one-trueness.
I don’t understand the model you have of FAI research where this is not the sort of thing that you do at the beginning.
I described my position in another comment. To reiterate and elaborate:
My current best guess is that there are so many unrelated potential models for AI (relative to the information that we currently have) that the probability of FAI work on a single one of them ending up being relevant is tiny. In order to make a compelling argument for the relevance of MIRI’s work on the Löb problem, you have to argue that the model used isn’t just one of, e.g., 10^10 distinct models of AI with similar probability of being realized in practice.
One could argue that the problem is sufficiently important so that one should work on it even if the probability of the work being relevant is tiny. But there are other interventions on the table. You’ve made major contributions by spreading rationality and by creating a community for people who are interested in global welfare to network and collaborate with one another. These things probably substantially reduce astronomical waste (in expectation). In order to argue in favor of MIRI’s FAI research being optimal philanthropy, you have to argue that the probability of the research being relevant is sufficiently great so that its expected value outweighs the expected value of these other activities.
One could argue that if there are in fact so many models for AI then we’re doomed anyway, so we should assume that there aren’t so many models. But rather than trying to work on the models that we think most relevant now, we can wait until it becomes more clear what AGI will look like in practice, and then develop FAI for that type of AI. Whether or not this is feasible is of course related to the question of whether the world’s elites will navigate the creation of AI just fine. I think that there are good reasons to think that the probability of this is pretty high, and that the most leveraged efforts are getting good people in positions of future influence rather than doing FAI research now. Your work on rationality training and community building can help, and already has helped a lot with this.
Neither 2 nor 3 is the sort of argument I would ever make (there’s such a thing as an attempted steelman which by virtue of its obvious weakness doesn’t really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a forgone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.
I still don’t understand what you could be thinking here, and feel like there’s some sort of basic failure to communicate going on. I could guess something along the lines of “Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function...” (but really, is something like that one of just 10^10 equivalent candidates?) ”...and more dissimilar to that than logical AI is from decision theory” (that’s a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that’s the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, “Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we’re going to build a Google Maps AGI”, where in point of fact it would be completely relevant, not a tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can’t think of any acceptable steel version of what you mean, and I say again that it seems to me that you’re saying something that a good mainstream AI person would also be staring quizzically at.
What would be one of the other points in the 10^10-sized space? If it’s something along the lines of “an economic model” then I just explained why if you did something analogous with an economic model it could also be interesting progress, just as AIXI was conceptually important to the history of ideas in the field. I could explain your position by supposing that you think that mathematical ideas never generalize across architectures and so only analyzing the exact correct architecture of a real FAI could be helpful even at the very beginning of work, but this sounds like a visibly stupid position so the model of Anna in my head is warning me not to attribute it to you. On the other hand, some version of, “It is easier to make progress than Jonah thinks because the useful generalization of mathematical ideas does not require you to select correct point X out of 10^10 candidates” seems like it would almost have to be at work here somewhere.
I seriously don’t understand what’s going on in your head here. It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI. Each newly written computer program is unique but the ideas behind them generalize, the resulting conceptual space can be usefully explored, that’s why we don’t start over with every new computer program. You can do useful things once you’ve collected enough treasure nuggets and your level of ability builds up, it’s not a question of guessing the one true password out of 10^10 tries with nothing being progress until then. This is true on a level of generality which applies across computer science and also to AI and also to FAI and also to decision theory and also to math. Everyone takes this for granted as an obvious background fact of doing research which is why I would expect a good mainstream AI person to also be staring quizzically at your statements here. I do not feel like the defense I’m giving here is in any way different from the defense I’d give of a randomly selected interesting AI paper if you said the same thing about it. “That’s just how research works,” I’d say.
Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.
A number of people have recently told me that they have trouble understanding me unless I elaborate further, because I don’t spell out my reasoning in sufficient detail. I think that this is more a matter of the ideas involved being complicated, and there being a lot of inferential distance, than it is lack of effort on my part, but I can see how it would be frustrating to my interlocutors. It seems that I’m subject to the illusion of transparency. I appreciate your patience.
Neither 2 nor 3 is the sort of argument I would ever make (there’s such a thing as an attempted steelman which by virtue of its obvious weakness doesn’t really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a forgone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.
I know that you’ve explicitly disavowed arguments of the type in my points 2 and 3. My reason for bringing them up is to highlight the importance of addressing point 1: to emphasize that it doesn’t suffice to say “the problem is important and we have to get started on it somehow.” I recognize that we have very different implicit assumptions on point 1, and that that’s where the core of the disagreement lies.
I still don’t understand what you could be thinking here, and feel like there’s some sort of basic failure to communicate going on. I could guess something along the lines of “Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function...” (but really, is something like that one of just 10^10 equivalent candidates?) ”...and more dissimilar to that than logical AI is from decision theory” (that’s a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that’s the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, “Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we’re going to build a Google Maps AGI”, where in point of fact it would be completely relevant, not a tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can’t think of any acceptable steel version of what you mean, and I say again that it seems to me that you’re saying something that a good mainstream AI person would also be staring quizzically at.
There’s essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I’m not suggesting that an AGI will have human values by default: I’m totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules, rather than a mathematically defined utility function.
There are serious dangers of such an entity having values that are orthogonal to humans, and serious dangers of value drift. (Your elegant article Why does power corrupt? has some relevance to the latter point.) But it seems to me that the measures one would want to take to prevent humans’ goals from changing are completely different from the sorts of measures that might emerge from MIRI’s FAI research.
I’ll also highlight a comment of Nick Beckstead, which you’ve already seen and responded to. I didn’t understand your response.
I should clarify that I don’t have high confidence that the first AGI will develop along these lines. But it’s my best guess, and it seems much more plausible to me than models of the type in your paper.
It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI.
The difference that I perceive between the two scenarios is the nature of the feedback loops in each case.
When one is chipping away at a problem incrementally, one has the capacity to experiment and use the feedback generated from experimentation to help one limit the search space. Based on what I know about the history of science, general relativity is one of the only successful theories that was created without lots of empirical investigation.
The engineers who designed the first bridges had trillions of combinations of design features and materials to consider a priori, the vast majority of which wouldn’t work. But an empirical discovery like “material X is too weak to work within any design” greatly limits the search space, because you don’t have to think further about any of the combinations involving material X. Similarly if one makes a discovery of the type “material Y is so strong that it’ll work with any design.” By making a series of such discoveries, one can hone in on a few promising candidates.
This is how I predict that the development of AGI will go. I think that the search space is orders of magnitude too large to think about in a useful way without a lot of experimentation, and that a priori we can’t know what the first AGI will look like. I think that once it becomes more clear what the first AGI will look like, it will become much more feasible to make progress on AI safety.
Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.
It’ll take me a while to come up with a lot of concrete hypotheticals, but I’ll get back to you on this.
There’s essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I’m not suggesting that an AGI will have human values by default: I’m totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules
Okay. This sounds like you’re trying to make up your own FAI theory in much the same fashion as Holden (and it’s different from Holden’s, of course). Um, what I’d like to do at this point is take out a big Hammer of Authority and tell you to read “Artificial Intelligence: A Modern Approach” so your mind would have some better grist to feed on as to where AI is and what it’s all about. If I can’t do that… I’m not really sure where I could take this conversation. I don’t have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there’s somebody else you’d trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don’t know where to take it from here.
On the object level I will quickly remark that some of the first attempts at heavier-than-air flying-machines had feathers and beaks and they did not work very well, that ‘interacting specialized modules’ is Selling Nonapples, that there is an old discussion in cognitive science about the degree of domain specificity in human intelligence, and that the idea that ‘humans are the only example we have’ is generally sterile, for reasons I’ve already written about but I can’t remember the links offhand, hopefully someone else does. It might be in Levels of Organization in General Intelligence, I generally consider that pretty obsolete but it might be targeted to your current level.
Okay. This sounds like you’re trying to make up your own FAI theory in much the same fashion as Holden (and it’s different from Holden’s, of course).
Either of my best guess or Holden’s best guess could be right, and so could lots of other ideas that we haven’t thought of. My proposed conceptual framework should be viewed as one of many weak arguments.
The higher level point that I was trying to make is that [the conceptual framework implicit in the view that MIRI’s current FAI research has a non-negligible chance of being relevant to AI safety] seems highly conjunctive. I don’t mean this rhetorically at all – I genuinely don’t understand why you think that we can make progress given how great the unknown unknowns are. You may be right, but justification of your view requires further argumentation.
Um, what I’d like to do at this point is take out a big Hammer of Authority and tell you to read “Artificial Intelligence: A Modern Approach” so your mind would have some better grist to feed on as to where AI is and what it’s all about. If I can’t do that… I’m not really sure where I could take this conversation. I don’t have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there’s somebody else you’d trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don’t know where to take it from here.
A more diplomatic way of framing this would be something like:
“The book Artificial Intelligence: A Modern Approach has a discussion of current approaches to artificial intelligence. Are you familiar with the ideas therein? If not, I’d suggest that you take a look.”
Putting that aside, based on conversations with a number of impressive people in machine learning, etc. who I know, my impression is that at the moment, there aren’t strong contenders for research programs that could plausibly lead to AGI. I largely accept Luke’s argument in his blog post on AI timelines, but this is based on the view that the speed of research is going to increase a lot over the coming years, rather than on the belief that any existing research programs have a reasonable chance of succeeding.
I’d be very interested in hearing about existing research programs that have a reasonable chance of succeeding.
I genuinely don’t understand why you think that we can make progress given how great the unknown unknowns are.
Is it your view that no progress has occurred in AI generally for the last sixty years?
I’d be very interested in hearing about existing research programs that have a reasonable chance of succeeding.
The field as a whole has been making perfectly good progress AFAICT. We know a bleepton more about cognition than we did in 1955 and are much less confused by many things. Has someone been giving you an impression otherwise and if so, what field were they in?
Is it your view that no progress has occurred in AI generally for the last sixty years?
No, it’s clear that there have been many advances, for example in chess playing programs, auto-complete search technology, automated translation, driverless cars, and speech recognition.
But my impression is that this work has only made a small dent in the problem of general artificial intelligence.
The field as a whole has been making perfectly good progress AFAICT. We know a bleepton more about cognition than we did in 1955 and are much less confused by many things. Has someone been giving you an impression otherwise and if so, what field were they in?
Three graduate students in machine learning at distinct elite universities.
Scott Aaronson. Even though he works in theoretical computer science rather than AI, he’s in close proximity with many colleagues who work on artificial intelligence at MIT, and so I give a fair amount of weight to his opinion.
Also, the fraction of scientists who I know who believe that there’s a promising AGI research agenda on the table is very small, mostly consisting of people around MIRI. Few of the scientists who I know have subject matter expertise, but if there was a promising AGI research agenda on the table, I would expect news of it to have percolated to at least some of the people in question.
I think I may have been one of those three graduate students, so just to clarify, my view is:
Zero progress being made seems too strong a claim, but I would say that most machine learning research is neither relevant to, nor trying to be relevant to, AGI. I think that there is no real disagreement on this empirical point (at least, from talking to both Jonah and Eliezer in person, I don’t get the impression that I disagree with either of you on this particular point).
The model for AGI that MIRI uses seems mostly reasonable, except for the “self-modification” part, which seems to be a bit too much separated out from everything else (since pretty much any form of learning is a type of self-modification—current AI algorithms are self-modifying all the time!).
In this vein, I’m skeptical of both the need for and the feasibility of an AI providing an actual proof of safety of self-modification. I also think that using mathematical logic somewhat clouds the issues here, and that most of the issues that MIRI is currently working on are prerequisites for any sort of AI, not just friendly AI. I expect them to be solved as a side-effect of what I see as more fundamental outstanding problems.
However, I don’t have reasons to be highly confident in these intuitions, and as a general rule of thumb, having different researchers with different intuitions pursue their respective programs is a good way to make progress, so I think it’s reasonable for MIRI to do what it’s doing (note that this is different from the claim that MIRI’s research is the most important thing and is crucial to the survival of humanity, which I don’t think anyone at MIRI believes, but I’m clarifying for the benefit of onlookers).
Zero progress being made seems too strong a claim, but I would say that most machine learning research is neither relevant to, nor trying to be relevant to, AGI.
Agreed, the typical machine learning paper is not AGI progress—a tiny fraction of such papers being AGI progress suffices.
In this vein, I’m skeptical of both the need for and the feasibility of an AI providing an actual proof of safety of self-modification.
I want to note that the general idea being investigated is that you can have a billion successive self-modifications with no significant statistically independent chance of critical failure. Doing proofs from axioms, in which case the theorems are not perfectly strong but at least as strong as the axioms, with conditionally independent failure probabilities not significantly lowering the conclusion strength below this as they stack, is an obvious entry point into this kind of lasting guarantee. It also suggests to me that even if the actual solution doesn’t use theorems proved and adapted to the AI’s self-modification, it may have logic-like properties. The idea here may be more general than it looks at first glance.
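To put rough numbers on why statistically independent per-step failure chances would be untenable at that scale (my illustrative arithmetic): if each of $n = 10^9$ self-modifications carried an independent critical-failure probability $p$, the survival probability would be
$$(1-p)^n \approx e^{-np},$$
which is essentially zero even for $p = 10^{-6}$; an acceptable guarantee needs either $p \ll 10^{-9}$ or, as in the proof-based approach, failure modes that are not independent across steps.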
Agreed, the typical machine learning paper is not AGI progress—a tiny fraction of such papers being AGI progress suffices.
Can you name some papers that you think constitute AGI progress? (Not a rhetorical question.)
I want to note that the general idea being investigated is that you can have a billion successive self-modifications with no significant statistically independent chance of critical failure. Doing proofs from axioms, in which case the theorems are not perfectly strong but at least as strong as the axioms, with conditionally independent failure probabilities not significantly lowering the conclusion strength below this as they stack, is an obvious entry point into this kind of lasting guarantee.
I’m not sure if I parse this correctly, and may be responding to something that you don’t intend to claim, but I want to remark that if the probabilities of critical failure at each stage are
0.01, 0.001, 0.0001, 0.00001, etc.
then total probability of critical failure is less than 2%. You don’t need the probability of failure at each stage to be infinitesimal, you only need the probabilities of failure to drop off fast enough.
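Concretely, by the union bound the total failure probability in that example is at most
$$\sum_{k \ge 2} 10^{-k} = \frac{0.01}{1 - 0.1} = \frac{1}{90} \approx 1.1\%,$$
comfortably under 2%.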
How would they drop off if they’re “statistically independent”? In principle this could happen, given a wide separation in time, if humanity or lesser AIs somehow solve a host of problems for the self-modifier. But both the amount of help from outside and the time-frame seem implausible to me, for somewhat different reasons. (And the idea that we could know both of them well enough to have those subjective probabilities seems absurd.)
The Chinese economy was stagnant for a long time, but is now much closer to continually increasing GDP (on average) with high probability, and I expect that “goal” of increasing GDP will become progressively more stable over time.
The situation may be similar with AI, and I would expect it to be by default.
I want to note that the general idea being investigated is that you can have a billion successive self-modifications with no significant statistically independent chance of critical failure. Doing proofs from axioms, in which case the theorems are not perfectly strong but at least as strong as the axioms, with conditionally independent failure probabilities not significantly lowering the conclusion strength below this as they stack, is an obvious entry point into this kind of lasting guarantee. It also suggests to me that even if the actual solution doesn’t use theorems proved and adapted to the AI’s self-modification, it may have logic-like properties. The idea here may be more general than it looks at first glance.
I’m aware of this argument, but I think there are other ways to get this. The first tool I would reach for would be a martingale (or more generally a supermartingale), which is a statistical process that somehow manages to correlate all of its failures with each other (basically by ensuring that any step towards failure is counterbalanced in probability by a step away from failure). This can yield bounds on failure probability that hold for extremely long time horizons, even if there is non-trivial stochasticity at every step.
Note that while martingales are the way that I would intuitively approach this issue, I’m trying to make the broader argument that there are ways other than mathematical logic to get what you are after (with martingales being one such example).
The first tool I would reach for would be a martingale (or more generally a supermartingale), which is a statistical process that somehow manages to correlate all of its failures with each other (basically by ensuring that any step towards failure is counterbalanced in probability by a step away from failure).
Please expand on this, because I’m having trouble understanding your idea as written. A martingale is defined as “a sequence of random variables (i.e., a stochastic process) for which, at a particular time in the realized sequence, the expectation of the next value in the sequence is equal to the present observed value even given knowledge of all prior observed values at a current time”, but what random variable do you have in mind here?
I can make some sense of this, but I’m not sure whether it is what Jacob has in mind because it doesn’t seem to help.
Imagine that you’re the leader of an intergalactic civilization that wants to survive and protect itself against external threats forever. (I’m spinning a fancy tale for illustration; I’ll make the link to the actual AI problem later, bear with me.) Your abilities are limited by the amount of resources in the universe you control. The variable X(t) says what fraction you control at time t; it takes values between 0 (none) or 1 (everything). If X(t) ever falls to 0, game’s over and it will stay at 0 forever.
Suppose you find a strategy such that X(t) is a supermartingale; that is, E[X(t’) | I_t] >= X_t for all t’ > t, where I_t is your information at time t. [ETA: In discrete time, this is equivalent to E[X(t+1) | I_t] >= X_t, i.e., in expectation you have at least as many resources in the next round as you have in this round.] Now clearly we have E[X(t’) | I_t] <= P[X(t’) > 0 | I_t], and therefore P[X(t’) > 0 | I_t] >= X_t. Therefore, given your information at time t, the probability that your resources will never fall to zero is at least X_t (this follows from the above by using the assumption that if they ever fall to 0, then they stay at 0). So if you start with a large share of the resources, there’s a large probability that you’ll never run out.
The link to AI is that we replace “share of resources” by some “quality” parameter describing the AI. I don’t know whether Jacob has ideas what such parameter might be, but it would be such that there is a catastrophe iff it falls to 0.
The problem with all of this is that it sounds mostly like a restatement of “we don’t want there to be an independent failure probability on each step; we want there to be a positive probability that there is never a failure”. The martingale condition is a bit more specific than that, but it doesn’t tell us how to make that happen. So, unless I’m completely mistaken about what Jacob intended to say (possible), it seems more like a different description of the problem rather than a solution to the problem...
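As a quick numerical sanity check of the bound in the derivation above, here is a toy setup of my own (a fair gambler's-ruin walk, which is a bounded martingale with 0 absorbing); the empirical survival probability should come out at least X_0:

```python
# Toy check (my construction, not Benja's): a bounded martingale on {0, 1/N, ..., 1}
# with 0 absorbing never falls to 0 with probability at least its starting value X_0.
import random

def survives(x0, n_levels=100, max_steps=200_000):
    """Fair gambler's-ruin walk; 0 and 1 are absorbing. Returns True if 0 is never hit."""
    k = round(x0 * n_levels)             # current level; X_t = k / n_levels
    for _ in range(max_steps):
        if k == 0:
            return False                 # resources hit zero: game over forever
        if k == n_levels:
            return True                  # absorbed at 1: can never fall to zero
        k += random.choice((-1, 1))      # fair step, so E[X_{t+1} | X_t] = X_t
    return k > 0                         # (practically never reached for these parameters)

x0 = 0.8
trials = 2000
rate = sum(survives(x0) for _ in range(trials)) / trials
print(f"empirical survival probability {rate:.3f} vs. lower bound X_0 = {x0}")
```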
Thank you Benja, for the very nice explanation! (As a technical point, what you are describing is a “submartingale”, a supermartingale has the inequality going in the opposite direction and then of course you have to make 1 = failure and 0 = success instead of the other way around).
Martingales may in some sense “just” be a rephrasing of the problem, but I think that’s quite important! In particular, they implicitly come with a framework of thought that suggests possible approaches—for instance, one could imagine a criterion for action in which risks must always be balanced by the expectation of acquiring new information that will decrease future risks—we can then imagine writing down a potential function encapsulating both risk to humanity and information about the world / humanity’s desires, and have as a criterion of action that this potential function never increase in expectation (relative to, e.g., some subjective probability distribution that we have reason to believe is well-calibrated).
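To gesture at what such a criterion might look like in code, here is a loose sketch of my own construction (the state representation, the potential function, and the outcome model are all invented for illustration, not anything Jacob proposed): an action is admissible only if a Monte Carlo estimate of the expected potential after taking it is no higher than the current potential.

```python
# Hypothetical sketch of a "never increase expected potential" criterion of action.
# Everything here (state representation, potential, outcome model) is illustrative only.
import random

def potential(state):
    risk, info = state
    return risk - 0.1 * info             # made-up trade-off between risk and information

def outcome_model(state, action):
    """Sample one possible successor state after taking `action` (a toy stochastic model)."""
    risk, info = state
    return (risk + random.uniform(0, action["risk_ceiling"]),
            info + random.uniform(0, action["info_ceiling"]))

def expected_potential(action, state, n_samples=10_000):
    return sum(potential(outcome_model(state, action)) for _ in range(n_samples)) / n_samples

def admissible(action, state):
    """Allow the action only if the potential does not increase in expectation."""
    return expected_potential(action, state) <= potential(state)

state = (0.2, 1.0)
probe_experiment = {"risk_ceiling": 0.02, "info_ceiling": 0.5}   # small risk, decent info gain
print(admissible(probe_experiment, state))                       # expected change is about -0.015, so True
```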
I second Wei’s question. I can imagine doing logical proofs about how your successor’s algorithms operate to try to maximize a utility function relative to a lawfully updated epistemic state, and would consider my current struggle to be how to expand this to a notion of a lawfully approximately updated epistemic state. If you say ‘martingale’ I have no idea where to enter the problem at all, or where the base statistical guarantees that form part of the martingale would come from. It can’t be statistical testing unless the problem is i.i.d. because otherwise every context shift breaks the guarantee.
I would say that most machine learning research is neither relevant to, nor trying to be relevant to, AGI
It seems to me like relatively narrow progress on learning is likely to be relevant to AGI. It does seem plausible that e.g. machine learning research is not too much more relevant to AGI than progress in optimization or in learning theory or in type theory or perhaps a dozen other fields, but it doesn’t seem very plausible that it isn’t taking us closer to AGI in expectation.
except for the “self-modification” part, which seems to be a bit too much separated out from everything else (since pretty much any form of learning is a type of self-modification—current AI algorithms are self-modifying all the time!)
Yes, reflective reasoning seems to be necessary to reason about the process of learning and the process of reflection, amongst other things. I don’t think any of the work that has been done applies uniquely to explicit self-modification vs. more ordinary problems with reflection (e.g. I think the notion of “truth” is useful if you want to think about thinking, and believing that your own behavior is sane is useful if you want to think about survival as an instrumental value).
most of the issues that MIRI is currently working on are prerequisites for any sort of AI, not just friendly AI
This seems quite likely (or at least the weaker claim, that either these results are necessary for any AI or they are useless for any AI, seems very likely). But of course this is not enough to say that such work isn’t useful for better understanding and coping with AI impacts. If we can be so lucky as to find important ideas well in advance of building the practical tools that make those ideas algorithmically relevant, then we might develop a deeper understanding of what we are getting into and more time to explore the consequences.
In practice, even if this research program worked very well, we would probably be left with at least a few and perhaps a whole heap of interesting theoretical ideas. And we might have few clues as to which will turn out to be most important. But that would still give us some general ideas about what human-level AI might look like, and could help us see the situation more clearly.
I’m skeptical of both the need for and the feasibility of an AI providing an actual proof of safety of self-modification
Indeed, I would be somewhat surprised if interesting statements get proven often in the normal business of cognition. But this doesn’t mean that mathematical logic and inference won’t play an important role in AI—logic is by far the most expressive language that we are currently aware of, and therefore a natural starting point if we want to say anything formal about cognition (and as far as I can tell this is not at all a fringe view amongst folks in AI).
It seems to me like relatively narrow progress on learning is likely to be relevant to AGI. It does seem plausible that e.g. machine learning research is not too much more relevant to AGI than progress in optimization or in learning theory or in type theory or perhaps a dozen other fields, but it doesn’t seem very plausible that it isn’t taking us closer to AGI in expectation.
I’d be interested in your response to the following, which I wrote in another context. I recognize that I’m far outside of my domain of expertise, and what I write should be read as inquisitive rather than argumentative:
The impression that I’ve gotten is that to date, impressive applications of computers to do tasks that humans do are based around some combination of
Brute force computation
Task specific algorithms generated by humans
In particular, they don’t seem at all relevant to mimicking human inference algorithms.
As I said in my point #2 here: I find it very plausible that advances in narrow AI will facilitate the development of AGI by enabling experimentation.
The question that I’m asking is more: “Is it plausible that the first AGI will be based on filling in implementation details of current neural networks research programs, or current statistical inference research programs?”
Something worth highlighting is that researchers in algorithms have repeatedly succeeded in developing algorithms that solve NP-complete problems in polynomial time with very high probability, or that give very good approximations to solutions to problems in polynomial time where it would be NP-complete to get the solutions exactly right. But these algorithms can’t be ported from one NP-complete problem to another while retaining polynomial running time. One has to deal with each algorithmic problem separately.
From what I know, my sense is that one has a similar situation in narrow AI, and that humans (in some vague sense) have a polynomial time algorithm that’s robust across different algorithmic tasks.
I don’t really understand how “task specific algorithms generated by humans” differs from general intelligence. Humans choose a problem, and then design algorithms to solve the problem better. I wouldn’t expect a fundamental change in this situation (though it is possible).
But these algorithms can’t be ported from one NP-complete problem to another while retaining polynomial running time.
I think this is off. A single algorithm currently achieves the best known approximation ratio on all constraint satisfaction problems with local constraints (this includes most of the classical NP-hard approximation problems where the task is “violate as few constraints as possible” rather than “satisfy all constraints, with as high a score as possible”), and is being expanded to cover increasingly broad classes of global constraints. You could say “constraint satisfaction is just another narrow task” but this kind of classification is going to take you all the way up to human intelligence and beyond. Especially if you think ‘statistical inference’ is also a narrow problem, and that good algorithms for planning and inference are more of the same.
I don’t really understand how “task specific algorithms generated by humans” differs from general intelligence. Humans choose a problem, and then design algorithms to solve the problem better. I wouldn’t expect a fundamental change in this situation (though it is possible).
All I’m saying here is that general intelligence can construct algorithms across domains, whereas my impression is that impressive human+ artificial intelligence to date hasn’t been able to construct algorithms across domains.
General artificial intelligence should be able to prove deep theorems such as the Atiyah-Singer Index Theorem, and thousands of other such statements. My impression is that current research in AI is analogous to working on proving these things one at a time.
Working on the classification of finite simple groups could indirectly help you prove the Atiyah-Singer Index Theorem on account of leading to the discovery of structures that are relevant, but such work will only make a small dent in the problem of proving the Atiyah-Singer Index Theorem. Creating an algorithm that can prove these things (that’s not over-fitted to the data) is a very different problem from that of proving the theorems individually.
Do you think that the situation with AI is analogous or disanalogous?
A single algorithm currently achieves the best known approximation ratio on all constraint satisfaction problems with local constraints (this includes most of the classical NP-hard approximation problems where the task is “violate as few constraints as possible” rather than “satisfy all constraints, with as high a score as possible”), and is being expanded to cover increasingly broad classes of global constraints.
I’m not sure if I follow. Is the algorithm that you have in mind the conglomeration of all existing algorithms?
If so, it’s entirely unclear how quickly the algorithm is growing relative to the problems that we’re interested in.
I’m not sure if I follow. Is the algorithm that you have in mind the conglomeration of all existing algorithms?
No, there is a single SDP rounding scheme that gets optimal performance on all constraint satisfaction problems (the best we know so far, and the best possible under the unique games conjecture).
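For concreteness, the classic ancestor of this kind of scheme is Goemans-Williamson rounding for Max-Cut: solve an SDP relaxation, then round with a random hyperplane. A minimal sketch follows (assuming numpy and cvxpy with its bundled SDP-capable solver; the graph is a made-up example):

```python
# Goemans-Williamson SDP rounding for Max-Cut, as a concrete example of SDP rounding.
# Assumes numpy and cvxpy (with its default SDP-capable solver) are installed.
import numpy as np
import cvxpy as cp

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # a small made-up graph
n = 4

# SDP relaxation: X is PSD with unit diagonal; maximize sum over edges of (1 - X_ij) / 2.
X = cp.Variable((n, n), PSD=True)
objective = cp.Maximize(sum((1 - X[i, j]) / 2 for i, j in edges))
problem = cp.Problem(objective, [cp.diag(X) == 1])
problem.solve()

# Recover unit-length vectors from X (clipping tiny negative eigenvalues from solver noise).
eigvals, eigvecs = np.linalg.eigh(X.value)
vectors = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0, None)))

# Random-hyperplane rounding: each vertex is assigned to the side its vector falls on.
r = np.random.randn(n)
side = np.sign(vectors @ r)
cut_size = sum(1 for i, j in edges if side[i] != side[j])
print("SDP upper bound:", round(problem.value, 3), "| rounded cut size:", cut_size)
```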
I would disagree with the statement that our algorithms are all domain-specific. Often some amount of domain-specific knowledge is needed to design a good algorithm, but it is often quite minimal. For instance, my office-mate is building a parser for interpreting natural language semantics, and has taken zero linguistics classes (but has picked up some amount of linguistics knowledge from talks, etc.). Of course, he’s following in the footsteps of people who do know linguistics, but the point is just that the methods people use tend to be fairly general despite requiring task-specific tuning.
I agree, of course, that there are few systems that work across multiple domains, but I’m not sure that that’s a fundamental issue so much as a symptom of broader issues that surface in this context (such as latent variables and complex features).
Something worth highlighting is that researchers in algorithms have repeatedly succeeded in developing algorithms that solve NP-complete problems in polynomial time with very high probability, or that give very good approximations to solutions to problems in polynomial time where it would be NP-complete to get the solutions exactly right. But these algorithms can’t be ported from one NP-complete problem to another while retaining polynomial running time. One has to deal with each algorithmic problem separately.
You can’t do that? From random things like computer security papers, I was under the impression that you could do just that—convert any NP problem to a SAT instance and toss it at a high-performance commodity SAT solver with all its heuristics and tricks, and get an answer back.
You can’t do that? From random things like computer security papers, I was under the impression that you could do just that—convert any NP problem to a SAT instance and toss it at a high-performance commodity SAT solver with all its heuristics and tricks, and get an answer back.
You can do this. Minor caveat: this works for overall heuristic methods- like “tabu search” or “GRASP”- but many of the actual implementations you would see in the business world are tuned to the structure of the probable solution space. One of the traveling salesman problem solvers I wrote a while back would automatically discover groups of cities and move them around as a single unit- useful when there are noticeable clusters in the space of cities, not useful when there aren’t. Those can lead to dramatic speedups (or final solutions that are dramatically closer to the optimal solution) but I don’t think they translate well across reformulations of the problem.
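As a concrete miniature of the reduction gwern describes, here is a sketch (my own toy example) that encodes graph 3-coloring as a CNF formula and hands it to a SAT solver (a naive brute-force search below, standing in for a real solver such as MiniSat on this tiny instance):

```python
# Reduce a toy graph 3-coloring instance to SAT, then solve the SAT instance.
# The brute-force search stands in for a real SAT solver, since the formula is tiny.
from itertools import product

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]        # a made-up 4-vertex graph
n_vertices, n_colors = 4, 3

def var(v, c):
    """Index of the Boolean variable 'vertex v has color c'."""
    return v * n_colors + c

clauses = []                                     # each clause is a list of (variable, polarity) literals
for v in range(n_vertices):
    clauses.append([(var(v, c), True) for c in range(n_colors)])            # some color
    for c1 in range(n_colors):
        for c2 in range(c1 + 1, n_colors):
            clauses.append([(var(v, c1), False), (var(v, c2), False)])      # at most one color
for u, v in edges:
    for c in range(n_colors):
        clauses.append([(var(u, c), False), (var(v, c), False)])            # adjacent vertices differ

def brute_force_sat(clauses, n_vars):
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[i] == polarity for i, polarity in clause) for clause in clauses):
            return assignment
    return None

model = brute_force_sat(clauses, n_vertices * n_colors)
coloring = {v: c for v in range(n_vertices) for c in range(n_colors) if model[var(v, c)]}
print(coloring)   # prints a valid 3-coloring recovered from the SAT model
```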
NP-hard problems vary greatly in their approximability; some, such as the bin packing problem, can be approximated within any factor greater than 1 (such a family of approximation algorithms is often called a polynomial time approximation scheme or PTAS). Others are impossible to approximate within any constant, or even polynomial factor unless P = NP, such as the maximum clique problem.
You can do that. But although such algorithms will produce correct answers to any NP problem when given correct answers to SAT, that does not mean that they will produce approximate answers to any NP problem when given approximate answers to SAT. (In fact, I’m not sure if the concept of an approximate answer makes sense for SAT, although of course you could pick a different NP-complete problem to reduce to.)
Edit: My argument only applies to algorithms that give approximate solutions, not to algorithms that give correct solutions with high probability, and reading your comment again, it looks like you may have been referring to the latter. You are correct that if you have a polynomial-time algorithm to solve any NP-complete problem with high probability, then you can get a polynomial-time algorithm to solve any NP problem with high probability. Edit 2: sort of; see discussion below.
You are correct that if you have a polynomial-time algorithm to solve any NP-complete problem with high probability, then you can get a polynomial-time algorithm to solve any NP problem with high probability.
If a problem is NP-complete, then by definition, any NP problem can be solved in polynomial time by an algorithm which is given an oracle that solves the NP-complete problem, which it is allowed to use once. If, in place of the oracle, you substitute a polynomial-time algorithm which solves the problem correctly 90% of the time, the algorithm will still be polynomial-time, and will necessarily run correctly at least 90% of the time.
However, as JoshuaZ points out, this requires that the algorithm solve every instance of the problem with high probability, which is a much stronger condition than just solving a high proportion of instances. In retrospect, my comment was unhelpful, since it is not known whether there are any algorithms that solve every instance of an NP-complete problem with high probability. I don’t know how generalizable the known tricks for solving SAT are (although presumably they are much more generalizable than JoshuaZ’s example).
In retrospect, my comment was unhelpful, since it is not known whether there are any algorithms that solve every instance of an NP-complete problem with high probability.
This is the key. If you had an algorithm that solved every instance of an NP-complete problem in polynomial time with high probability, you could generate a proof of the Riemann hypothesis with high probability, since “does this statement have a formal proof of at most n symbols?” is itself an NP problem! (Provided that the polynomial time algorithm is pretty fast, and that the proof isn’t too long.)
It depends, I think, on what AlexMennen meant by this. If for example there is a single NP-complete problem in BPP then it is clear that NP is in BPP. Similar remarks apply to ZPP, and in both cases, almost the entire polynomial hierarchy will collapse. The proofs here are straightforward.
If, however, Alex meant that one is picking random instances of a specific NP-complete problem, and that they can be solved deterministically, then Alex’s claim seems wrong. Consider for example this problem: “If an input string of length n starts with exactly floor(n^(1/2)) zeros and then a 1, treat the remainder like it is an input string for 3-SAT. If the string starts with anything else, return instead the parity of the string.” This is an NP-complete problem where we can solve almost all instances with high probability since most instances are really just a silly P problem. But we cannot use this fact to solve another NP-complete problem (say normal 3-SAT) with high probability.
in both cases, almost the entire polynomial hierarchy will collapse
Why?
Well, in the easy case of ZPP, ZPP is contained in co-NP, so if NP is contained in ZPP then NP is contained in co-NP, in which case the hierarchy must collapse to the first level.
In the case of BPP, the details are slightly more subtle and require deeper results. If BPP contains NP, then Adleman’s theorem says that the entire polynomial hierarchy is contained in BPP. Since BPP is itself contained at a finite level of the hierarchy, this forces the hierarchy to collapse to at most that level.
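Spelling out the chain of containments behind this (using the standard fact that BPP lies in the second level of the hierarchy):
$$\mathrm{NP} \subseteq \mathrm{BPP} \;\Rightarrow\; \mathrm{PH} \subseteq \mathrm{BPP} \subseteq \Sigma_2^p \cap \Pi_2^p,$$
so the whole hierarchy collapses to its second level.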
most of the issues that MIRI is currently working on are prerequisites for any sort of AI, not just friendly AI
This seems quite likely (or at least the weaker claim, that either these results are necessary for any AI or they are useless for any AI, seems very likely).
Point of order: Let A = “these results are necessary for any AI” and B = “they are useless for any AI”. It sounds like you’re weakening from A to (A or B) because you feel the probability of B is large, and therefore the probability of A isn’t all that large in absolute terms. But if much of the probability mass of the weaker claim (A or B) comes from B, then if at all possible, it seems more pragmatically useful to talk about (i) the probability of B and (ii) the probability of A given (not B), instead of talking about the probability of (A or B), since qualitative statements about (i) and (ii) seem to be what’s most relevant for policy. (In particular, even knowing that “the probability of (A or B) is very high” and “the probability of A is not that high”—or even “is low”—doesn’t tell us whether P(A|not B) is high or low.)
My impression from your above comments is that we are mostly in agreement except for how much we respectively like mathematical logic. This probably shouldn’t be surprising given that you are a complexity theorist and I’m a statistician, and perhaps I should learn some more mathematical logic so I can appreciate it better (which I’m currently working on doing).
I of course don’t object to logic in the context of AI, it mainly seems to me that the emphasis on mathematical logic in this particular context is unhelpful, as I don’t see the issues being raised as being fundamental to what is going on with self-modification. I basically expect whatever computationally bounded version of probability we eventually come up with to behave locally rather than globally, which I believe circumvents most of the self-reference issues that pop up (sorry if that is somewhat vague intuition).
Hm. I’m not sure if Scott Aaronson has any weird views on AI in particular, but if he’s basically mainstream-oriented we could potentially ask him to briefly skim the Tiling Agents paper and say if it’s roughly the sort of paper that it’s reasonable for an organization like MIRI to be working on if they want to get some work started on FAI. At the very least if he disagreed I’d expect he’d do so in a way I’d have better luck engaging conversationally, or if not then I’d have two votes for ‘please explore this issue’ rather than one.
I feel again like you’re trying to interpret the paper according to a different purpose from what it has. Like, I suspect that if you described what you thought a promising AGI research agenda was supposed to deliver on what sort of timescale, I’d say, “This paper isn’t supposed to do that.”
No, it’s clear that there have been many advances, for example in chess playing programs, auto-complete search technology, automated translation, driverless cars, and speech recognition.
But my impression is that this work has only made a small dent in the problem of general artificial intelligence.
This part is clearer and I think I may have a better idea of where you’re coming from, i.e., you really do think the entire field of AI hasn’t come any closer to AGI, in which case it’s much less surprising that you don’t think the Tiling Agents paper is the very first paper ever to come closer to AGI. But this sounds like a conversation that someone else could have with you, because it’s not MIRI-specific or FAI-specific. I also feel somewhat at a loss for where to proceed if I can’t say “But just look at the ideas behind Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, that’s obviously important conceptual progress because...” In other words, you see AI doing a bunch of things, we already mostly agree on what these sorts of surface real-world capabilities are, but after checking with some friends you’ve concluded that this doesn’t mean we’re less confused about AGI than we were in 1955. I don’t see how I can realistically address that except by persuading your authorities; I don’t see what kind of conversation we could have about that directly without being able to talk about specific AI things.
Meanwhile, if you specify “I’m not convinced that MIRI’s paper has a good chance of being relevant to FAI, but only for the same reasons I’m not convinced any other AI work done in the last 60 years is relevant to FAI” then this will make it clear to everyone where you’re coming from on this issue.
could an AI improve itself to something that was “as incomprehensibly far beyond humans as Turing machines are beyond finite automata”?
as I wrote in my “Singularity is Far” post, my strong guess (based, essentially, on the Church-Turing Thesis) is that the answer is no. I believe—as David Deutsch also argues in “The Beginning of Infinity”—that human beings are “qualitatively,” if not quantitatively, already at some sort of limit of intellectual expressive power. More precisely, I conjecture that for every AI that can exist in the physical world, there exists a constant k such that a reasonably-intelligent human could understand the AI perfectly well, provided the AI were slowed down by a factor of k. So then the issue is “merely” that k could be something like 10^20.
And later:
I’m not sure how much I agree with Karnofsky’s “tool vs. agent” distinction, but his broader point is very similar to mine: namely, the uncertainties regarding “Friendly AI” are so staggering that it’s impossible to say with confidence whether any “research” we do today would be likelier to increase or decrease the chance of catastrophe (or just be completely irrelevant).
For that reason, I would advise donating to SIAI if, and only if, you find the tangible activities that they actually do today—most notably (as far as I can tell), the Singularity Summit and Eliezer’s always-interesting blog posts about “the art of rationality”—to be something you want to support.
Without further context I see nothing wrong here. Superintelligences are Turing machines, check. You might need a 10^20 slowdown before that becomes relevant, check. It’s possible that the argument proves too much by showing that a well-trained high-speed immortal dog can simulate Mathematica and therefore a dog is ‘intellectually expressive’ enough to understand integral calculus, but I don’t know if that’s what Scott means and principle of charity says I shouldn’t assume that without confirmation.
EDIT: Parent was edited, my reply was to the first part, not the second. The second part sounds like something to talk with Scott about. I really think the “You’re just as likely to get results in the opposite direction” argument is on the priors overstated for most forms of research. Does Scott think that work we do today is just as likely to decrease our understanding of P/NP as increase it? We may be a long way off from proving an answer but that’s not a reason to adopt such a strange prior.
As it happens, I’ve been chatting with Scott about this issue recently, due to some comments he made in his recent quantum Turing machine paper:
the uncomfortable truth is that it’s the Singularitarians who are the scientific conservatives, while those who reject their vision as fantasy are scientific radicals. For at some level, all the Singularitarians are doing is taking conventional thinking about physics and the brain to its logical conclusion. If the brain is a “meat computer,” then given the right technology, why shouldn’t we be able to copy its program from one physical substrate to another? And why couldn’t we then run multiple copies of the program in parallel...?
...Certainly, one could argue that the Singularitarians’ timescales might be wildly off… [Also,] suppose we conclude — as many Singularitarians have — that the greatest problem facing humanity today is how to ensure that, when superhuman AIs are finally built, those AIs will be “friendly” to human concerns. The difficulty is: given our current ignorance about AI, how on earth should we act on that conclusion? Indeed, how could we have any confidence that whatever steps we did take wouldn’t backfire, and increase the probability of an unfriendly AI?
I thought his second objection (“how could we know what to do about it?”) was independent of his first objection (“AI seems farther away than the singularitarians tend to think”), but when I asked him about it, he said his second objection just followed from the first. So given his view that AI is probably centuries away, it seems really hard to know what could possibly help w.r.t. FAI. And if I thought AI was several centuries away, I’d probably have mostly the same view.
I asked Scott: “Do you think you’d hold roughly the same view if you had roughly the probability distribution over year of AI creation as I gave in When Will AI Be Created? Or is this part of your view contingent on AI almost certainly being several centuries away?”
He replied: “No, if my distribution assigned any significant weight to AI in (say) a few decades, then my views about the most pressing tasks today would almost certainly be different.” But I haven’t followed up to get more specifics about how his views would change.
And yes, Scott said he was fine with quoting this conversation in public.
I think I’d be happy with a summary of persistent disagreement where Jonah or Scott said, “I don’t think MIRI’s efforts are valuable because we think that AI in general has made no progress on AGI for the last 60 years / I don’t think MIRI’s efforts are priorities because we don’t think we’ll get AGI for another 2-3 centuries, but aside from that MIRI isn’t doing anything wrong in particular, and it would be an admittedly different story if I thought that AI in general was making progress on AGI / AGI was due in the next 50 years”.
I don’t think MIRI’s efforts are valuable because I think that AI in general has made no progress on AGI for the last 60 years, but aside from that MIRI isn’t doing anything wrong in particular, and it would be an admittedly different story if I thought that AI in general was making progress on AGI.
is pretty close to my position.
I would qualify it by saying:
I’d replace “no progress” with “not enough progress for there to be a known research program with a reasonable chance of success.”
I have high confidence that some of the recent advances in narrow AI will contribute (whether directly or indirectly) to the eventual creation of AGI (contingent on this event occurring), just not necessarily in a foreseeable way.
If I discover that there’s been significantly more progress on AGI than I had thought, then I’ll have to reevaluate my position entirely. I could imagine updating in the direction of MIRI’s FAI work being very high value, or I could imagine continuing to believe that MIRI’s FAI research isn’t a priority, for reasons different from my current ones.
I really think the “You’re just as likely to get results in the opposite direction” argument is on the priors overstated for most forms of research. Does Scott think that work we do today is just as likely to decrease our understanding of P/NP as increase it? We may be a long way off from proving an answer but that’s not a reason to adopt such a strange prior.
I’m doing some work for MIRI looking at the historical track record of predictions of the future and actions taken based on them, and whether such attempts have systematically done as much harm as good.
To this end, among other things, I’ve been reading Nate Silver’s The Signal and the Noise. In Chapter 5, he discusses how attempts to improve earthquake predictions have consistently yielded worse predictive models than the Gutenberg-Richter law. This has slight relevance.
Such examples notwithstanding, my current prior is that MIRI’s FAI research has positive expected value. I don’t think that the expected value of the research is zero or negative – only that it’s not competitive with the best of the other interventions on the table.
I really think the “You’re just as likely to get results in the opposite direction” argument is on the priors overstated for most forms of research. Does Scott think that work we do today is just as likely to decrease our understanding of P/NP as increase it?
My own interpretation of Scott’s words here is that it’s unclear whether your research is actually helping in the “get Friendly AI before some idiot creates a powerful Unfriendly one” challenge. Fundamental progress in AI in general could just as easily benefit the fool trying to build an AGI without too much concern for Friendliness as it could benefit you. Thus, whether fundamental research helps with avoiding the UFAI catastrophe is unclear.
I’m not sure that interpretation works, given that he also wrote:
suppose we conclude — as many Singularitarians have — that the greatest problem facing humanity today is how to ensure that, when superhuman AIs are finally built, those AIs will be “friendly” to human concerns. The difficulty is: given our current ignorance about AI, how on earth should we act on that conclusion? Indeed, how could we have any confidence that whatever steps we did take wouldn’t backfire, and increase the probability of an unfriendly AI?
Since Scott was addressing steps taken to act on the conclusion that friendliness was supremely important, presumably he did not have in mind general AGI research.
Hm. I’m not sure if Scott Aaronson has any weird views on AI in particular, but if he’s basically mainstream-oriented we could potentially ask him to briefly skim the Tiling Agents paper and say if it’s roughly the sort of paper that it’s reasonable for an organization like MIRI to be working on if they want to get some work started on FAI.
Yes, I would welcome his perspective on this.
I feel again like you’re trying to interpret the paper according to a different purpose from what it has. Like, I suspect that if you described what you thought a promising AGI research agenda was supposed to deliver on what sort of timescale, I’d say, “This paper isn’t supposed to do that.”
I think I’ve understood your past comments on this point. My questions are about the implicit assumptions upon which the value of the research rests, rather than about what the research does or doesn’t succeed in arguing.
This part is clearer and I think I may have a better idea of where you’re coming from, i.e., you really do think the entire field of AI hasn’t come any closer to AGI, in which case it’s much less surprising that you don’t think the Tiling Agents paper is the very first paper ever to come closer to AGI. But this sounds like a conversation that someone else could have with you, because it’s not MIRI-specific or FAI-specific.
As I said in earlier comments, the case for the value of the research hinges on its potential relevance to AI safety, which in turn hinges on how good the model is for the sort of AI that will actually be built. Here I don’t mean “Is the model exactly right?” — I recognize that you’re not claiming it to be — the question is whether the model is in the right ballpark.
A case for the model being a good one requires pointing to a potentially promising AGI research program to which the model is relevant. This is the point that I feel hasn’t been addressed.
Some things that I see as analogous to the situation under discussion are:
A child psychology researcher who’s never interacted with children could write about good child rearing practices without the research being at all relevant to how to raise children well.
An economist who hasn’t looked at real-world data about politics could study political dynamics using mathematical models without the research being at all relevant to politics in practice.
A philosopher who hasn’t studied math could write about the philosophy of math without the writing being relevant to math.
A therapist who’s never had experience with depression could give advice to a patient on overcoming depression without the advice being at all relevant to overcoming depression.
Similarly, somebody without knowledge of the type of AI that’s going to be built could research AI safety without the research being relevant to AI safety.
Does this help clarify where I’m coming from?
I also feel somewhat at a loss for where to proceed if I can’t say “But just look at the ideas behind Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, that’s obviously important conceptual progress because...” In other words, you see AI doing a bunch of things, we already mostly agree on what these sorts of surface real-world capabilities are, but after checking with some friends you’ve concluded that this doesn’t mean we’re less confused about AGI than we were in 1955. I don’t see how I can realistically address that except by persuading your authorities; I don’t see what kind of conversation we could have about that directly without being able to talk about specific AI things.
I’m open to learning object level material if I learn new information that convinces me that there’s a reasonable chance that MIRI’s FAI research is relevant to AI safety in practice.
Meanwhile, if you specify “I’m not convinced that MIRI’s paper has a good chance of being relevant to FAI, but only for the same reasons I’m not convinced any other AI work done in the last 60 years is relevant to FAI” then this will make it clear to everyone where you’re coming from on this issue.
Either of my best guess or Holden’s best guess could be right, and so could lots of other ideas that we haven’t thought of. My proposed conceptual framework should be viewed as one of .
Missing link suspected. Suggest verifying that the url includes ‘http://’.
Just wondering why you see Jonah Sinick as being of high enough status to be worth explaining to him what’s been discussed on LW repeatedly. Or maybe I’m totally misreading this exchange.
Matching “first AGI will [probably] have internal structure analogous to that of a human” and “first AGI [will probably have] many interacting specialized modules” in a literal (cough uncharitable cough) manner, as evidenced by “heavier-than-air flying-machines had feathers and beaks”. Your phrasing hints at an anthropocentric architectural bias, analogous to the one you specifically distance yourself from regarding values.
Maybe you should clarify that part, it’s crucial to the current misunderstanding, and it’s not clear whether by “interacting specialized modules” you’d also refer to “Java classes not corresponding to anything ‘human’ in particular”, or whether you’d expect a “thalamus-module”.
Matching “first AGI will [probably] have internal structure analogous to that of a human” and “first AGI [will probably have] many interacting specialized modules” in a literal (cough uncharitable cough) manner, as evidenced by “heavier-than-air flying-machines had feathers and beaks”. Your phrasing hints at an anthropocentric architectural bias, analogous to the one you specifically distance yourself from regarding values.
I think that people should make more of an effort to pay attention to the nuances of people’s statements rather than using simple pattern matching.
Maybe you should clarify that part, it’s crucial to the current misunderstanding, and it’s not clear whether by “interacting specialized modules” you’d also refer to “Java classes not corresponding to anything ‘human’ in particular”, or whether you’d expect a “thalamus-module”.
There’s a great deal to write about this, and I’ll do so at a later date.
To give you a small taste of what I have in mind: suppose you ask “How likely is it that the final digit of the Dow Jones will be 2 in two weeks?” I’ve never thought about this question. A priori, I have no Bayesian prior. What my brain does is amalgamate
The Dow Jones index varies in a somewhat unpredictable way
The last digit is especially unpredictable.
Two weeks is a really long time for unpredictable things to happen in this context
The last digit could be one of 10 values between 0 and 9
The probability of a randomly selected digit between 0 and 9 being 2 is equal to 10%
Different parts of my brain generate the different pieces, and another part of my brain combines them. I’m not using a single well-defined Bayesian prior, nor am I satisfying a well defined utility function.
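To make that amalgamation concrete, here is a minimal Monte Carlo sketch (my own toy illustration, not something from the comment above; the random-walk model and its parameters are invented purely for the example). The point is that once the two-week movement is large relative to a single index point, the separate pieces combine into roughly the uniform-digit answer of 10%.

```python
import random

def last_digit_after_two_weeks(start=15000, trading_days=10, daily_move_pct=0.01):
    """Crude random-walk stand-in for the index (an assumption for illustration only)."""
    level = start
    for _ in range(trading_days):
        level *= 1 + random.gauss(0, daily_move_pct)
    return int(round(level)) % 10

# Monte Carlo estimate of P(final digit == 2 two weeks from now)
trials = 100_000
hits = sum(last_digit_after_two_weeks() == 2 for _ in range(trials))
print(hits / trials)  # typically close to 0.10: the last digit is effectively uniform
```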
I don’t want to comment on the details, as this is way outside my area of expertise, but I do want to point out that you appear to be a victim of the bright dilettante fallacy. You appear to think that your significant mathematical background makes you an expert in an unrelated field without having to invest the time and effort required to get up to speed in it.
I don’t claim to have any object level knowledge of AI.
My views on this point are largely based on what I’ve heard from people who work on AI, together with introspection as to how I and other humans reason, and the role of heuristics in reasoning.
I’ll also highlight a comment of Nick Beckstead, which you’ve already seen and responded to. I didn’t understand your response.
Let me try from a different angle.
With humans, we see three broad clusters of modification: reproduction, education, and chemistry. Different people are physically constructed in different ways, and so we can see evolution of human civilization by biological evolution of the humans inside it. The environments that people find themselves in or choose leave imprints on those people. Chemicals people ingest can change those people, such as with caffeine, alcohol, morphine, or heroin. (I would include ‘changing your diet to change your thought processes’ under chemical changes, but the chemical changes from becoming addicted to heroin and from not being creatine deficient look very different.)
For AIs, most of the modification that’s interesting and new will look like the “chemistry” cluster. An AI modifying its source code will look a lot like a human injecting itself with a new drug that it just invented. (Nick_Beckstead’s example of modifying the code of the weather computer is more like education than it is like chemistry.)
This is great because some drugs dramatically improve performance, and so a person on caffeine could invent a super nootropic, and then on the super nootropic invent a cure for cancer and an even better nootropic, and so on. This is terrifying because any drug that adjusts your beliefs or your decision-making algorithm (think of ‘personality’ as a subset of this) dramatically changes how you behave, and might do so for the worse. This is doubly terrifying because these changes might be irreversible- you might take a drug that gets rid of your depression by making you incapable of feeling desire, and then not have any desire to restore yourself! This is triply terrifying because the effects of the drug might be unknown- you might not be able to determine what a drug will do to you until after you take it, and by then it might be too late.
For humans this problem is mostly solved by trial and error followed by pattern-matching- “coffee is okay, crack is not, because Colin is rich and productive and Craig is neither”- which is not useful for new drugs, and not useful for misclassified old drugs, and not very safe for very powerful systems. The third problem- that the effects might be unknown- is the sort of thing that proofs might help with, except there are some technical obstacles to doing that. The Lobstacle is a prominent theoretical one, and while it looks like there are lots of practical obstacles as well, surmounting the theoretical obstacles should help with surmounting the practical obstacles.
Any sort of AGI that’s able to alter its own decision-making process will have the ability to ‘do chemistry on itself,’ and one with stable values will need to have solved the problem of how to do that while preserving its values. (I don’t think that humans have ‘stable’ values; I’d call them something more like ‘semi-stable.’ Whether or not this is a bug or feature is unclear to me.)
I understand where you’re coming from, and I think that you correctly highlight a potential source of concern, and one which my comment didn’t adequately account for. However:
I’m skeptical that it’s possible to create an AI based on mathematical logic at all. Even if an AI with many interacting submodules is dangerous, it doesn’t follow that working on AI safety for an AI based on mathematical logic is promising.
Humans can impose selective pressures on emergent AI’s so as to mimic the process of natural selection that humans experienced.
I’m skeptical that it’s possible to create an AI based on mathematical logic at all. Even if an AI with many interacting submodules is dangerous, it doesn’t follow that working on AI safety for an AI based on mathematical logic is promising.
Eliezer’s position is that the default mode for an AGI is failure; i.e. if an AGI is not provably safe, it will almost certainly go badly wrong. In that context, if you accept that “an AI with many interacting submodules is dangerous,” then that’s more or less equivalent to believing that one of the horribly wrong outcomes will almost certainly be achieved if an AGI with many submodules is created.
Humans can impose selective pressures on emergent AI’s so as to mimic the process of natural selection that humans experienced.
Humans are not Friendly. They don’t even have the capability under discussion here, to preserve their values under self-modification; a human-esque singleton would likely be a horrible, horrible disaster.
There are many possible operationalizations of a self-modifying AI
No doubt. And as of now, for none of them are we able to tell whether they are safe or not. There’s insufficient rigor in the language; the formalizations aren’t standardized or pinned down (in this subject matter). MIRI’s work is creating and pinning down the milestones for how we’d even go about assessing a self-modifying friendly AI in terms of goal stability, in mathematical language.
To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.
It’s conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as (Working to increase rationality, Spreading concern for global welfare Building human capital of people who are concerned about global welfare) are more cost-effective activities ways for reducing AI risk than doing such research.
Even if that were so, that’s not MIRI’s (or EY’s) most salient comparative advantage (also: CFAR).
To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.
My claim is that there are sufficiently many possible models for AI that given what we (the Less Wrong community, not necessarily AI researchers) know now, the probability of a given model being developed is tiny.
The actionable safety issues that would come up if the AI is like the Chinese economy would be very different from the actionable safety issues that would come up if the AI is like a self-improving chess playing program, which would be very different from the actionable safety issues that would come up if the AI is of the type that Eliezer’s publication describes.
Given the paucity of information available about the design of the first AI, I don’t think that the probability of doing safety research on a particular model being actionable is sufficiently high for such research to be warranted (relative to other available activities).
Even if that were so, that’s not MIRI’s (or EY’s) most salient comparative advantage (also: CFAR).
Eliezer made a major contribution to increasing rationality with his How To Actually Change Your Mind sequence, which improved the rationality of many people who I know, including myself.
MIRI could engage in other AI safety activities, such as improving future forecasting.
If an organization doesn’t have a cost-effective activity to engage in, and the employees recognize this, then they can leave and do something else. Here I’m not claiming that this is in fact the case of MIRI, rather, I’m just responding to your argument.
MIRI’s staff could migrate to CFAR.
Out of all of the high impact activities that MIRI staff could do, it’s not clear to me that Friendly AI research is their comparative advantage.
Also, even if we accept that MIRI’s comparative advantage has to do with having a clearer view of the Friendliness vs. UnFriendliness distinction, why wouldn’t it be more effective for them to try to insure against an UnFriendly outcome by addressing the UnFriendliness already in the world today? For instance, corporate governance. Corporations’ optimization powers are a tremendous source of human happiness, but their UnFriendly tendencies are clear. For now, corporations have only parasitic intelligence, and don’t look particularly foomy, but if I had to bet on whether MIRI or Google/TenCent/Palantir/whatever was more likely to foom, there would be no contest.
[There are a bunch of assumptions embedded there. The principal ones are:
If a corporation, as currently constituted, somehow went foom it would be likely to be UnFriendly
If we were able to make it so corporations appeared more Friendly in their day-to-day actions, they would also become less likely to rush headlong into an UnFriendly foom.
I think 1 is pretty undeniable, but I could understand it if someone disagreed with 2.]
See the last paragraph of this comment highlighting my question about the relevance of the operationalization.
I feel like I’m not clear on what question you’re asking. Can you give an example of what a good answer would look like, maybe using Xs and Ys since I can hardly ask you to come up with an actual good argument?
There are many possible operationalizations of a self-modifying AI. For example,
One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process).
One could model a self-improving AI as a chess playing computer program which uses a positional weighting system to choose which moves to make, and which analyzes which weighting heuristics statistically lead to more winning games, in order to improve its positional weighting system.
My reaction to your paper is similar to what my reaction would be to a paper that studies ways to make sure that the Chinese economy doesn’t change in such as way that so that GDP start dropping, or ways to make sure that the chess program doesn’t self-modify to get worse and worse at winning chess games rather than better and better.
It’s conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as
Working to increase rationality
Spreading concern for global welfare
Building human capital of people who are concerned about global welfare
are more cost-effective activities ways for reducing AI risk than doing such research.
I’m looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.
If somebody wrote a paper showing how an economy could naturally build another economy while being guaranteed to have all prices derived from a constant set of prices on intrinsic goods, even as all prices were set by market mechanisms as the next economy was being built, I’d think, “Hm. Interesting. A completely different angle on self-modification with natural goal preservation.”
I’m surprised at the size of the apparent communications gap around the notion of “How to get started for the first time on a difficult basic question”—surely you can think of mathematical analogies to research areas where it would be significant progress just to throw out an attempted formalization as a base point?
There are all sorts of disclaimers plastered onto the paper about how this only works because logic is monotonic, probabilistic reasoning is not monotonic etcetera. The point is to have a way, any way of just getting started on stable self-modification even though we know the particular exact formalism doesn’t directly work for probabilistic agents. Once you do that you can at least state what it is you can’t do. A paper on a self-replicating economy with a stable set of prices on intrinsic goods would likewise be something you could look at and say, “But this formally can’t do X, because Y” and then you would know more about X and Y then you did previously. Being able to say, “But the verifier-suggester separation won’t work for expected utility agents because probabilistic reasoning is not monotonic” means you’ve gotten substantially further into FAI work than when you’re staring dumbly at the problem.
AIXI was conceptual progress on AGI, and especially public discussion of AGI, because it helped people like me say much more formally all the things that we didn’t like about AIXI, like the anvil problem or AIXI seizing control of its reward channel or AIXI only being able to represent utility functions of sensory data rather than environmental ontologies. Someone coming up with a list of 5 key properties the tiling architecture does not have would be significant progress, and I would like to specifically claim that as an intended, worthwhile, fully-pays-back-the-effort positive consequence if it happens—and this is not me covering all the bases in case of disappointment, the paper was presented in a way consonant with that goal and not in a way consonant with claiming one-trueness.
I don’t understand the model you have of FAI research where this is not the sort of thing that you do at the beginning.
Thanks for continuing to engage.
I described my position in another comment. To reiterate and elaborate:
My current best guess is that there are so many unrelated potential models for AI (relative to the information that we currently have) that the probability of FAI work on a single one of them ending up being relevant is tiny. In order to make a compelling argument for the relevance of MIRI’s work on the Lob problem, you have to argue that the model used isn’t only one of, e.g. 10^10 distinct models of AI with similar probability of being realized in practice.
One could argue that the problem is sufficiently important so that one should work on it even if the probability of the work being relevant is tiny. But there are other interventions on the table. You’ve made major contributions by spreading rationality and by creating a community for people who are interested in global welfare to network and collaborate with one another. These things probably substantially reduce astronomical waste (in expectation). In order to argue in favor of MIRI’s FAI research being optimal philanthropy, you have to argue that the probability of the research being relevant is sufficiently great so that its expected value outweighs the expected value of these other activities.
One could argue that if there are in fact so many models for AI then we’re doomed anyway, so we should assume that there aren’t so many models. But rather than trying to work on the models that we think most relevant now, we can wait until it becomes more clear what AGI will look like in practice, and then develop FAI for that type of AI. Whether or not this is feasible is of course related to the question of whether the world’s elites will navigate the creation of AI just fine. I think that there are good reasons to think that the probability of this is pretty high, and that the most leveraged efforts are getting good people in positions of future influence rather than doing FAI research now. Your work on rationality training and community building can help, and already has helped a lot with this.
Neither 2 nor 3 is the sort of argument I would ever make (there’s such a thing as an attempted steelman which by virtue of its obvious weakness doesn’t really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a forgone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.
I still don’t understand what you could be thinking here, and feel like there’s some sort of basic failure to communicate going on. I could guess something along the lines of “Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function...” (but really, is something like that one of just 10^10 equivalent candidates?) ”...and more dissimilar to that than logical AI is from decision theory” (that’s a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that’s the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, “Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we’re going to build a Google Maps AGI”, where in point of fact it would be completely relevant, not a tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can’t think of any acceptable steel version of what you mean, and I say again that it seems to me that you’re saying something that a good mainstream AI person would also be staring quizzically at.
What would be one of the other points in the 10^10-sized space? If it’s something along the lines of “an economic model” then I just explained why if you did something analogous with an economic model it could also be interesting progress, just as AIXI was conceptually important to the history of ideas in the field. I could explain your position by supposing that you think that mathematical ideas never generalize across architectures and so only analyzing the exact correct architecture of a real FAI could be helpful even at the very beginning of work, but this sounds like a visibly stupid position so the model of Anna in my head is warning me not to attribute it to you. On the other hand, some version of, “It is easier to make progress than Jonah thinks because the useful generalization of mathematical ideas does not require you to select correct point X out of 10^10 candidates” seems like it would almost have to be at work here somewhere.
I seriously don’t understand what’s going on in your head here. It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI. Each newly written computer program is unique but the ideas behind them generalize, the resulting conceptual space can be usefully explored, that’s why we don’t start over with every new computer program. You can do useful things once you’ve collected enough treasure nuggets and your level of ability builds up, it’s not a question of guessing the one true password out of 10^10 tries with nothing being progress until then. This is true on a level of generality which applies across computer science and also to AI and also to FAI and also to decision theory and also to math. Everyone takes this for granted as an obvious background fact of doing research which is why I would expect a good mainstream AI person to also be staring quizzically at your statements here. I do not feel like the defense I’m giving here is in any way different from the defense I’d give of a randomly selected interesting AI paper if you said the same thing about it. “That’s just how research works,” I’d say.
Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.
I continue to appreciate your cordiality.
A number of people have recently told me that they have trouble understanding me unless I elaborate further, because I don’t spell out my reasoning in sufficient detail. I think that this is more a matter of the ideas involved being complicated, and there being a lot of inferential distance, than it is lack of effort on my part, but I can see how it would be frustrating to my interlocutors. It seems that I’m subject to the illusion of transparency. I appreciate your patience.
I know that you’ve explicitly disavowed arguments of the type in my points 2 and 3. My reason for bringing them up is to highlight the importance of addressing point 1: to emphasize that it doesn’t suffice to say “the problem is important and we have to get started on it somehow.” I recognize that we have very different implicit assumptions on point 1, and that that’s where the core of the disagreement lies.
There’s essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I’m not suggesting that an AGI will have human values by default: I’m totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules, rather than a mathematically defined utility function.
There are serious dangers of such an entity having values that are orthogonal to humans’, and serious dangers of value drift. (Your elegant article “Why does power corrupt?” has some relevance to the latter point.) But it seems to me that the measures that one would want to take to prevent humans’ goals changing seem completely different from the sorts of measures that might emerge from MIRI’s FAI research.
I’ll also highlight a comment of Nick Beckstead, which you’ve already seen and responded to. I didn’t understand your response.
I should clarify that I don’t have high confidence that the first AGI will develop along these lines. But it’s my best guess, and it seems much more plausible to me than models of the type in your paper.
The difference that I perceive between the two scenarios is the nature of the feedback loops in each case.
When one is chipping away at a problem incrementally, one has the capacity to experiment and use the feedback generated from experimentation to help one limit the search space. Based on what I know about the history of science, general relativity is one of the only successful theories that was created without lots of empirical investigation.
The engineers who designed the first bridges had trillions of combinations of design features and materials to consider a priori, the vast majority of which wouldn’t work. But an empirical discovery like “material X is too weak to work within any design” greatly limits the search space, because you don’t have to think further about any of the combinations involving material X. Similarly if one makes a discovery of the type “material Y is so strong that it’ll work with any design.” By making a series of such discoveries, one can hone in on a few promising candidates.
This is how I predict that the development of AGI will go. I think that the search space is orders of magnitude too large to think about in a useful way without a lot of experimentation, and that a priori we can’t know what the first AGI will look like. I think that once it becomes more clear what the first AGI will look like, it will become much more feasible to make progress on AI safety.
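As a toy sketch of the pruning dynamic described above (the material and design names are placeholders, not anything from the comment): a single empirical discovery that rules out one component eliminates an entire slice of the combinatorial space at once, which is why a handful of such discoveries can make an otherwise intractable search manageable.

```python
from itertools import product

materials = ["steel", "timber", "cast iron", "material X"]
designs = ["arch", "truss", "suspension", "cantilever"]

# All a-priori combinations of one material with one design
candidates = set(product(materials, designs))
print(len(candidates))  # 16 combinations before any empirical input

# One discovery ("material X is too weak to work within any design")
# removes every combination involving that material in one stroke.
candidates = {(m, d) for (m, d) in candidates if m != "material X"}
print(len(candidates))  # 12 remain; each such discovery prunes multiplicatively
```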
It’ll take me a while to come up with a lot of concrete hypotheticals, but I’ll get back to you on this.
Okay. This sounds like you’re trying to make up your own FAI theory in much the same fashion as Holden (and it’s different from Holden’s, of course). Um, what I’d like to do at this point is take out a big Hammer of Authority and tell you to read “Artificial Intelligence: A Modern Approach” so your mind would have some better grist to feed on as to where AI is and what it’s all about. If I can’t do that… I’m not really sure where I could take this conversation. I don’t have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there’s somebody else you’d trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don’t know where to take it from here.
On the object level I will quickly remark that some of the first attempts at heavier-than-air flying-machines had feathers and beaks and they did not work very well, that ‘interacting specialized modules’ is Selling Nonapples, that there is an old discussion in cognitive science about the degree of domain specificity in human intelligence, and that the idea that ‘humans are the only example we have’ is generally sterile, for reasons I’ve already written about but I can’t remember the links offhand, hopefully someone else does. It might be in Levels of Organization in General Intelligence, I generally consider that pretty obsolete but it might be targeted to your current level.
Either of my best guess or Holden’s best guess could be right, and so could lots of other ideas that we haven’t thought of. My proposed conceptual framework should be viewed as one of many weak arguments.
The higher-level point that I was trying to make is that [the conceptual framework implicit in the view that MIRI’s current FAI research has a non-negligible chance of being relevant to AI safety] seems highly conjunctive. I don’t mean this rhetorically at all – I genuinely don’t understand why you think that we can make progress given how great the unknown unknowns are. You may be right, but justification of your view requires further argumentation.
A more diplomatic way of framing this would be something like:
“The book Artificial Intelligence: A Modern Approach has a discussion of current approaches to artificial intelligence. Are you familiar with the ideas therein? If not, I’d suggest that you take a look”
Putting that aside, based on conversations with a number of impressive people in machine learning, etc. who I know, my impression is that at the moment, there aren’t strong contenders for research programs that could plausibly lead to AGI. I largely accept Luke’s argument in his blog post on AI timelines, but this is based on the view that the speed of research is going to increase a lot over the coming years, rather than on the belief that any existing research programs have a reasonable chance of succeeding.
I’d be very interested in hearing about existing research programs that have a reasonable chance of succeeding.
Is it your view that no progress has occurred in AI generally for the last sixty years?
The field as a whole has been making perfectly good progress AFAICT. We know a bleepton more about cognition than we did in 1955 and are much less confused by many things. Has someone been giving you an impression otherwise and if so, what field were they in?
No, it’s clear that there have been many advances, for example in chess playing programs, auto-complete search technology, automated translation, driverless cars, and speech recognition.
But my impression is that this work has only made a small dent in the problem of general artificial intelligence.
Three graduate students in machine learning at distinct elite universities.
Scott Aaronson. Even though he works in theoretical computer science rather than AI, he’s in close proximity with many colleagues who work on artificial intelligence at MIT, and so I give a fair amount of weight to his opinion.
Also, the fraction of scientists who I know who believe that there’s a promising AGI research agenda on the table is very small, mostly consisting of people around MIRI. Few of the scientists who I know have subject matter expertise, but if there was a promising AGI research agenda on the table, I would expect news of it to have percolated to at least some of the people in question.
I think I may have been one of those three graduate students, so just to clarify, my view is:
Zero progress being made seems too strong a claim, but I would say that most machine learning research is neither relevant to, nor trying to be relevant to, AGI. I think that there is no real disagreement on this empirical point (at least, from talking to both Jonah and Eliezer in person, I don’t get the impression that I disagree with either of you on this particular point).
The model for AGI that MIRI uses seems mostly reasonable, except for the “self-modification” part, which seems to be a bit too much separated out from everything else (since pretty much any form of learning is a type of self-modification—current AI algorithms are self-modifying all the time!).
In this vein, I’m skeptical of both the need for and the feasibility of an AI providing an actual proof of safety of self-modification. I also think that using mathematical logic somewhat clouds the issues here, and that most of the issues that MIRI is currently working on are prerequisites for any sort of AI, not just friendly AI. I expect them to be solved as a side-effect of what I see as more fundamental outstanding problems.
However, I don’t have reasons to be highly confident in these intuitions, and as a general rule of thumb, having different researchers with different intuitions pursue their respective programs is a good way to make progress, so I think it’s reasonable for MIRI to do what it’s doing (note that this is different from the claim that MIRI’s research is the most important thing and is crucial to the survival of humanity, which I don’t think anyone at MIRI believes, but I’m clarifying for the benefit of onlookers).
Agreed, the typical machine learning paper is not AGI progress—a tiny fraction of such papers being AGI progress suffices.
I want to note that the general idea being investigated is that you can have a billion successive self-modifications with no significant statistically independent chance of critical failure. Doing proofs from axioms (in which case the theorems are not perfectly strong, but at least as strong as the axioms, with conditionally independent failure probabilities not significantly lowering the conclusion strength below this as they stack) is an obvious entry point into this kind of lasting guarantee. It also suggests to me that even if the actual solution doesn’t use theorems proved and adapted to the AI’s self-modification, it may have logic-like properties. The idea here may be more general than it looks at a first glance.
Can you name some papers that you think constitute AGI progress? (Not a rhetorical question.)
I’m not sure if I parse this correctly, and may be responding to something that you don’t intend to claim, but I want to remark that if the probabilities of critical failure at each stage are
0.01, 0.001, 0.0001, 0.00001, etc.
then total probability of critical failure is less than 2%. You don’t need the probability of failure at each stage to be infinitesimal, you only need the probabilities of failure to drop off fast enough.
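To make the arithmetic concrete, here is a quick check (the decreasing sequence is the one given above; the constant-rate comparison uses an illustrative per-step probability of 10^-9, not a figure from the thread):

```python
import math

# Decreasing per-stage failure probabilities: 0.01, 0.001, 0.0001, ...
probs = [10 ** -(k + 2) for k in range(30)]
p_no_failure = 1.0
for p in probs:
    p_no_failure *= 1 - p
print(1 - p_no_failure)  # ~0.0111: total failure probability, comfortably under 2%

# By contrast, a constant, statistically independent per-step failure rate over a
# billion self-modifications compounds badly: even p = 1e-9 per step gives ~63%.
print(1 - math.exp(-1e9 * 1e-9))
```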
How would they drop off if they’re “statistically independent”? In principle this could happen, given a wide separation in time, if humanity or lesser AIs somehow solve a host of problems for the self-modifier. But both the amount of help from outside and the time-frame seem implausible to me, for somewhat different reasons. (And the idea that we could know both of them well enough to have those subjective probabilities seems absurd.)
The Chinese economy was stagnant for a long time, but is now much closer to continually increasing GDP (on average) with high probability, and I expect that “goal” of increasing GDP will become progressively more stable over time.
The situation may be similar with AI, and I would expect it to be by default.
I’m aware of this argument, but I think there are other ways to get this. The first tool I would reach for would be a martingale (or more generally a supermartingale), which is a statistical process that somehow manages to correlate all of its failures with each other (basically by ensuring that any step towards failure is counterbalanced in probability by a step away from failure). This can yield bounds on failure probability that hold for extremely long time horizons, even if there is non-trivial stochasticity at every step.
Note that while martingales are the way that I would intuitively approach this issue, I’m trying to make the broader argument that there are ways other than mathematical logic to get what you are after (with martingales being one such example).
Please expand on this, because I’m having trouble understanding your idea as written. A martingale is defined as “a sequence of random variables (i.e., a stochastic process) for which, at a particular time in the realized sequence, the expectation of the next value in the sequence is equal to the present observed value even given knowledge of all prior observed values at a current time”, but what random variable do you have in mind here?
I can make some sense of this, but I’m not sure whether it is what Jacob has in mind because it doesn’t seem to help.
Imagine that you’re the leader of an intergalactic civilization that wants to survive and protect itself against external threats forever. (I’m spinning a fancy tale for illustration; I’ll make the link to the actual AI problem later, bear with me.) Your abilities are limited by the amount of resources in the universe you control. The variable X(t) says what fraction you control at time t; it takes values between 0 (none) or 1 (everything). If X(t) ever falls to 0, game’s over and it will stay at 0 forever.
Suppose you find a strategy such that X(t) is a supermartingale; that is, E[X(t’) | I_t] >= X_t for all t’ > t, where I_t is your information at time t. [ETA: In discrete time, this is equivalent to E[X(t+1) | I_t] >= X_t, i.e., in expectation you have at least as many resources in the next round as you have in this round.] Now clearly we have E[X(t’) | I_t] <= P[X(t’) > 0 | I_t], and therefore P[X(t’) > 0 | I_t] >= X_t. Therefore, given your information at time t, the probability that your resources will never fall to zero is at least X_t (this follows from the above by using the assumption that if they ever fall to 0, then they stay at 0). So if you start with a large share of the resources, there’s a large probability that you’ll never run out.
The link to AI is that we replace “share of resources” by some “quality” parameter describing the AI. I don’t know whether Jacob has ideas what such parameter might be, but it would be such that there is a catastrophe iff it falls to 0.
The problem with all of this is that it sounds mostly like a restatement of “we don’t want there to be an independent failure probability on each step; we want there to be a positive probability that there is never a failure”. The martingale condition is a bit more specific than that, but it doesn’t tell us how to make that happen. So, unless I’m completely mistaken about what Jacob intended to say (possible), it seems more like a different description of the problem rather than a solution to the problem...
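A toy numerical check of the bound derived above (my own illustration; the drift and the absorption rule are invented for the example, and the inequality direction is the one used above, E[X(t+1) | I_t] >= X_t): a bounded process that never loses in expectation, started at X_0, hits zero with probability at most 1 - X_0.

```python
import random

def run(start_units=8, max_units=10, up_prob=0.51, max_steps=100_000):
    """One trajectory of a bounded 'resource share' X_t = units / max_units.

    While 0 < X_t < 1 it drifts slightly upward, so E[X(t+1) | X_t] >= X_t;
    it is absorbed at 0 (catastrophe) or at max_units (permanent safety).
    """
    units = start_units
    for _ in range(max_steps):
        if units == 0 or units == max_units:
            break
        units += 1 if random.random() < up_prob else -1
    return units

trials = 20_000
ruined = sum(run() == 0 for _ in range(trials))
x0 = 8 / 10
print(ruined / trials, "<=", 1 - x0)  # empirical ruin frequency vs. the 1 - X_0 bound
```

As the simulation suggests, the bound is a guarantee rather than a recipe: it says what follows once you have a process with the martingale-style property, not how to build one, which is the worry raised above.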
Thank you Benja, for the very nice explanation! (As a technical point, what you are describing is a “submartingale”; a supermartingale has the inequality going in the opposite direction, and then of course you have to make 1 = failure and 0 = success instead of the other way around.)
Martingales may in some sense “just” be a rephrasing of the problem, but I think that’s quite important! In particular, they implicitly come with a framework of thought that suggests possible approaches—for instance, one could imagine a criterion for action in which risks must always be balanced by the expectation of acquiring new information that will decrease future risks—we can then imagine writing down a potential function encapsulating both risk to humanity and information about the world / humanity’s desires, and have as a criterion of action that this potential function never increase in expectation (relative to, e.g., some subjective probability distribution that we have reason to believe is well-calibrated).
I second Wei’s question. I can imagine doing logical proofs about how your successor’s algorithms operate to try to maximize a utility function relative to a lawfully updated epistemic state, and would consider my current struggle to be how to expand this to a notion of a lawfully approximately updated epistemic state. If you say ‘martingale’ I have no idea where to enter the problem at all, or where the base statistical guarantees that form part of the martingale would come from. It can’t be statistical testing unless the problem is i.i.d. because otherwise every context shift breaks the guarantee.
I’m not sure how to parse your last sentence about statistical testing, but does Benja’s post and my response help to clarify?
You are aware that not all statistical tests require i.i.d. assumptions, right?
I’d be interested in your thoughts on the point about computational complexity in this comment.
It seems to me like relatively narrow progress on learning is likely to be relevant to AGI. It does seem plausible that e.g. machine learning research is not too much more relevant to AGI than progress in optimization or in learning theory or in type theory or perhaps a dozen other fields, but it doesn’t seem very plausible that it isn’t taking us closer to AGI in expectation.
Yes, reflective reasoning seems to be necessary to reason about the process of learning and the process of reflection, amongst other things. I don’t think any of the work that has been done applies uniquely to explicit self-modification vs. more ordinary problems with reflection (e.g. I think the notion of “truth” is useful if you want to think about thinking, and believing that your own behavior is sane is useful if you want to think about survival as an instrumental value).
This seems quite likely (or at least the weaker claim, that either these results are necessary for any AI or they are useless for any AI, seems very likely). But of course this is not enough to say that such work isn’t useful for better understanding and coping with AI impacts. If we can be so lucky as to find important ideas well in advance of building the practical tools that make those ideas algorithmically relevant, then we might develop a deeper understanding of what we are getting into and more time to explore the consequences.
In practice, even if this research program worked very well, we would probably be left with at least a few and perhaps a whole heap of interesting theoretical ideas. And we might have few clues as to which will turn out to be most important. But that would still give us some general ideas about what human-level AI might look like, and could help us see the situation more clearly.
Indeed, I would be somewhat surprised if interesting statements get proven often in the normal business of cognition. But this doesn’t mean that mathematical logic and inference won’t play an important role in AI—logic is by far the most expressive language that we are currently aware of, and therefore a natural starting point if we want to say anything formal about cognition (and as far as I can tell this is not at all a fringe view amongst folks in AI).
I’d be interested in your response to the following, which I wrote in another context. I recognize that I’m far outside of my domain of expertise, and what I write should be read as inquisitive rather than argumentative:
The impression that I’ve gotten is that to date, impressive applications of computers to do tasks that humans do are based around some combination of
Brute force computation
Task specific algorithms generated by humans
In particular, these don’t seem at all relevant to mimicking human inference algorithms.
As I said in my point #2 here: I find it very plausible that advances in narrow AI will facilitate the development of AGI by enabling experimentation.
The question that I’m asking is more: “Is it plausible that the first AGI will be based on filling in implementation details of current neural networks research programs, or current statistical inference research programs?”
Something worth highlighting is that researchers in algorithms have repeatedly succeeded in developing algorithms that solve NP-complete problems in polynomial time with very high probability, or that give very good approximations to solutions to problems in polynomial time where it would be NP-complete to get the solutions exactly right. But these algorithms can’t be ported from one NP-complete problem to another while retaining polynomial running time. One has to deal with each algorithmic problem separately.
From what I know, my sense is that one has a similar situation in narrow AI, and that humans (in some vague sense) have a polynomial time algorithm that’s robust across different algorithmic tasks.
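For concreteness, here is one classic example of the kind of problem-specific, polynomial-time approximation algorithm being referred to (a standard textbook method, included only as an illustration): taking both endpoints of a greedily built maximal matching yields a vertex cover at most twice the optimal size. The guarantee is tied to the structure of vertex cover; the same trick gives no comparable guarantee for, say, set cover or the traveling salesman problem, which is the non-portability point.

```python
def vertex_cover_2approx(edges):
    """Greedy maximal-matching heuristic: the returned cover is at most
    twice as large as a minimum vertex cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(vertex_cover_2approx(edges))  # e.g. {1, 2, 3, 4}; an optimal cover here is {1, 3}
```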
I don’t really understand how “task specific algorithms generated by humans” differs from general intelligence. Humans choose a problem, and then design algorithms to solve the problem better. I wouldn’t expect a fundamental change in this situation (though it is possible).
I think this is off. A single algorithm currently achieves the best known approximation ratio on all constraint satisfaction problems with local constraints (this includes most of the classical NP-hard approximation problems where the task is “violate as few constraints as possible” rather than “satisfy all constraints, with as high a score as possible”), and is being expanded to cover increasingly broad classes of global constraints. You could say “constraint satisfaction is just another narrow task” but this kind of classification is going to take you all the way up to human intelligence and beyond. Especially if you think ‘statistical inference’ is also a narrow problem, and that good algorithms for planning and inference are more of the same.
All I’m saying here is that general intelligence can construct algorithms across domains, whereas my impression is that impressive human+ artificial intelligence to date hasn’t been able to construct algorithms across domains.
General artificial intelligence should be able to prove:
The Weil conjectures
The geometrization conjecture
Monstrous Moonshine
The classification of finite simple groups
The Atiyah-Singer Index Theorem
The Virtual Haken Conjecture
and thousands of other such statements. My impression is that current research in AI is analogous to working on proving these things one at a time.
Working on the classification of finite simple groups could indirectly help you prove the Atiyah-Singer Index Theorem on account of leading to the discovery of structures that are relevant, but such work will only make a small dent in the problem of proving the Atiyah-Singer Index Theorem. Creating an algorithm that can prove these things (that’s not over-fitted to the data) is a very different problem from that of proving the theorems individually.
Do you think that the situation with AI is analogous or disanalogous?
I’m not sure if I follow. Is the algorithm that you have in mind the conglomeration of all existing algorithms?
If so, it’s entirely unclear how quickly the algorithm is growing relative to the problems that we’re interested in.
No, there is a single SDP rounding scheme that gets optimal performance on all constraint satisfaction problems (the best we know so far, and the best possible under the unique games conjecture).
Can you give a reference?
http://dl.acm.org/citation.cfm?id=1374414
PDF.
I’d be interested in your thoughts on this discussion post.
I would disagree with the statement that our algorithms are all domain-specific. Often some amount of domain-specific knowledge is needed to design a good algorithm, but it is often quite minimal. For instance, my office-mate is building a parser for interpreting natural language semantics, and has taken zero linguistics classes (but has picked up some amount of linguistics knowledge from talks, etc.). Of course, he’s following in the footsteps of people who do know linguistics, but the point is just that the methods people use tend to be fairly general despite requiring task-specific tuning.
I agree, of course, that there are few systems that work across multiple domains, but I’m not sure that that’s a fundamental issue so much as a symptom of broader issues that surface in this context (such as latent variables and complex features).
Thanks Jacob. I’d be interested in your thoughts on this discussion post.
You can’t do that? From random things like computer security papers, I was under the impression that you could do just that—convert any NP problem to a SAT instance and toss it at a high-performance commodity SAT solver with all its heuristics and tricks, and get an answer back.
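This is indeed the standard workflow; as a minimal sketch (assuming the third-party python-sat package and its Glucose3 bindings), here is graph 3-coloring, an NP-complete problem, encoded into CNF and handed to an off-the-shelf solver. Note that this preserves exact answers, which is what the distinction drawn below turns on.

```python
# Sketch: reduce graph 3-coloring (NP-complete) to SAT and hand the CNF to a
# commodity solver. Assumes the third-party `python-sat` package (pysat).
from pysat.solvers import Glucose3

def three_color(num_vertices, edges):
    # Boolean variable "vertex v gets color c", numbered 1..3*num_vertices.
    var = lambda v, c: 3 * v + c + 1
    solver = Glucose3()
    for v in range(num_vertices):
        solver.add_clause([var(v, c) for c in range(3)])        # at least one color
        for c in range(3):
            for d in range(c + 1, 3):
                solver.add_clause([-var(v, c), -var(v, d)])     # at most one color
    for u, v in edges:
        for c in range(3):
            solver.add_clause([-var(u, c), -var(v, c)])         # endpoints differ
    if not solver.solve():
        return None
    model = set(solver.get_model())
    return {v: next(c for c in range(3) if var(v, c) in model)
            for v in range(num_vertices)}

# Example: a triangle is 3-colorable; K4 is not.
print(three_color(3, [(0, 1), (1, 2), (2, 0)]))
print(three_color(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))
```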
You can do this. Minor caveat: this works for overall heuristic methods- like “tabu search” or “GRASP”- but many of the actual implementations you would see in the business world are tuned to the structure of the probable solution space. One of the traveling salesman problem solvers I wrote a while back would automatically discover groups of cities and move them around as a single unit- useful when there are noticeable clusters in the space of cities, not useful when there aren’t. Those can lead to dramatic speedups (or final solutions that are dramatically closer to the optimal solution) but I don’t think they translate well across reformulations of the problem.
I’m not a subject matter expert here, and just going based on my memory and what some friends have said, but according to http://en.wikipedia.org/wiki/Approximation_algorithm,
You can do that. But although such algorithms will produce correct answers to any NP problem when given correct answers to SAT, that does not mean that they will produce approximate answers to any NP problem when given approximate answers to SAT. (In fact, I’m not sure if the concept of an approximate answer makes sense for SAT, although of course you could pick a different NP-complete problem to reduce to.)
Edit: My argument only applies to algorithms that give approximate solutions, not to algorithms that give correct solutions with high probability, and reading your comment again, it looks like you may have been referring to the latter. You are correct that if you have a polynomial-time algorithm to solve any NP-complete problem with high probability, then you can get a polynomial-time algorithm to solve any NP problem with high probability. Edit 2: sort of; see discussion below.
Oh, I see. I confused probabilistic algorithms with ones bounding error from the true optimal solution.
Can you give a reference?
If a problem is NP-complete, then by definition, any NP problem can be solved in polynomial time by an algorithm which is given an oracle that solves the NP-complete problem, which it is allowed to use once. If, in place of the oracle, you substitute a polynomial-time algorithm which solves the problem correctly 90% of the time, the algorithm will still be polynomial-time, and will necessarily run correctly at least 90% of the time.
However, as JoshuaZ points out, this requires that the algorithm solve every instance of the problem with high probability, which is a much stronger condition than just solving a high proportion of instances. In retrospect, my comment was unhelpful, since it is not known whether there are any algorithms that solve every instance of an NP-complete problem with high probability. I don't know how generalizable the known tricks for solving SAT are (although presumably they are much more generalizable than JoshuaZ's example).
This is the key. If you had an algorithm that solved every instance of an NP-complete problem in polynomial time with high probability, you could generate a proof of the Riemann hypothesis with high probability! (Provided that the polynomial time algorithm is pretty fast, and that the proof isn’t too long)
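For what it's worth, the reason this works: "does statement S have a formal proof of length at most n that a fixed proof checker accepts?" is itself an NP problem, since the proof is the certificate and checking it is polynomial. A sketch (Python; `verify` is a hypothetical proof checker, not a real library):

```python
# "Find a formal proof of `statement` of length <= max_len" as an NP search
# problem. `verify` is a hypothetical polynomial-time proof checker for some
# fixed formal system; `alphabet` is the set of symbols proofs are written in.
from itertools import product

def find_proof_brute_force(statement, max_len, verify, alphabet):
    # Exponential-time search, but the verifier inside the loop runs in
    # polynomial time, which is all that membership in NP requires.
    for length in range(1, max_len + 1):
        for candidate in product(alphabet, repeat=length):
            if verify(statement, candidate):
                return candidate
    return None

# A polynomial-time algorithm that solved *every* SAT instance with high
# probability would replace this loop: encode "some string of length <= max_len
# is accepted by `verify`" as a CNF formula via the Cook-Levin construction,
# ask the solver for a satisfying assignment, and decode it back into a proof.
```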
It depends, I think, on what AlexMennen meant by this. If, for example, there is a single NP-complete problem in BPP, then it is clear that NP is in BPP. Similar remarks apply to ZPP, and in both cases almost the entire polynomial hierarchy will collapse. The proofs here are straightforward.
If, however, Alex meant that one is picking random instances of a specific NP-complete problem, and that they can be solved deterministically, then Alex's claim seems wrong. Consider for example this problem: "If an input string of length n starts with exactly floor(n^(1/2)) zeros and then a 1, treat the remainder as an input string for 3-SAT. If the string starts with anything else, return instead the parity of the string." This is an NP-complete problem where we can solve almost all instances with high probability, since most instances are really just a silly P problem. But we cannot use this fact to solve another NP-complete problem (say, ordinary 3-SAT) with high probability.
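A literal transcription of that construction, as a sketch (`three_sat` stands in for any exact 3-SAT decision procedure and is the expensive part):

```python
# The padded problem described above: only a ~2^(-sqrt(n)) fraction of length-n
# strings hit the hard branch, so an algorithm that is right on "almost all
# instances" can ignore 3-SAT entirely. `three_sat` is a hypothetical exact
# 3-SAT decision procedure (worst-case exponential time).
from math import isqrt

def decide(s, three_sat):
    k = isqrt(len(s))                      # floor(sqrt(n))
    if s.startswith("0" * k + "1"):        # exactly k zeros, then a 1
        return three_sat(s[k + 1:])        # rare hard branch: genuine 3-SAT
    return s.count("1") % 2 == 1           # common easy branch: parity
```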
Why?
Well, in the easy case of ZPP, ZPP is contained in co-NP, so if NP is contained in ZPP then NP is contained in co-NP, in which case the hierarchy must collapse to the first level.
In the case of BPP, the details are slightly more subtle and require deeper results. If BPP contains NP, then Adleman's theorem says that the entire polynomial hierarchy is contained in BPP. Since BPP is itself contained at a finite level of the hierarchy, this forces a collapse to at least that level.
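For readers following along, the two implication chains can be compressed as follows (standard definitions assumed; the containment of BPP in the second level of the hierarchy is the Sipser-Gacs-Lautemann theorem):

```latex
% ZPP case: collapse to the first level.
\[
  \mathrm{NP} \subseteq \mathrm{ZPP} \subseteq \mathrm{coNP}
  \;\Rightarrow\; \mathrm{NP} = \mathrm{coNP}
  \;\Rightarrow\; \mathrm{PH} = \mathrm{NP}.
\]
% BPP case: as stated above, NP in BPP drags the whole hierarchy into BPP,
% and BPP itself sits inside the second level.
\[
  \mathrm{NP} \subseteq \mathrm{BPP}
  \;\Rightarrow\; \mathrm{PH} \subseteq \mathrm{BPP}
  \subseteq \Sigma_2^{p} \cap \Pi_2^{p},
\]
% so the hierarchy collapses to (at most) its second level.
```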
Point of order: Let A = “these results are necessary for any AI” and B = “they are useless for any AI”. It sounds like you’re weakening from A to (A or B) because you feel the probability of B is large, and therefore the probability of A isn’t all that large in absolute terms. But if much of the probability mass of the weaker claim (A or B) comes from B, then if at all possible, it seems more pragmatically useful to talk about (i) the probability of B and (ii) the probability of A given (not B), instead of talking about the probability of (A or B), since qualitative statements about (i) and (ii) seem to be what’s most relevant for policy. (In particular, even knowing that “the probability of (A or B) is very high” and “the probability of A is not that high”—or even “is low”—doesn’t tell us whether P(A|not B) is high or low.)
My impression from your above comments is that we are mostly in agreement except for how much we respectively like mathematical logic. This probably shouldn't be surprising given that you are a complexity theorist and I'm a statistician, and perhaps I should learn some more mathematical logic so I can appreciate it better (which I'm currently working on doing).
I of course don’t object to logic in the context of AI, it mainly seems to me that the emphasis on mathematical logic in this particular context is unhelpful, as I don’t see the issues being raised as being fundamental to what is going on with self-modification. I basically expect whatever computationally bounded version of probability we eventually come up with to behave locally rather than globally, which I believe circumvents most of the self-reference issues that pop up (sorry if that is somewhat vague intuition).
Thanks Jacob.
I’d be interested in your thoughts on my comment here.
Hm. I’m not sure if Scott Aaronson has any weird views on AI in particular, but if he’s basically mainstream-oriented we could potentially ask him to briefly skim the Tiling Agents paper and say if it’s roughly the sort of paper that it’s reasonable for an organization like MIRI to be working on if they want to get some work started on FAI. At the very least if he disagreed I’d expect he’d do so in a way I’d have better luck engaging conversationally, or if not then I’d have two votes for ‘please explore this issue’ rather than one.
I feel again like you’re trying to interpret the paper according to a different purpose from what it has. Like, I suspect that if you described what you thought a promising AGI research agenda was supposed to deliver on what sort of timescale, I’d say, “This paper isn’t supposed to do that.”
This part is clearer and I think I may have a better idea of where you're coming from, i.e., you really do think the entire field of AI hasn't come any closer to AGI, in which case it's much less surprising that you don't think the Tiling Agents paper is the very first paper ever to come closer to AGI. But this sounds like a conversation that someone else could have with you, because it's not MIRI-specific or FAI-specific. I also feel somewhat at a loss for where to proceed if I can't say "But just look at the ideas behind Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, that's obviously important conceptual progress because..." In other words, you see AI doing a bunch of things, we already mostly agree on what these sorts of surface real-world capabilities are, but after checking with some friends you've concluded that this doesn't mean we're less confused about AGI than we were in 1955. I don't see how I can realistically address that except by persuading your authorities; I don't see what kind of conversation we could have about that directly without being able to talk about specific AI things.
Meanwhile, if you specify “I’m not convinced that MIRI’s paper has a good chance of being relevant to FAI, but only for the same reasons I’m not convinced any other AI work done in the last 60 years is relevant to FAI” then this will make it clear to everyone where you’re coming from on this issue.
He wrote this about a year ago:
And later:
Without further context I see nothing wrong here. Superintelligences are Turing machines, check. You might need a 10^20 slowdown before that becomes relevant, check. It's possible that the argument proves too much by showing that a well-trained high-speed immortal dog can simulate Mathematica and therefore a dog is 'intellectually expressive' enough to understand integral calculus, but I don't know if that's what Scott means, and the principle of charity says I shouldn't assume that without confirmation.
EDIT: Parent was edited, my reply was to the first part, not the second. The second part sounds like something to talk with Scott about. I really think the “You’re just as likely to get results in the opposite direction” argument is on the priors overstated for most forms of research. Does Scott think that work we do today is just as likely to decrease our understanding of P/NP as increase it? We may be a long way off from proving an answer but that’s not a reason to adopt such a strange prior.
As it happens, I’ve been chatting with Scott about this issue recently, due to some comments he made in his recent quantum Turing machine paper:
I thought his second objection (“how could we know what to do about it?”) was independent of his first objection (“AI seems farther away than the singularitarians tend to think”), but when I asked him about it, he said his second objection just followed from the first. So given his view that AI is probably centuries away, it seems really hard to know what could possibly help w.r.t. FAI. And if I thought AI was several centuries away, I’d probably have mostly the same view.
I asked Scott: “Do you think you’d hold roughly the same view if you had roughly the probability distribution over year of AI creation as I gave in When Will AI Be Created? Or is this part of your view contingent on AI almost certainly being several centuries away?”
He replied: “No, if my distribution assigned any significant weight to AI in (say) a few decades, then my views about the most pressing tasks today would almost certainly be different.” But I haven’t followed up to get more specifics about how his views would change.
And yes, Scott said he was fine with quoting this conversation in public.
I think I’d be happy with a summary of persistent disagreement where Jonah or Scott said, “I don’t think MIRI’s efforts are valuable because we think that AI in general has made no progress on AGI for the last 60 years / I don’t think MIRI’s efforts are priorities because we don’t think we’ll get AGI for another 2-3 centuries, but aside from that MIRI isn’t doing anything wrong in particular, and it would be an admittedly different story if I thought that AI in general was making progress on AGI / AGI was due in the next 50 years”.
I think that your paraphrasing
is pretty close to my position.
I would qualify it by saying:
I’d replace “no progress” with “not enough progress for there to be a known research program with a reasonable chance of success.”
I have high confidence that some of the recent advances in narrow AI will contribute (whether directly or indirectly) to the eventual creation of AGI (contingent on this event occurring), just not necessarily in a foreseeable way.
If I discover that there's been significantly more progress on AGI than I had thought, then I'll have to reevaluate my position entirely. I could imagine updating in the direction of MIRI's FAI work being very high value, or I could imagine continuing to believe that MIRI's FAI research isn't a priority, for reasons different from my current ones.
Agreed-on summaries of persistent disagreement aren’t ideal, but they’re more conversational progress than usually happens, so… thanks!
I’m doing some work for MIRI looking at the historical track record of predictions of the future and actions taken based on them, and whether such attempts have systematically done as much harm as good.
To this end, among other things, I’ve been reading Nate Silver’s The Signal and the Noise. In Chapter 5, he discusses how attempts to improve earthquake predictions have consistently yielded worse predictive models than the Gutenberg-Richter law. This has slight relevance.
Such examples notwithstanding, my current prior is that MIRI's FAI research has positive expected value. I don't think that the expected value of the research is zero or negative – only that it's not competitive with the best of the other interventions on the table.
My own interpretation of Scott's words here is that it's unclear whether your research is actually helping with the "get Friendly AI before some idiot creates a powerful Unfriendly one" challenge. Fundamental progress in AI in general could just as easily benefit the fool trying to build an AGI without too much concern for Friendliness as it could benefit you. Thus, whether fundamental research helps avoid the UFAI catastrophe is unclear.
I’m not sure that interpretation works, given that he also wrote:
Since Scott was addressing steps taken to act on the conclusion that friendliness was supremely important, presumably he did not have in mind general AGI research.
Yes, I would welcome his perspective on this.
I think I’ve understood your past comments on this point. My questions are about the implicit assumptions upon which the value of the research rests, rather than about what the research does or doesn’t succeed in arguing.
As I said in earlier comments, the case for the value of the research hinges on its potential relevance to AI safety, which in turn hinges on how good the model is for the sort of AI that will actually be built. Here I don’t mean “Is the model exactly right?” — I recognize that you’re not claiming it to be — the question is whether the model is in the right ballpark.
A case for the model being a good one requires pointing to a potentially promising AGI research program to which the model is relevant. This is the point that I feel hasn’t been addressed.
Some things that I see as analogous to the situation under discussion are:
A child psychology researcher who's never interacted with children could write about good child-rearing practices without the research being at all relevant to how to raise children well.
An economist who hasn't looked at real-world data about politics could study political dynamics using mathematical models without the research being at all relevant to politics in practice.
A philosopher who hasn't studied math could write on the philosophy of math without the writing being relevant to math.
A therapist who's never had experience with depression could give advice to a patient on overcoming depression without the advice being at all relevant to overcoming depression.
Similarly, somebody without knowledge of the type of AI that’s going to be built could research AI safety without the research being relevant to AI safety.
Does this help clarify where I’m coming from?
I’m open to learning object level material if I learn new information that convinces me that there’s a reasonable chance that MIRI’s FAI research is relevant to AI safety in practice.
Yes, this is where I’m coming from.
Missing link suspected. Suggest verifying that the url includes ‘http://’.
Just wondering why you see Jonah Sinick of high enough status to be worth explaining to what’s been discussed on LW repeatedly. Or maybe I’m totally misreading this exchange.
Maybe something to do with Jonah being previously affiliated with GiveWell?
I’m puzzled as to what you think I’m missing: can you say more?
Matching “first AGI will [probably] have internal structure analogous to that of a human” and “first AGI [will probably have] many interacting specialized modules” in a literal (cough uncharitable cough) manner, as evidenced by “heavier-than-air flying-machines had feathers and beaks”. Your phrasing hints at an anthropocentric architectural bias, analogous to the one you specifically distance yourself from regarding values.
Maybe you should clarify that part, it’s crucial to the current misunderstanding, and it’s not clear whether by “interacting specialized modules” you’d also refer to “Java classes not corresponding to anything ‘human’ in particular”, or whether you’d expect a “thalamus-module”.
I think that people should make more of an effort to pay attention to the nuances of people’s statements rather than using simple pattern matching.
There’s a great deal to write about this, and I’ll do so at a later date.
To give you a small taste of what I have in mind: suppose you ask “How likely is it that the final digit of the Dow Jones will be 2 in two weeks.” I’ve never thought about this question. A priori, I have no Bayesian prior. What my brain does, is to amalgamate
The Dow Jones index varies in a somewhat unpredictable way
The last digit is especially unpredictable
Two weeks is a really long time for unpredictable things to happen in this context
The last digit could be one of 10 values between 0 and 9
The probability of a randomly selected digit between 0 and 9 being 2 is equal to 10%
Different parts of my brain generate the different pieces, and another part of my brain combines them. I'm not using a single well-defined Bayesian prior, nor am I satisfying a well-defined utility function.
I don’t want to comment on the details, as this is way outside my area of expertise, but I do want to point out that you appear to be a victim of the bright dilettante fallacy. You appear to think that your significant mathematical background makes you an expert in an unrelated field without having to invest the time and effort required to get up to speed in it.
I don’t claim to have any object level knowledge of AI.
My views on this point are largely based on what I’ve heard from people who work on AI, together with introspection as to how I and other humans reason, and the role of heuristics in reasoning.
Let me try from a different angle.
With humans, we see three broad clusters of modification: reproduction, education, and chemistry. Different people are physically constructed in different ways, and so we can see evolution of human civilization by biological evolution of the humans inside it. The environments that people find themselves in or choose leave imprints on those people. Chemicals people ingest can change those people, such as with caffeine, alcohol, morphine, or heroin. (I would include ‘changing your diet to change your thought processes’ under chemical changes, but the chemical changes from becoming addicted to heroin and from not being creatine deficient look very different.)
For AIs, most of the modification that’s interesting and new will look like the “chemistry” cluster. An AI modifying its source code will look a lot like a human injecting itself with a new drug that it just invented. (Nick_Beckstead’s example of modifying the code of the weather computer is more like education than it is like chemistry.)
This is great because some drugs dramatically improve performance, and so a person on caffeine could invent a super nootropic, and then on the super nootropic invent a cure for cancer and an even better nootropic, and so on. This is terrifying because any drug that adjusts your beliefs or your decision-making algorithm (think of ‘personality’ as a subset of this) dramatically changes how you behave, and might do so for the worse. This is doubly terrifying because these changes might be irreversible- you might take a drug that gets rid of your depression by making you incapable of feeling desire, and then not have any desire to restore yourself! This is triply terrifying because the effects of the drug might be unknown- you might not be able to determine what a drug will do to you until after you take it, and by then it might be too late.
For humans this problem is mostly solved by trial and error followed by pattern-matching- "coffee is okay, crack is not, because Colin is rich and productive and Craig is neither"- which is not useful for new drugs, not useful for misclassified old drugs, and not very safe for very powerful systems. The third problem- that the effects might be unknown- is the sort of thing that proofs might help with, except there are some technical obstacles to doing that. The Lobstacle is a prominent theoretical one, and while it looks like there are lots of practical obstacles as well, surmounting the theoretical obstacles should help with surmounting the practical ones.
Any sort of AGI that's able to alter its own decision-making process will have the ability to 'do chemistry on itself,' and one with stable values will need to have solved the problem of how to do that while preserving its values. (I don't think that humans have 'stable' values; I'd call them something more like 'semi-stable.' Whether this is a bug or a feature is unclear to me.)
I understand where you’re coming from, and I think that you correctly highlight a potential source of concern, and one which my comment didn’t adequately account for. However:
I’m skeptical that it’s possible to create an AI based on mathematical logic at all. Even if an AI with many interacting submodules is dangerous, it doesn’t follow that working on AI safety for an AI based on mathematical logic is promising.
Humans can impose selective pressures on emergent AIs so as to mimic the process of natural selection that humans experienced.
Eliezer's position is that the default mode for an AGI is failure; i.e., if an AGI is not provably safe, it will almost certainly go badly wrong. In that context, if you accept that "an AI with many interacting submodules is dangerous," then that's more or less equivalent to believing that one of the horribly wrong outcomes will almost certainly be achieved if an AGI with many submodules is created.
Humans are not Friendly. They don’t even have the capability under discussion here, to preserve their values under self-modification; a human-esque singleton would likely be a horrible, horrible disaster.
No doubt. And as of now, for none of them are we able to tell whether they are safe or not. There's insufficient rigor in the language, and the formalizations aren't standardized or pinned down (in this subject matter). MIRI's work is creating and pinning down the milestones for how we'd even go about assessing a self-modifying Friendly AI in terms of goal stability, in mathematical language.
To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.
Even if that were so, that’s not MIRI’s (or EY’s) most salient comparative advantage (also: CFAR).
My claim is that there are sufficiently many possible models for AI that given what we (the Less Wrong community, not necessarily AI researchers) know now, the probability of a given model being developed is tiny.
The actionable safety issues that would come up if the AI is like the Chinese economy would be very different from the actionable safety issues that would come up if the AI is like a self-improving chess playing program, which would be very different from the actionable safety issues that would come up if the AI is of the type that Eliezer’s publication describes.
Given the paucity of information available about the design of the first AI, I don’t think that the probability of doing safety research on a particular model being actionable is sufficiently high for such research to be warranted (relative to other available activities).
Eliezer made a major contribution to increasing rationality with his How To Actually Change Your Mind sequence, which improved the rationality of many people who I know, including myself.
MIRI could engage in other AI safety activities, such as improving future forecasting.
If an organization doesn’t have a cost-effective activity to engage in, and the employees recognize this, then they can leave and do something else. Here I’m not claiming that this is in fact the case of MIRI, rather, I’m just responding to your argument.
MIRI’s staff could migrate to CFAR.
Out of all of the high impact activities that MIRI staff could do, it’s not clear to me that Friendly AI research is their comparative advantage.
Also, even if we accept that MIRI’s comparative advantage has to do with having a clearer view of the Friendliness vs. UnFriendliness distinction, why wouldn’t it be more effective for them to try to insure against an UnFriendly outcome by addressing the UnFriendliness already in the world today? For instance, corporate governance. Corporations’ optimization powers are a tremendous source of human happiness, but their UnFriendly tendencies are clear. For now, corporations have only parasitic intelligence, and don’t look particularly foomy, but if I had to bet on whether MIRI or Google/TenCent/Palantir/whatever was more likely to foom, there would be no contest.
[There are a bunch of assumptions embedded there. The principal ones are:
If a corporation, as currently constituted, somehow went foom it would be likely to be UnFriendly
If we were able to make it so corporations appeared more Friendly in their day-to-day actions, they would also become less likely to rush headlong into an UnFriendly foom.
I think 1 is pretty undeniable, but I could understand it if someone disagreed with 2.]