It seems that he and other people who understand all the arguments in favor of friendly AI, and yet decide to ignore them or disregard the idea as infeasible, are rationalizing.
They may have genuinely different estimates for various probabilities. Don’t be so quick to assume that people who disagree are rationalizing. That’s an easy way to get into a death spiral.
Yet I think that, compared to the alternatives, the arguments in favor of friendly AI are watertight.
As I’ve pointed out here before, a lot of the versions of fooming that are discussed here seem to rest on assuming massive software optimization, not just hardware optimization. This runs into strongly believed theoretical comp sci limits such as the likelihood that P != NP. These issues also come up in hardware design. It may be my own cognitive biases in trying to make something near my field feel useful, but it does seem like this sort of issue is not getting sufficient attention when discussing the probability of AI going foom.
This runs into strongly believed theoretical comp sci limits such as the likelihood that P != NP.
Does it? There are certainly situations (breaking encryption) where the problem statement looks something like “I’d like my program to be able to get the single exact solution to this problem in polynomial time”, but for optimization we’re often perfectly happy with “I’d like my program to be able to get close to the exact solution in polynomial time”, or even just “I’d like my program to be able to get a much better solution than people’s previous intuitive guesses”.
Yes, this is one of the issues that does need to be thought about, but there are limits to it. In particular, there’s been work in the last few years (especially for problems in graph theory) where one can show that being able to find close-to-optimal solutions is equivalent to being able to find optimal solutions.
The encryption example is also an interesting one, in that many foom scenarios involve the hypothetical AI gaining control over lots of the world’s computer systems. And in fact, no known encryption system rests on a problem which is NP-complete, so for all we know we could have P != NP in a strong sense and yet still have essentially all encryption be vulnerable. Also, the AI might find security vulnerabilities completely independent of the math, due to implementation issues.
Another reason that P != NP might not be a severe barrier, which cousin_it pointed out, is that our AI would be unlikely to need to solve general NP-complete problems, just real-world instances, which are likely to have additional regularity in them. Moreover, for most NP-complete problems, random instances are generally easy (indeed, this is part of what went wrong in Deolalikar’s attempted proof: the aggregate behavior of NP-complete problems like k-SAT for fixed k >= 3 looks a lot like 2-SAT, which is in P). So something trying to foom might not even run into hard instances when trying to engage in practical applications of NP-hard problems (such as where, say, graph coloring comes up in optimizing memory allocation). And there are some NP-hard problems where one really does care about some restricted form of the problem; one example of this is protein folding, where it seems that most models of protein folding are at least NP-hard.
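To make the “random instances are usually easy” point concrete, here is a rough sketch (an illustration added here with made-up parameters, not something from the original discussion): even a bare-bones DPLL solver dispatches a random 3-SAT instance almost instantly when the clause-to-variable ratio is away from the hard region around 4.27, despite 3-SAT being NP-complete in the worst case.

```python
# Illustrative sketch (added; parameters are made up): a bare-bones DPLL solver
# on a random 3-SAT instance well below the hard clause/variable ratio (~4.27).
import random

def simplify(clauses, lit):
    """Set literal `lit` true: drop satisfied clauses, delete -lit; None means conflict."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out

def dpll(clauses, assignment):
    # Unit propagation: keep satisfying forced literals until none remain.
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        for lit in units:
            clauses = simplify(clauses, lit)
            if clauses is None:
                return None
            assignment[abs(lit)] = lit > 0
    if not clauses:
        return assignment
    lit = clauses[0][0]                 # branch on an arbitrary remaining literal
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            branch = dict(assignment)
            branch[abs(choice)] = choice > 0
            result = dpll(reduced, branch)
            if result is not None:
                return result
    return None

random.seed(0)
n_vars, n_clauses = 60, 150             # ratio 2.5: far from the hard region
variables = list(range(1, n_vars + 1))
instance = [[v * random.choice((-1, 1)) for v in random.sample(variables, 3)]
            for _ in range(n_clauses)]
print("satisfiable:", dpll(instance, {}) is not None)
```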
However, when trying to foom, an AI will probably want to directly improve the algorithms it is using on a regular basis. While not directly a P/NP issue, it is noteworthy that for some common procedures, such as finding gcds and solving linear programming problems, we have what are in many ways close to optimal algorithms (the situation is more complicated for linear programming than that, but the general point is correct). So the AI probably cannot focus primarily on modifying software, because a lot of the basic things that the AI would want to optimize are already close to best possible, for both the average case and the worst case.
There are also other problems with this sort of thing. For example, it might turn out that P != NP but that there’s a general NP-complete solver that runs in almost-polynomial time with a really small constant in the big-O. As I understand it, there are some very weak results showing that this can’t happen for some functions very close to polynomial growth.
There are other things that could go wrong. If, for example, quantum computers are much more powerful than classical systems, then the AI could, if it had access to one, just side-step almost all of these issues. We can’t, for example, even show that BQP is contained in NP. If BQP is larger than anticipated, then this is a serious problem. Moreover, even if P != NP in a strong sense, it is plausible that there’s a quantum algorithm which can solve NP-complete problems in practically close to polynomial time (this is more plausible than the similar idea in the previous paragraph about the classical case). So, if we end up with practical quantum computing, running an AI on it would be a bad idea.
There are also a handful of more exotic possibilities. Scott Aaronson has discussed using miniature wormholes to aid computation, and if one can do this, then essentially all of PSPACE becomes tractable. From the perspective of an AI trying to foom, this is good. From our perspective, probably not. The good news here is that the tech level required is probably well beyond anything we have, so probably no AI would have access to this sort of thing unless it had already foomed or was so close to fooming that it makes no difference.
The overall upshot of this is that while it is possible that an AI with access to a quantum computer could plausibly foom with most of the focus on software improvement, comp sci issues suggest that an AI trying to foom on a classical system would need to engage in both hardware and software improvement. Trying to improve its hardware is tough without either prior access to molecular nanotech (which is its own can of nastiness regardless of whether humans or AIs have access) or a lot of cooperative humans who have given it access to circuit board and chip manufacturing facilities. Moreover, hardware generally produces diminishing marginal returns as long as it remains the same type (i.e., just improving on classical computers), and hardware improvement could run into its own problems, since design issues themselves involve computationally complex problems (although cousin_it has again pointed out that this could be less of a problem than one might think, since there are regularities that might show up).
I will be able to sleep better at night once we’ve proven that P != NP.
(Ok. There’s a lot of material here, and some of it probably should get expanded. How would people feel about expanding this into a top-level post with nice citations, better organization, and all that?)
I see a bit of a disconnect here from historical algorithmic improvements. In the last five decades humans have created algorithms for solving many problems that had previously been intractable, and given orders of magnitude improvement on others. Many of these have come from math/compsci innovation that was not particularly hardware-limited, i.e. if you had the same (or a larger/smarter-on-average/better-organized) research community but with frozen primitive hardware many of the insights would have been found.
At the moment there are some problems for which we have near-optimal algorithms, or where we can show that, even though near-optimal algorithms are out of reach, further performance improvements are unlikely. There are also problems where we are clearly far from the reachable frontier (whether that is near-optimal performance, or just the best that can be done given resource constraints).
The huge swathe of skills wielded by humans but not by existing AI systems shows that in terms of behavioral capabilities there is a lot of room for growth in capacity that does not depend on outperforming the algorithms where we have near-optimal methods (or optimal under resource constraint). The fact that we are the first species on Earth to reach civilization-supporting levels of cognitive capacities suggests that there is room to grow beyond that in terms of useful behavioral capacities (which may be produced using various practical strategies that involve different computational problems) before hitting the frontier of feasibility. So long as enough domains have room to grow, they can translate into strategic advantage even if others are stable.
Also, I would note that linear performance gains on one measure can lead to much greater gains on another, e.g. linear improvements in predicting movements in financial markets translate to exponential wealth gains, gains in social manipulation or strategic acumen give disproportionate returns when they enable one to reliably outmaneuver one’s opposition, linear gains in chess performance translate into an exponential drop-off in the number of potential human challengers, etc.
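For the chess example in particular, the mechanism is easy to see from the standard Elo expected-score formula; the sketch below (an added illustration, not part of the original exchange) shows that each fixed rating gain multiplies the odds of winning by a constant factor, which is also the scale caveat about Elo raised further down.

```python
# Small illustration (added): under the Elo model, every additional 400 rating
# points multiplies the odds of winning by another factor of 10.
def elo_win_probability(rating_diff):
    """Expected score for the stronger player under the Elo model."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

for diff in (0, 200, 400, 800, 1200):
    p = elo_win_probability(diff)
    print(f"rating edge {diff:>4}: win probability {p:.4f}, odds {p / (1 - p):.1f} : 1")
```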
In the last five decades humans have created algorithms for solving many problems that had previously been intractable, and given orders of magnitude improvement on others. Many of these have come from math/compsci innovation that was not particularly hardware-limited, i.e. if you had the same (or a larger/smarter-on-average/better-organized) research community but with frozen primitive hardware many of the insights would have been found.
Yes. I agree strongly with this. One major thing we’ve found in the last few years is just that P turns out to be large and a lot of problems have turned out to be in there that were not obviously so. If one asked people in the early 1970s whether they would expect primality testing to be in P they would probably say no. Moreover, some practical problems have simply had their implementations improved a lot even as the space and time of the algorithms remain in the same complexity classes.
There are also problems where we are clearly far from the reachable frontier (whether that is near-optimal performance, or just the best that can be done given resource constraints).
Can you expand on this? I’m not sure I follow.
So long as enough domains have room to grow, they can translate into strategic advantage even if others are stable
Sure. But that doesn’t say much about how fast that growth will occur. The standard hard-take off narratives have the AI becoming functionally in control of its light cone in a matter of hours or at most weeks. I agree that there is likely a lot of room for improvement in cognitive capability but the issue in this context is whether it is likely for that sort of improvement to occur quickly.
linear gains in chess performance translate into an exponential drop-off in the number of potential human challengers, etc.
I agree with your other examples, and it is a valid point. I don’t think that a strong form of P != NP makes fooming impossible, just that it makes it much less likely. The chess example, however, has an issue that needs to be nitpicked. As I understand it, this isn’t really about linear gains in chess translating into an exponential drop-off, but rather an artifact of the Elo system, which more or less builds in that a linear increase in rating corresponds to a rapid improvement in underlying ability.
The standard hard-take off narratives have the AI becoming functionally in control of its light cone in a matter of hours or at most weeks.
The human field of AI is about half a million hours old, and computer elements can operate at a million times human speed (given enough parallel elements). Many of the important discoveries were not limited by chip speeds but by the pace of CS, math, and AI researchers’ thinking (with most of the work done by some thousands of people who spent much of that time eating, sleeping, goofing off, and getting up to speed on existing knowledge in the field).
With a big fast hardware base (relative to the program) and an AI sophisticated enough to keep learning without continual human guidance and to grok AI theory, gains comparable to the history of AI so far in a few hours or weeks would be reasonable from speedup alone.
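As a back-of-the-envelope version of that claim, taking the stated figures at face value (they are rough numbers from the comment above, not measurements):

```python
# Back-of-the-envelope arithmetic only; both inputs are the commenter's rough figures.
field_age_hours = 500_000       # "about half a million hours" of the AI field's existence
serial_speedup = 1_000_000      # "a million times human speed (given enough parallel elements)"

wall_clock_hours = field_age_hours / serial_speedup
print(f"Replaying the field's serial thinking time at that speedup: ~{wall_clock_hours} hours")
```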
I agree that one could have scenarios in which there are AI programs with humanlike capacities that are not yet capable of such development (e.g. a super-bloated system running on massive server farms). However, they tend to involve AI development happening very surprisingly quickly, and don’t seem stable for long (bloated implementations can be made more efficient, with strong positive feedback in the improvement, and superhuman hardware will come soon after powerful AI if not before).
an artifact of the Elo system, which more or less builds in that a linear increase in rating corresponds to a rapid improvement in underlying ability
I agree that this is true, but people often cite chess as an example where exponential hardware increases in the same algorithms led to only linear (Elo) gains.
With a big fast hardware base (relative to the program) and an AI sophisticated enough to keep learning without continual human guidance and to grok AI theory, gains comparable to the history of AI so far in a few hours or weeks would be reasonable from speedup alone.
Sure. But the end result of all that might end up being very small improvements in actual algorithmic efficiency. It might turn out, for example, that the best factoring algorithms are of the same order as the current sieves, and it might turn out that after thousands of additional hours of comp sci work the end result is a very difficult proof of that. If the complexity hierarchy doesn’t collapse in a strong sense, then even with lots of resources to spend just thinking about algorithms, the AI won’t improve the algorithms by that much in terms of actual speed, because they can’t be.
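For a sense of what “same order as the current sieves” would mean (a standard reference figure, added here rather than taken from the thread), the best known general-purpose factoring algorithm, the general number field sieve, runs in heuristic time roughly

```latex
% Heuristic running time of the general number field sieve for factoring an integer n
% (standard textbook value, not a figure from this thread):
\exp\!\left( \left( (64/9)^{1/3} + o(1) \right) (\ln n)^{1/3} (\ln \ln n)^{2/3} \right)
```

which is sub-exponential but still far from polynomial in the number of digits; the scenario sketched above is that nothing fundamentally better than this exists to be found.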
But the end result of all that might end up being very small improvements in actual algorithmic efficiency. It might turn out, for example, that the best factoring algorithms are of the same order as the current sieves, and it might turn out that after thousands of additional hours of comp sci work the end result is a very difficult proof of that.
Yes, I agreed that we should expect this on some problems, but that we don’t have reason to expect it across most problems, weighted by practical impact. Especially so for the specific skills where humans greatly outperform computers, skills with great relevance for strategic advantage.
Do you think we have much reason to expect that the algorithms underlying human performance (in the problems where humans greatly outperform today’s AI) are mostly near optimal at what they do, such that AIs won’t have any areas of huge advantage to leverage?
Yes, I agreed that we should expect this on some problems, but that we don’t have reason to expect it across most problems, weighted by practical impact, especially for the specific skills where humans greatly outperform computers, skills with great relevance for strategic advantage.
I agree about the human skills. I disagree with the claim about problems weighted by practical impact. For example, many practical problems turn out in the general case to be NP-hard or NP-complete, or are believed to be not solvable in polynomial time. Examples include the traveling salesman problem and graph coloring, both of which come up very frequently in practical applications across a wide range of contexts.
Do you think we have much reason to expect that the algorithms underlying human performance (in the problems where humans greatly outperform today’s AI) are mostly near optimal at what they do, such that AIs won’t have any areas of huge advantage to leverage?
Many of those algorithms might be able to be optimized a lot. There’s an argument that we should expect humans to be near optimal (since we’ve spent a million years evolving to be really good at face recognition, understanding other human minds etc.) and our neural nets are trained from a very young age to do this. But there’s a lot of evidence that we are in fact suboptimal. Evidence for this includes Dunbar’s number and a lot of classical cognitive biases such as the illusion of transparency.
But a lot of those aren’t that relevant to fooming. Most humans can do facial recognition pretty fast and pretty reliably. If an AI can do that with a much tinier set of resources, more quickly and more reliably, that’s really neat but that isn’t going to help it go foom.
I agree that one could have scenarios in which there are AI programs with humanlike capacities that are not yet capable of such development (e.g. a super-bloated system running on massive server farms). However, they tend to involve AI development happening very surprisingly quickly, and don’t seem stable for long (bloated implementations can be made more efficient, with strong positive feedback in the improvement, and superhuman hardware will come soon after powerful AI if not before).
I’m not sure how to interpret what you’re saying. You say:
they tend to involve AI development happening very surprisingly quickly
which sounds to me like a summary of long experience. But you also seem to be talking about a scenario which you cannot possibly have experienced even once. So, I’m not sure what you’re saying.
I’m saying that in my experience of people working out consistent scenarios that involve AI development with sustained scarcity, the scenarios offered usually involve the development of human-level AI early, before hardware can advance much further.
I agree that this is true, but people often cite chess as an example where exponential hardware increases in the same algorithms led to only linear (Elo) gains.
This is people being stupid in one direction. This isn’t a good reason to be stupid in another direction. The simplest explanation is that Elo functions as something like a log scale of actual ability.
Just to clarify, what do you mean by “actual ability”? In something like the 100 m dash, I can think of “actual ability” as finish time. We could construct an Elo rating based on head-to-head races of thousands of sprinters, and it wouldn’t be a log scale of finish times. Do you just mean percentile in the human distribution?

Yes (within some approximation; weird things happen for very large or very small Elo values).
As I’ve pointed out here before, a lot of the versions of fooming that are discussed here seem to rest on assuming massive software optimization, not just hardware optimization. This runs into strongly believed theoretical comp sci limits such as the likelihood that P != NP. These issues also come up in hardware design. It may be my own cognitive biases in trying to make something near my field feel useful, but it does seem like this sort of issue is not getting sufficient attention when discussing the probability of AI going foom.
I don’t really see how P vs NP concerns are relevant, frankly.
However, when trying to foom, an AI will probably want to directly improve the algorithms it is using on a regular basis. While not directly a P/NP issue, it is noteworthy that for some common procedures, such as finding gcds and solving linear programming problems, we have what are in many ways close to optimal algorithms (the situation is more complicated for linear programming than that, but the general point is correct). So the AI probably cannot focus primarily on modifying software, because a lot of the basic things that the AI would want to optimize are already close to best possible, for both the average case and the worst case.
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference. That is not a problem where we are particularly close to an optimal solution. Rather, solving it looks really, really difficult. Possibly machines will crack the problem and rapidly zoom up to optimal performance. However, that would represent tremendous improvement, not a lack of self-improvement.
So: your concerns about P and NP don’t seem very relevant to me.
The overall upshot of this is that while it is possible that an AI with access to a quantum computer could plausibly foom with most of the focus on software improvement, comp sci issues suggest that an AI trying to foom on a classical system would need to engage in both hardware and software improvement.
Not really. We have more hardware than we know how to use. Some call it a “hardware overhang”. Software improvement alone could take us far, at today’s general tech level. However, of course, faster software improvement leads to faster hardware improvement. Any suggestion that we could conceivably have one without the other seems unwarranted.
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference.
This seems to be highly non-obvious. Even if an AI already had access to a theory of everything, and could engage in near-optimal induction, it isn’t at all clear that this helps much for practical purposes. The most obvious example is cryptography, as brought up by Roy. And many other things an AI might want to do seem to simply be computationally intensive by our current methods.
Say, for example, an AI wants to synthesize a virus to modify some species out there, or some members of a species (like, say, some of those pesky humans). Well, that requires at minimum being able to do protein folding in advance. Similarly, if the AI decides it needs to use its memory more efficiently, that leads to difficult computational tasks.
It may be that we’re focusing on different issues. It seems that you are focusing on “how difficult is inductive inference from a computational perspective?”, which is relevant for what sorts of AI we can practically build. That’s not connected to what an AI will do once we have one.

Not obvious, perhaps, but surely pretty accurate:

Hutter: http://prize.hutter1.net/hfaq.htm#compai

Mahoney: http://cs.fit.edu/~mmahoney/compression/rationale.html

Tyler—Part 2 on: http://matchingpennies.com/machine_forecasting/

FWIW, a theory of everything is not required—induction is performed on sense-data, or preprocessed sense data.
We have more hardware than we know how to use. Some call it a “hardware overhang”. Software improvement alone could take us far, at today’s general tech level.
This seems irrelevant. Hardware overhang is due to the fact that the vast majority of personal clock cycles aren’t being used. The vast majority of that hardware won’t be accessible to our AGI unless something has already gone drastically wrong. I agree that an AGI that can get control of a large fraction of the internet accessible computers will likely quickly get very powerful completely separately from computational complexity questions.
It may be that we are imagining different situations. My intent was primarily to address foom narratives that put much more emphasis on software improvements than on hardware improvements, and to make the point that without increasing software efficiency, one could easily have diminishing marginal returns in attempts to improve hardware.
The vast majority of that hardware won’t be accessible to our AGI unless something has already gone drastically wrong. I agree that an AGI that can get control of a large fraction of the internet accessible computers will likely quickly get very powerful completely separately from computational complexity questions.
What’s the problem? Google got quite a few people to contribute to Google Compute.
You think that a machine intelligence would be unsuccessful at coming up with better bait for this? Or that attempts to use user cycles are necessarily evil?
You think that a machine intelligence would be unsuccessful at coming up with better bait for this?
Not necessarily. But if the use of such cycles becomes a major necessity for the AI to go fooming that’s still a reason to reduce our estimates that an AI will go foom.
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference.
This seems to be highly non-obvious. Even if an AI already had access to a theory of everything, and could engage in near-optimal induction, it isn’t at all clear that this helps much for practical purposes.
You are overestimating how much we can do just by compression. The key issue is not just the ability to predict accurately but the ability to predict accurately when using limited resources. For example, let A(n) be the Ackermann function and let P(n) be the nth prime number. Then the sequence described by P(A(n)) mod 3 is really compressible. But the time and space resources needed to expand that compressed form are probably massive.
There’s a similar concern here. To again use the protein folding example, even if an AI has a really good model for predicting how proteins will fold, if it takes a lot of time to run that model then the model doesn’t do the AI much good. Similarly, if a pattern rests on the behavior of prime numbers, then the smallest Turing machine which outputs 1 iff a number is prime is probably something like the Sieve of Eratosthenes. But the AKS algorithm, which requires a much larger Turing machine, will return the answer in less time.
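Here is a toy rendering of the P(A(n)) mod 3 example (an added sketch; the single-argument A(n) is read as the usual two-argument diagonal A(n, n)):

```python
# Added illustration of "short description, expensive expansion": the whole
# sequence fits in a few lines (highly compressible), but expanding it is
# hopeless almost immediately, since A(4, 4) already dwarfs anything computable.
def ackermann(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

def nth_prime(k):
    """k-th prime by trial division; fine for the tiny values we can actually reach."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

def sequence_term(n):
    return nth_prime(ackermann(n, n)) % 3

print(sequence_term(2))    # A(2, 2) = 7, the 7th prime is 17, and 17 mod 3 = 2
```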
What you are saying is correct—but it seems kind-of obvious. Without resource limits, you could compress perfectly, using a variant of Solomonoff induction.
However, to get from there to me overestimating the value of compression, seems like quite a leap. I value stream compression highly because it is the core technical problem of machine intelligence.
I am not claiming that compression ratio is everything, while resource usage is irrelevant. Rather, sensible resource usage is an intrinsic property of a good quality compressor. If you think I am not giving appropriate weight to resource usage, that’s just a communications failure.
I do go on about resource considerations—for example here:
Another issue is that compressors are not just judged by the size of their output. There are other issues to consider—for example, their compression speed, their usage of space, and their energy demands.
If you prefer a different framework, think of it this way: compression matters, but for many interesting sequences how much practical compression you can achieve with given time and space constraints depends on complexity theoretic questions.
Again, isn’t what you are saying simply obvious?

Where are you getting the impression that this is somehow different from what I thought in the first place?
Or maybe don’t answer that. I would rather you spent your time revisiting the idea that massive software optimization progress runs into strongly believed theoretical computer science limits.
Maybe eventually—but not for a long way yet, allowing for impressive progress in the interim on the central problem of machine intelligence: induction.
Where are you getting the impression that this is somehow different from what I thought in the first place?
I got this from your remark that:
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference. That is not a problem where we are particularly close to an optimal solution. Rather, solving it looks really, really difficult. Possibly machines will crack the problem and rapidly zoom up to optimal performance. However, that would represent tremendous improvement, not a lack of self-improvement.
So: your concerns about P and NP don’t seem very relevant to me.
I would rather you spent your time revisiting the idea that massive software optimization progress runs into strongly believed theoretical computer science limits.
I’m not sure what aspect of this you want me to expand on. If the complexity hierarchy doesn’t collapse even to a moderate extent (in the sense that P, NP, co-NP, PSPACE, and EXP are all distinct), then many practical problems that come up cannot be improved that much. Moreover, this can be made more rigorous with slightly stronger assumptions. Thus, for example, the exponential time hypothesis is a (somewhat) explicit conjecture about the bounds on solving 3-SAT. If this conjecture is true, then there are severe limits on how much one can improve graph coloring, the traveling salesman problem, and Steiner trees, among other practical NP-complete problems. These problems show up in real-world contexts, in things like hardware and memory design, and some of them come up directly in predicting the behavior of real-world systems.
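To give a feel for what those limits mean, here is a sketch (added for illustration, with a made-up distance matrix) of the Held–Karp dynamic program, essentially the best known exact algorithm for the traveling salesman problem; the exponential time hypothesis implies, roughly, that this kind of exponential cost cannot be brought down to anything close to polynomial.

```python
# Added sketch: Held-Karp exact TSP in O(n^2 * 2^n) time and O(n * 2^n) space.
from itertools import combinations

def held_karp(dist):
    """Exact shortest tour length for a complete distance matrix dist[i][j]."""
    n = len(dist)
    # best[(subset, j)] = cheapest path from city 0 through `subset`, ending at j
    best = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            s = frozenset(subset)
            for j in subset:
                best[(s, j)] = min(best[(s - {j}, k)] + dist[k][j]
                                   for k in subset if k != j)
    full = frozenset(range(1, n))
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

# 4 cities; the 2^n term is what makes, say, n = 60 utterly hopeless.
example = [[0, 2, 9, 10],
           [2, 0, 6, 4],
           [9, 6, 0, 8],
           [10, 4, 8, 0]]
print(held_karp(example))   # 23 for this particular matrix
```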
Moreover, one of the very fruitful sets of results in the last few years (albeit in an area I must say I don’t have much knowledge about) has been in taking NP-hard problems and producing approximate solutions, or asking how good an approximation one can get. Those results, however, have shown that even some old approximation algorithms are asymptotically best possible if P != NP (and require no other assumptions to show that); these include the approximate forms of graph coloring and the set cover problem. Moreover, under slightly stronger assumptions than P != NP, one gets similar results for other such problems. It is noteworthy that many of the approximation algorithms in question date back to the 1970s, which suggests that even when finding approximation algorithms one gets close to best possible very quickly.
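The set cover case made concrete (a sketch of the classic greedy algorithm, added here; the instance is made up): greedy achieves roughly a ln(n) approximation factor, and the inapproximability results described above say that, under P != NP or slightly stronger hypotheses, no polynomial-time algorithm does substantially better, so this 1970s-era heuristic is already essentially the end of the line.

```python
# Added sketch of greedy set cover: repeatedly pick the subset covering the most
# still-uncovered elements; this gives roughly a ln(n) approximation guarantee.
def greedy_set_cover(universe, subsets):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen

universe = range(1, 11)
subsets = [{1, 2, 3, 8}, {1, 2, 3, 4, 5}, {4, 5, 7}, {5, 6, 7}, {6, 7, 8, 9, 10}]
print(greedy_set_cover(universe, subsets))   # covers 1..10 with two of the sets
```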
Another class of problems to look at is those which are just used very frequently. Common examples are finding gcds, multiplying numbers, and the two closely related problems of multiplying matrices and finding determinants.

In the case of gcds, we have near-optimal algorithms in a strong sense (although it is open whether there’s some way of making gcd calculations substantially parallel, most people suspect that there’s not much improvement possible here).
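Spelled out (a standard textbook sketch, added for illustration): Euclid’s algorithm needs only O(log min(a, b)) division steps, the worst case being consecutive Fibonacci numbers, which is why there is so little room left to squeeze here.

```python
# Added sketch: Euclid's algorithm, counting division steps along the way.
def gcd(a, b):
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return a, steps

# Consecutive Fibonacci numbers are the worst case, and even they finish
# in a few dozen steps.
print(gcd(1134903170, 701408733))
```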
Multiplying numbers can be done more efficiently than naive multiplication (which takes O((log n)^2) steps for two numbers of size around n) by using a version of the fast Fourier transform. This may be an example where unexpected speed-ups really do exist in long-standing basic processes. However, there are clear limits to how much this can be improved, since explicit bounds exist for the minimum computation required by FFTs (and these bounds exist without explicit constants). While there are still open questions about how efficient one can make FFTs, even if one assumed the most generous marginally plausible results, FFTs would not become much more efficient.
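As a hedged illustration of beating naive multiplication, here is Karatsuba’s method rather than the FFT-based approach referred to above (it is simpler to sketch, and already improves the digit-length exponent from 2 to about 1.585; FFT-based methods push this down to nearly linear).

```python
# Added sketch of Karatsuba multiplication: three recursive multiplies instead of four.
def karatsuba(x, y):
    if x < 10 or y < 10:
        return x * y
    d = max(len(str(x)), len(str(y))) // 2
    high_x, low_x = divmod(x, 10 ** d)
    high_y, low_y = divmod(y, 10 ** d)
    z0 = karatsuba(low_x, low_y)
    z2 = karatsuba(high_x, high_y)
    z1 = karatsuba(low_x + high_x, low_y + high_y) - z0 - z2   # the cross terms
    return z2 * 10 ** (2 * d) + z1 * 10 ** d + z0

print(karatsuba(123456789, 987654321) == 123456789 * 987654321)   # True
```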
In the case of finding determinants, we have good algorithms that are much more efficient than the naive ones. For an n x n matrix, it follows from not too much linear algebra that one can calculate determinants in O(n^3) arithmetic operations, and better algorithms exist that run in around O(n^A), where A is a nasty constant slightly less than 2.4. There’s a clear limit to how much this can be improved: one cannot do better than O(n^2), because an n x n matrix has n^2 entries, so this is already close to the naive lower bound. The main remaining slack is that the O(n^A) bound uses the Coppersmith–Winograd algorithm for matrix multiplication, the asymptotically most efficient matrix multiplication algorithm known, which isn’t used in practice because the constants involved make it less efficient than other algorithms in the practical range of matrices (I don’t know precisely where CW starts to beat the others; I think it is around n = 10^9 or so). So there’s room for improvement there, but not much. And most people who have thought about these issues are confident that determinants cannot be calculated in O(n^2).
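The determinant claim in code form (a standard O(n^3) Gaussian-elimination sketch, added for illustration): the naive cofactor expansion needs on the order of n! operations, this routine needs about n^3, and the open question discussed here is only whether the exponent can be pushed from 3 toward the trivial lower bound of 2.

```python
# Added sketch: determinant via Gaussian elimination with partial pivoting, ~n^3 work.
def determinant(matrix):
    a = [row[:] for row in matrix]        # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0                    # singular (to working precision)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                    # each row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

print(determinant([[2.0, 1.0, 0.0],
                   [1.0, 3.0, 1.0],
                   [0.0, 1.0, 2.0]]))     # approximately 8 (exact value 8)
```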
So the upshot is that while there’s still work to do, for a lot of these problems, it doesn’t look like there’s room for “massive software optimization”.
Where are you getting the impression that this is somehow different from what I thought in the first place?
I got this from your remark that:
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference. That is not a problem where we are particularly close to an optimal solution. Rather, solving it looks really, really difficult. Possibly machines will crack the problem and rapidly zoom up to optimal performance. However, that would represent tremendous improvement, not a lack of self-improvement.
So: your concerns about P and NP don’t seem very relevant to me.
I would rather you spent your time revisiting the idea that massive software optimization progress runs into strongly believed theoretical computer science limits.
So, when you said “complexity theoretic questions”, you were actually talking about the possible equivalence of different classes within the complexity hierarchy? That wasn’t obvious to me—I took what you wrote as referring to any old complexity theoretic questions, such as what language to use when measuring K-complexity. The idea that K-complexity has something to do with compression ratios is what I was calling obvious. A communications breakdown.
OK. FWIW, I don’t need examples of existing algorithms approaching theoretical limits.
As I said, the main problem where I think there is important scope for software improvement is induction. By my estimate, the brain spends about 80% of its resources on induction—so performance on induction seems important.
Current performance on the Hutter prize suggests that a perfect compressor could do about 25% better on the target file than the current champion program does.
So, perhaps it depends on how you measure it. If you think a mere 25% is not much improvement, you may not be too impressed. However, there are a few things to bear in mind:
Induction progress works in a similar manner to the rating scale in go: the higher you climb, the more difficult it is to make further progress.
There’s another similarity to go’s rating scale. In go, God is estimated to be 11-dan—not all that much better than a 9-dan human champion. However, this apparent close proximity to perfection is kind-of an illusion. Play go on bigger boards, and larger margins between humans and God are likely to become apparent. Similarly, measure induction using a more challenging corpus, and today’s programs will not appear to be so close to optimal.
The other thing to bear in mind is that intelligent agents are a major part of the environment of other intelligent agents. This means that it is not very realistic to model a fixed set of environmental problems (TSPs, etc), and to measure intelligence with respect to them. Rather there is an intelligence arms race—with many of the problems which intelligent agents face being posed by other agents.
We can see a related effect in mathematics. Many mathematicians work on the unresolved problems at the boundary of what is known in their field. The more progress they make, the harder the unresolved problems become, and the more intelligence is required to deal with them.
I value stream compression highly because it is the core technical problem of machine intelligence.
We’ll know what the core technical problem of machine intelligence is once we achieve machine intelligence. Achieve it, and then I’ll believe your claim that you know what’s involved in achieving it.
In general, there is no need to perform an engineering feat before you can claim to have understood what problem it involves. We understood the basics of flight before we could ourselves fly. That is also true for machine intelligence today—we have a general theory of intelligence, and can see what the technical side of the problem of building it consists of.
Induction power is the equivalent of lift. Not the only thing you need, but the most central and fundamental element, once you already have a universal computer and storage.
We understood the basics of flight before we could ourselves fly.
We did not understand in the sense of having a correct theory of fluid dynamics. We understood in the sense of having a working model, a paper plane, which actually flew.
Today it is the reverse. We have a theory, but we have no convincing paper brain. We have no working model.
That is also true for machine intelligence today—we have a general theory of intelligence, and can see what the technical side of the problem of building it consists of.
We have the reverse of what ancient China had. They had no theory but they had a paper plane. We have a theory but we have no paper brain.
In general, there is no need to perform an engineering feat before you can claim to have understood what problem it involves.
But with paper planes, we actually performed the essential engineering feat, in paper. The plane flew. But we have no paper brain that thinks.
Where are you getting the impression that this is somehow different from what I thought in the first place?
Or maybe don’t answer that. I would rather you spent your time revisiting the idea that massive software optimization progress runs into strongly believed theoretical computer science limits.
Maybe eventually—but not for a long way yet, allowing for impressive progress in the interim on the central problem of machine intelligence: induction.
I got this from your remark that:
I’m not sure what aspect of this you want me to expand. If the complexity hierarchy doesn’t collapse to a moderate extent (in the sense that P, NP, co-NP, PSPACE, and EXP are all distinct) then many practical problems that come up cannot have that much improvement. Moreover, this can be made more rigorous with slightly stronger assumptions. Thus for example, the exponential time hypothesis is a (somewhat) explicit conjecture about the bounds on solving 3-SAT. If this conjecture is true, then this gives severe limits on how much one can improve both graph coloring, traveling salesman, and Steiner trees, among other practical NP -complete problems. Both these problems show up in real world context, in terms of things like hardware and memory design, while some of these come up directly in predicting the behavior of real world systems.
Moreover, one of the very fruitful sets of results in the last few years (albeit in an area I must admit I don’t know much about) has been taking NP-hard problems and producing approximate solutions, or asking how good an approximation one can get. Some of those results show that certain old approximation algorithms are asymptotically best possible if P != NP (with no other assumptions needed), including the standard approximation algorithms for graph coloring and the set cover problem. With slightly stronger assumptions than P != NP, one gets similar results for other NP-complete problems. It is noteworthy that many of the approximation algorithms in question date back to the 1970s, which suggests that even in finding approximation algorithms, one gets close to best possible fairly quickly.
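As a concrete example, the classic greedy algorithm for set cover (a sketch below; the toy input is hypothetical) achieves roughly an ln(n) approximation factor, and the hardness-of-approximation results alluded to above say that no polynomial-time algorithm can do asymptotically better under the stated assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset covering the most still-uncovered
    elements.  Achieves an H(n) ~ ln(n) approximation factor, which is believed to be
    asymptotically the best a polynomial-time algorithm can do."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("the given subsets do not cover the universe")
        cover.append(best)
        uncovered -= best
    return cover

# Tiny hypothetical instance, purely for illustration.
universe = set(range(1, 11))
subsets = [{1, 2, 3, 8}, {1, 2, 3, 4, 5}, {4, 5, 7}, {5, 6, 7}, {6, 7, 8, 9, 10}]
print(greedy_set_cover(universe, subsets))
```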
Another class of problems to look at consists of those that are simply used all the time. Common examples are finding gcds, multiplying numbers, and the two closely related problems of multiplying matrices and finding determinants.
In the case of gcds, we have near-optimal algorithms in a strong sense (although it is open whether gcd calculations can be efficiently parallelized, most people suspect there is not much improvement possible here).
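For reference, the classical Euclidean algorithm; subquadratic "half-gcd" variants improve the bit complexity for huge inputs, but, as noted, nobody expects dramatic further gains.

```python
def gcd(a, b):
    """Classical Euclidean algorithm: O(log min(a, b)) division steps."""
    while b:
        a, b = b, a % b
    return a

print(gcd(1071, 462))  # -> 21
```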
Multiplying numbers can be done more efficiently than schoolbook multiplication (which takes O(d^2) digit operations for two d-digit numbers) by using a version of the fast Fourier transform. This may be an example where unexpected speed-ups really do exist in long-standing basic processes. However, there are clear limits to how much this can be improved, since lower bounds exist on the minimum computation required by FFTs (though these bounds do not come with explicit constants). While there are still open questions about how efficient FFTs can be made, even under the most generous marginally plausible assumptions, FFTs would not become much more efficient.
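Karatsuba's method is the simplest illustration of the phenomenon (it is not the FFT-based approach referred to above, just an easier example of beating schoolbook multiplication); a minimal sketch:

```python
def karatsuba(x, y):
    """Multiply two non-negative integers in roughly O(d^1.585) digit operations,
    versus O(d^2) for the schoolbook method.  (CPython's built-in * already uses
    Karatsuba for large integers; this is purely illustrative.)"""
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    high_x, low_x = divmod(x, 10 ** m)
    high_y, low_y = divmod(y, 10 ** m)
    z0 = karatsuba(low_x, low_y)
    z2 = karatsuba(high_x, high_y)
    z1 = karatsuba(low_x + high_x, low_y + high_y) - z0 - z2
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

assert karatsuba(123456, 789012) == 123456 * 789012
```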
In the case of finding determinants, we have good algorithms that are much more efficient than the naive ones. For an n x n matrix, it follows from not much linear algebra that one can calculate the determinant in O(n^3) arithmetic operations, and better algorithms exist that run in about O(n^A), where A is a nasty constant slightly less than 2.4. There’s a clear limit to how much this can be improved: one cannot do better than O(n^2), because an n x n matrix has n^2 entries, so the current exponent is already close to that naive lower bound. The main remaining slack is that the O(n^A) bound relies on the Coppersmith–Winograd algorithm for matrix multiplication, which is the asymptotically most efficient algorithm known but isn’t used in practice, because the constants involved make it slower than other algorithms on matrices of practical size (I don’t know precisely where CW overtakes the alternatives; I think it is around n = 10^9 or so). So there is room for improvement there, but not much. And most people who have thought about these issues are confident that determinants cannot be computed in O(n^2).
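A plain O(n^3) determinant via Gaussian elimination with partial pivoting (a sketch on a list-of-lists matrix, purely illustrative); the further gains toward O(n^2.4) come from plugging in fast matrix multiplication.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting: O(n^3)
    arithmetic operations, versus O(n!) for naive cofactor expansion."""
    a = [row[:] for row in matrix]          # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0                      # (numerically) singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

print(determinant([[2.0, 1.0], [5.0, 3.0]]))  # -> 1.0 (up to floating-point rounding)
```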
So the upshot is that while there’s still work to do, for a lot of these problems, it doesn’t look like there’s room for “massive software optimization”.
So, when you said: “complexity theoretic questions”, you were actually talking about the possible equivalence of different classes within the complexity hierarchy? That wasn’t obvious to me—I took what you wrote as referring to any old complexity theoretic questions. Such as what language to use when measuring K-complexity. The idea that K-complexity has something to do with compression ratios is what I was calling obvious. A communications breakdown.
Yes, with some apparent illusion of transparency too.
OK. FWIW, I don’t need examples of existing algorithms approaching theoretical limits.
As I said, the main problem where I think there is important scope for software improvement is induction. By my estimate, the brain spends about 80% of its resources on induction—so performance on induction seems important.
Current performance on the Hutter prize suggests that a perfect compressor could do about 25% better on the target file than the current champion program does.
So, perhaps it depends on how you measure it. If you think a mere 25% is not much improvement, you may not be too impressed. However, there are a few things to bear in mind:
Induction progress works in a similar manner to the rating scale in go: the higher you climb, the more difficult it is to make further progress.
There’s another similarity to go’s rating scale. In go, God is estimated to be 11-dan—not all that much better than a 9-dan human champion. However, this apparent close proximity to perfection is kind-of an illusion. Play go on bigger boards, and larger margins between humans and God are likely to become apparent. Similarly, measure induction using a more challenging corpus, and today’s programs will not appear to be so close to optimal.
The other thing to bear in mind is that intelligent agents are a major part of the environment of other intelligent agents. This means that it is not very realistic to model a fixed set of environmental problems (TSPs, etc.) and to measure intelligence with respect to them. Rather, there is an intelligence arms race, with many of the problems which intelligent agents face being posed by other agents.
We can see a related effect in mathematics. Many mathematicians work on the unresolved problems at the boundary of what is known in their field. The more progress they make, the harder the unresolved problems become, and the more intelligence is required to deal with them.
We’ll know what the core technical problem of machine intelligence is once we achieve machine intelligence. Achieve it, and then I’ll believe your claim that you know what’s involved in achieving it.
IMO, we can clearly see what it is now.
In general, there is no need to perform an engineering feat before you can claim to have understood what problem it involves. We understood the basics of flight before we could ourselves fly. That is also true for machine intelligence today—we have a general theory of intelligence, and can see what the technical side of the problem of building it consists of.
Induction power is the equivalent of lift. Not the only thing you need, but the most central and fundamental element, once you already have a universal computer and storage.
We did not understand in the sense of having a correct theory of fluid dynamics. We understood in the sense of having a working model, a paper plane, which actually flew.
Today it is the reverse. We have a theory, but we have no convincing paper brain. We have no working model.
We have the reverse of what ancient China had. They had no theory, but they had a paper plane, and with it the essential engineering feat was actually performed, in paper: the plane flew. We have a theory, but no paper brain that thinks.
IMO, we do have the machine-intelligence equivalent of paper planes.
They come in two main forms: forecasters and expert systems.
What we need is more heavy lifting.
I like that notion, but I’m not a mathematician.