As I’ve pointed out here before, a lot of the versions of fooming discussed here seem to rest on assuming massive software optimization, not just hardware optimization. This runs into strongly believed theoretical comp sci limits, such as the likelihood that P != NP. These issues also come up in hardware design. It may be my own cognitive biases in trying to make something near my field feel useful, but it does seem like this sort of issue is not getting sufficient attention when discussing the probability of AI going foom.
I don’t really see how P vs NP concerns are relevant, frankly.
Moreover, when trying to foom, an AI will probably want to directly improve the algorithms it uses on a regular basis. While not directly a P/NP issue, it is noteworthy that for some common procedures, such as finding gcds and solving linear programming problems, we have what are in many ways close-to-optimal algorithms (the situation is more complicated for linear programming than that, but the general point is correct). So the AI probably cannot focus primarily on modifying software, because a lot of the basic things the AI would want to optimize are already close to best possible, in both the average case and the worst case.
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference. That is not a problem where we are particularly close to an optimal solution. Rather, solving it looks really, really difficult. Possibly machines will crack the problem and rapidly zoom up to optimal performance. However, that would represent tremendous improvement, not a lack of self-improvement.
So: your concerns about P and NP don’t seem very relevant to me.
The overall upshot of this is that while an AI with access to a quantum computer could plausibly foom with most of the focus on software improvement, comp sci issues suggest that an AI trying to foom on a classical system would need to engage in both hardware and software improvement.
Not really. We have more hardware than we know how to use. Some call it a “hardware overhang”. Software improvement alone could take us far, at today’s general tech level. However, of course, faster software improvement leads to faster hardware improvement. Any suggestion that we could conceivably have one without the other seems unwarranted.
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference.
This seems to be highly non-obvious. Even if an AI already had access to a theory of everything, and could engage in near-optimal induction, it isn’t at all clear that this helps much for practical purposes. The most obvious example is cryptography, as brought up by Roy. And many other things an AI might want to do seem to simply be computationally intensive by our current methods.
Say, for example, an AI wants to synthesize a virus to modify some species out there, or some members of a species (like, say, some of those pesky humans). Well, that requires at minimum being able to do protein folding in advance. Similarly, if the AI decides it needs to use its memory more efficiently, that leads to difficult computational tasks.
It may be that we’re focusing on different issues. It seems that you are focusing on “how difficult is inductive inference from a computational perspective?”, which is relevant for what sorts of AI we can practically build. That’s a separate question from what an AI will do once we have one.
We have more hardware than we know how to use. Some call it a “hardware overhang”. Software improvement alone could take us far, at today’s general tech level.
This seems irrelevant. Hardware overhang is due to the fact that the vast majority of personal computers’ clock cycles aren’t being used. The vast majority of that hardware won’t be accessible to our AGI unless something has already gone drastically wrong. I agree that an AGI that can get control of a large fraction of internet-accessible computers will likely get very powerful quickly, completely separately from computational complexity questions.
It may be that we are imagining different situations. My intent was primarily to address foom narratives that put much more emphasis on software than on improvements in hardware, and to make the point that without increasing software efficiency, one could easily see diminishing marginal returns in attempts to improve hardware.
The vast majority of that hardware won’t be accessible to our AGI unless something has already gone drastically wrong. I agree that an AGI that can get control of a large fraction of internet-accessible computers will likely get very powerful quickly, completely separately from computational complexity questions.
What’s the problem? Google got quite a few people to contribute to Google Compute.
You think that a machine intelligence would be unsuccessful at coming up with better bait for this? Or that attempts to use user cycles are necessarily evil?
You think that a machine intelligence would be unsuccessful at coming up with better bait for this?
Not necessarily. But if the use of such cycles becomes a major necessity for the AI to foom, that’s still a reason to reduce our estimate that an AI will go foom.
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference.
This seems to be highly non-obvious. Even if an AI already had access to a theory of everything, and could engage in near-optimal induction, it isn’t at all clear that this helps much for practical purposes.
Not obvious, perhaps, but surely pretty accurate.
Hutter: http://prize.hutter1.net/hfaq.htm#compai
Mahoney: http://cs.fit.edu/~mmahoney/compression/rationale.html
Tyler—Part 2 on: http://matchingpennies.com/machine_forecasting/
FWIW, a theory of everything is not required—induction is performed on sense-data, or preprocessed sense data.
You are overestimating how much we can do just by compression. The key issue is not just the ability to predict accurately, but the ability to predict accurately while using limited resources. For example, let A(n) be the Ackermann function and let P(n) be the nth prime number. Then the sequence described by P(A(n)) mod 3 is extremely compressible, but the time and space resources needed to expand that compressed form are almost certainly massive.
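To make the resource point concrete, here is a rough Python sketch (purely illustrative: I'm using the diagonal of the two-argument Ackermann function for A(n) and a trial-division prime finder for P(n)). The whole sequence is "compressed" into a few lines, but actually expanding it stalls almost immediately:

```python
def ackermann(m, n):
    # Two-argument Ackermann function: grows faster than any primitive recursive function.
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

def nth_prime(k):
    # k-th prime by trial division -- hopeless once k is astronomically large.
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

# "Decompressing" P(A(n, n)) mod 3: n = 0..3 is fine, but n = 4 would not
# finish in the lifetime of the universe.
for n in range(4):
    print(n, nth_prime(ackermann(n, n)) % 3)
```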
There’s a similar concern here. To again use the protein folding example: even if an AI has a really good model for predicting how proteins will fold, the model doesn’t do the AI much good if it takes a lot of time to run. Similarly, if a pattern rests on the behavior of prime numbers, the smallest Turing machine which outputs 1 iff a number is prime is probably something simple like trial division, but the AKS algorithm, which requires a much larger Turing machine, will return the answer in far less time.
What you are saying is correct—but it seems kind-of obvious. Without resource limits, you could compress perfectly, using a variant of Solomonoff induction.
However, to get from there to me overestimating the value of compression, seems like quite a leap. I value stream compression highly because it is the core technical problem of machine intelligence.
I am not claiming that compression ratio is everything, while resource usage is irrelevant. Rather, sensible resource usage is an intrinsic property of a good quality compressor. If you think I am not giving appropriate weight to resource usage, that’s just a communications failure.
I do go on about resource considerations—for example here:
Another issue is that compressors are not just judged by the size of their output. There are other issues to consider—for example, their compression speed, their usage of space, and their energy demands.
If you prefer a different framework, think of it this way: compression matters, but for many interesting sequences, how much practical compression you can achieve within given time and space constraints depends on complexity theoretic questions.
Again, isn’t what you are saying simply obvious?
Where are you getting the impression that this is somehow different from what I thought in the first place?
Or maybe don’t answer that. I would rather you spent your time revisiting the idea that massive software optimization progress runs into strongly believed theoretical computer science limits.
Maybe eventually—but not for a good while yet, allowing for impressive progress in the interim on the central problem of machine intelligence: induction.
Where are you getting the impression that this is somehow different from what I thought in the first place?
I got this from your remark that:
The core technical problem of machine intelligence is building an agent that correctly performs inductive inference. That is not a problem where we are particularly close to an optimal solution. Rather, solving it looks really, really difficult. Possibly machines will crack the problem and rapidly zoom up to optimal performance. However, that would represent tremendous improvement, not a lack of self-improvement.
So: your concerns about P and NP don’t seem very relevant to me.
I would rather you spent your time revisiting the idea that massive software optimization progress runs into strongly believed theoretical computer science limits.
I’m not sure what aspect of this you want me to expand on. If the complexity hierarchy doesn’t collapse even to a moderate extent (in the sense that P, NP, co-NP, PSPACE, and EXP are all distinct), then many practical problems that come up cannot be improved very much. Moreover, this can be made more rigorous with slightly stronger assumptions. Thus, for example, the exponential time hypothesis is a (somewhat) explicit conjecture about the bounds on solving 3-SAT. If this conjecture is true, then it puts severe limits on how much one can improve algorithms for graph coloring, traveling salesman, and Steiner trees, among other practical NP-complete problems. These problems show up in real-world contexts, in things like hardware and memory design, and some of them come up directly in predicting the behavior of real-world systems.
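For reference, the baseline that the exponential time hypothesis is about is brute force over all 2^n assignments; a minimal sketch (my own toy illustration, with clauses written as tuples of signed variable indices):

```python
from itertools import product

def brute_force_3sat(n_vars, clauses):
    # Try every one of the 2^n truth assignments; ETH conjectures that no
    # algorithm can do fundamentally better than exponential in n.
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 or not x2 or x3) and (not x1 or x2 or not x3)
print(brute_force_3sat(3, [(1, -2, 3), (-1, 2, -3)]))
```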
Moreover, one of the very fruitful sets of results in the last few years (albeit in an area I must say I don’t have much knowledge about) has been in taking NP-hard problems and producing approximate solutions, or asking how good an approximation one can get. Those results, however, have shown that some old approximation algorithms are asymptotically best possible if P != NP (requiring no other assumptions), including the approximation algorithms for graph coloring and the set cover problem. Moreover, under assumptions slightly stronger than P != NP, one gets similar results for other NP problems. It is noteworthy that many of the approximation algorithms in question date back to the 1970s, which suggests that even when finding approximation algorithms, one gets close to best possible very quickly.
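To give one concrete example of the kind of 1970s algorithm in question, here is a minimal sketch (my own version) of the greedy algorithm for set cover. It achieves roughly a ln(n) approximation factor, and that factor is essentially the best any polynomial-time algorithm can do unless P = NP:

```python
def greedy_set_cover(universe, subsets):
    # Repeatedly pick the subset covering the most still-uncovered elements.
    # This classic greedy method is within a factor of about ln(|universe|)
    # of the optimal cover size.
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            return None  # the given subsets cannot cover the universe
        cover.append(best)
        uncovered -= best
    return cover

print(greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]))
```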
Another class of problems to look at is problems which are simply used very frequently. Common examples are finding gcds, multiplying numbers, and the two closely related problems of multiplying matrices and finding determinants.
In the case of gcds, we have near-optimal algorithms in a strong sense (although it is open whether there’s some way of making gcd calculations parallel, most people suspect that there’s not much improvement possible here).
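For concreteness, the classical algorithm here is Euclid’s (the asymptotically fastest known gcd methods build on the same scheme using fast multiplication); a minimal sketch:

```python
def gcd(a, b):
    # Euclid's algorithm: O(log min(a, b)) division steps.
    while b:
        a, b = b, a % b
    return a

assert gcd(252, 105) == 21
```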
Multiplying numbers can be done more efficiently than naive multiplication (which takes O((log n)^2) steps for numbers of size around n) by using a version of the fast Fourier transform. This may be an example where unexpected speed-ups really do exist in long-standing basic procedures. However, there are clear limits to how much this can be improved, since explicit bounds exist for the minimum computation required by FFTs (and these bounds exist without explicit constants). While there are still open questions about how efficient one can make FFTs, even under the most generous marginally plausible assumptions, FFTs would not become much more efficient.
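As an illustration of the FFT idea, here is a toy sketch only: it convolves decimal digits with a floating-point FFT, whereas serious implementations such as Schönhage–Strassen use exact modular transforms and much larger bases:

```python
import numpy as np

def fft_multiply(a, b):
    # Multiply non-negative integers by convolving their digit sequences with an FFT.
    da = [int(c) for c in str(a)[::-1]]  # least-significant digit first
    db = [int(c) for c in str(b)[::-1]]
    n = 1
    while n < len(da) + len(db):
        n *= 2
    conv = np.fft.irfft(np.fft.rfft(da, n) * np.fft.rfft(db, n), n)
    carry, out = 0, []
    for x in conv:                       # carry propagation back to base 10
        carry, digit = divmod(int(round(x)) + carry, 10)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, 10)
        out.append(digit)
    return int(''.join(map(str, reversed(out))).lstrip('0') or '0')

x, y = 123456789, 987654321
assert fft_multiply(x, y) == x * y
```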
In the case of finding determinants, we have good algorithms that are much more efficient than the naive ones. For an n x n matrix, it follows from not much linear algebra that one can calculate the determinant in O(n^3) arithmetic operations, and better algorithms exist that run in around O(n^A), where A is a nasty constant slightly less than 2.4. There’s a clear limit to how much this can be improved: one cannot do better than O(n^2), because an n x n matrix has n^2 entries. So this is already close to the naive lower bound, and there is not much room for improvement. The main caveat is that the O(n^A) bound uses the Coppersmith–Winograd algorithm for matrix multiplication, which is the asymptotically most efficient matrix multiplication algorithm known but isn’t used in practice, because the constants involved make it less efficient than other algorithms in the practical range of matrix sizes (I don’t know precisely where CW overtakes the other algorithms; I think it is around n = 10^9 or so). So there’s some room for improvement there, but not much. And most people who have thought about these issues are confident that determinants cannot be calculated in O(n^2).
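For what it’s worth, the practical sub-cubic method people actually use is Strassen’s rather than Coppersmith–Winograd; a bare-bones sketch (it assumes the dimension is a power of two, whereas real code pads the matrices and tunes the cutoff):

```python
import numpy as np

def strassen(A, B, cutoff=64):
    # Strassen's algorithm: about O(n^2.81) arithmetic operations instead of O(n^3).
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                      # ordinary multiplication on small blocks
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:m, :m] = M1 + M4 - M5 + M7
    C[:m, m:] = M3 + M5
    C[m:, :m] = M2 + M4
    C[m:, m:] = M1 - M2 + M3 + M6
    return C

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
assert np.allclose(strassen(A, B), A @ B)
```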
So the upshot is that while there’s still work to do, for a lot of these problems, it doesn’t look like there’s room for “massive software optimization”.
So, when you said: “complexity theoretic questions”, you were actually talking about the possible equivalence of different classes within the complexity hierarchy? That wasn’t obvious to me—I took what you wrote as referring to any old complexity theoretic questions. Such as what language to use when measuring K-complexity. The idea that K-complexity has something to do with compression ratios is what I was calling obvious. A communications breakdown.
Yes, with some apparent illusion of transparency too.
OK. FWIW, I don’t need examples of existing algorithms approaching theoretical limits.
As I said, the main problem where I think there is important scope for software improvement is induction. By my estimate, the brain spends about 80% of its resources on induction—so performance on induction seems important.
Current performance on the Hutter prize suggests that a perfect compressor could do about 25% better on the target file than the current champion program does.
So, perhaps it depends on how you measure it. If you think a mere 25% is not much improvement, you may not be too impressed. However, there are a few things to bear in mind:
Induction progress works in a similar manner to the rating scale in go: the higher you climb, the more difficult it is to make further progress.
There’s another similarity to go’s rating scale. In go, God is estimated to be 11-dan—not all that much better than a 9-dan human champion. However, this apparent close proximity to perfection is kind-of an illusion. Play go on bigger boards, and larger margins between humans and God are likely to become apparent. Similarly, measure induction using a more challenging corpus, and today’s programs will not appear to be so close to optimal.
The other thing to bear in mind is that intelligent agents are a major part of the environment of other intelligent agents. This means that it is not very realistic to model a fixed set of environmental problems (TSPs, etc), and to measure intelligence with respect to them. Rather there is an intelligence arms race—with many of the problems which intelligent agents face being posed by other agents.
We can see a related effect in mathematics. Many mathematicians work on the unresolved problems at the boundary of what is known in their field. The more progress they make, the harder the unresolved problems become, and the more intelligence is required to deal with them.
I value stream compression highly because it is the core technical problem of machine intelligence.
We’ll know what the core technical problem of machine intelligence is once we achieve machine intelligence. Achieve it, and then I’ll believe your claim that you know what’s involved in achieving it.
IMO, we can clearly see what it is now.
In general, there is no need to perform an engineering feat before you can claim to have understood what problem it involves. We understood the basics of flight before we could ourselves fly. That is also true for machine intelligence today—we have a general theory of intelligence, and can see what the technical side of the problem of building it consists of.
Induction power is the equivalent of lift. Not the only thing you need, but the most central and fundamental element, once you already have a universal computer and storage.
We understood the basics of flight before we could ourselves fly.
We did not understand in the sense of having a correct theory of fluid dynamics. We understood in the sense of having a working model, a paper plane, which actually flew.
Today it is the reverse. We have a theory, but we have no convincing paper brain. We have no working model.
That is also true for machine intelligence today—we have a general theory of intelligence, and can see what the technical side of the problem of building it consists of.
We have the reverse of what ancient China had. They had no theory but they had a paper plane. We have a theory but we have no paper brain.
In general, there is no need to perform an engineering feat before you can claim to have understood what problem it involves.
But with paper planes, we actually performed the essential engineering feat, in paper. The plane flew. But we have no paper brain that thinks.
IMO, we do have the machine-intelligence equivalent of paper planes.
They come in two main forms: forecasters and expert systems.
What we need is more heavy lifting.