Basically an article about why knowing a problem is in a certain complexity class does not mean AGI can’t do that problem. It’s some evidence, but usually isn’t enough unless 2-3 or more assumptions are right..
The biggest reasons are these:
Giving up determinism and using randomized algorithms which are faster but may not return an answer or a correct answer (they typically can be run many times if correctness is important; after a few times, the probability of an error will be smaller than the probability that the computer hardware has suffered a glitch due to cosmic rays); randomization can be applied to algorithms and to data structures.
Giving up optimality and computing an approximation of the optimal answer (often very close to the optimal answer) Already mentioned is 3-SAT/TSP, for which there is a World Tour of 1,904,711-cities which has been solved with a tour within 0.0474% of the optimal tour by 2007, and planning problems soluble by translation into SAT form can have millions of clauses & variables, enabling such silly applications as drawing portraits & art or unscrambling images using the TSP; Naam gives chemistry as an example by noting that while the exact physics is totally intractable, approximations which are much more feasible are used. The last fraction of a percentage point of optimality can take truly enormous amounts of computation to squeeze out.
Basically, we can tolerate not getting the algorithm to work every time, since we can just run it again and we can reduce error rates to levels far below the cosmic ray error rate. Also, the perfect is not the enemy of the good, so we can accept a 99.999% approximation, and that’s enough to design new technologies.
Lastly, constant factors are an issue:
Are all implementations equally fast?
“For every polynomial-time algorithm you have, there is an exponential algorithm that I would rather run.” “Alan Perlis
Premise 3 ignores that complexity classes by design try to abstract away from the ‘constant factors’ which is the computation time determined not by input size but by the details of computer architectures, implementations, and available computing hardware. AIs and humans can be equally bound by asymptotic complexity, but still differ on performance.
But with carefully optimized code, proper use of the cache hierarchy, and specialized hardware (eg. GPUs, ASICs), it is possible to see performance gains of multiple orders of magnitude, which implies that one can increase the input size several times before hitting the scaling way that another agent might who paid less attention to constant factors. (Computational chemistry may be intractable, even with approximations, on classical hardware—but what about if one has a quantum computer with a few hundred qubits, enough that one can do quantum simulation?) The importance of constant factors is one of the major traps in practical use of complexity classes: a fancy algorithm with a superior complexity class may easily be defeated by a simpler algorithm with worse complexity but faster implementation.6 (One reason that programmers are exhorted to benchmark, benchmark, benchmark!) This doesn’t disprove the complexity class, which is about asymptotic scaling and will still kick in at some point, but if it’s possible to double or dectuple or more the input, this is enough of an increase that it’s hard to dismiss the difference between best and worst agents’ problem sizes as being only ‘slight’ Finally, increased resource use / brute force is always an option for a powerful agent. Particularly in his second post, Naam’s argument assumes fixed resources. This might be relevant to a few scenarios like an AI permanently confined to a single computer and unable to access more resources—but then, how intelligent could such an AI possibly be if it can’t get out? However, thanks to its intelligence, humanity now controls a large fraction of the biosphere’s energy and with a supercomputer, or tech giants like Google or Amazon who control millions of processor-cores, can compute things totally out of reach of other agents; no limits to the amount of computation that can be done on (or off) Earth have yet been reached. Increases in computing resources of thousands or millions of times, along with larger timescales, can overcome the asymptotic to achieve the next intelligence increase; if a human-level AI can ‘only’ accomplish a few dozen doublings, well…
You should really read Gwern’s full post, it has a lot of insights.
One final note on complexity: Protein Folding was predicted to never progress that much because it was NP-Hard for computers. Suffice it to say with AlphaFold 2 and Deepmind releasing 200 million proteins, the prediction is very wrong now.
EDIT: I no longer endorse the strong version of no evidence from complexity theory of my argument above, so I am removing parts of my post and defending a weaker claim than I originally did.
That’s for linking the post! I quite liked it, and I agree that computational complexity doesn’t pose a challenge to general intelligence. I do want to dispute your notion that “if you hear that a problem is in a certain complexity class, that is approximately zero evidence of any conclusion drawn from it”. The world is filled with evidence, and it’s unlikely that closely related concepts give approximately zero evidence for each other unless they are uncorrelated or there are adversarial processes present. Hearing that list-sorting is O(n*log(n)) is pretty strong evidence that it’s easy to do, and hearing that simulating quantum mechanics is not in P is pretty strong evidence that it’s hard to do. Sure, there are lots of exceptions, but computational complexity is in fact a decent heuristic, especially if you go with average-case complexity of an approximation, rather than worst-case complexity of an exact answer.
Obviously, if computational complexity really forbid general intelligence...then we wouldn’t have it.
Technically, list sorting is O(n) if done a certain way...we just do it the O(n log n) way because it uses slightly less memory and is simpler for us (which is kind of related to Gwern’s rant, and he does technically mention a specialized one, though he doesn’t realize what it means, and puts it in a section about giving up generality for speed). I actually invented an example of that type of algorithm when I was teaching myself c++ using sorting algorithms and wanted to beat the built-in algorithms. In practice, sorting algorithms are often multiple different algorithms of different complexity classes mashed together...that then end up still being O(n log n) with better constant factors. We already optimize the hell out of constant factors for important problems.
I agree with you rather than Gwern. In my experience, complexity class really is a good indicator for how much compute something takes. It’s not a perfect proxy, and can definitely be goodharted, but it is still effective. Often, the constant factors are closely related to the complexity class too. Even describing an ‘NP’ problem is vastly more difficult than a ‘P’ problem, because there is just so much more to it.
I would claim that difficulty describing the process is actually a good proxy for how large the constant factors are likely to be, since they are what you have to keep in mind even for problem size 0, where we aren’t even trying to solve the issue, just keep it in mind.
How can list sorting be O(n)? There are n! ways to sort a list, which means that it’s impossible to have a list sorting algorithm faster than O(log(n!)) = O(n*log(n)).
He probably means bucket sort or basically same concept radix sort. You might call that cheating, but if the range of values you sort is small, then you can think of it as linear.
Yes, technically it solves only a subproblem of the general problem of sorting: the case where the number of possible values is bounded.
Morpheus is incorrect, I am actually referring to a generalization of ‘counting sort’ that I independently reinvented (where you are no longer limited to small positive numbers). I had never heard of sub n log n sorts at that time either. It’s actually quite simple. In fact, it’s how humans prefer to sort things. If you look at the object, and just know where it goes, put it there. The downside is just that there has to be a known place to put it. The O(n log n) lower bound is only for comparison sorts.
On a computer, we can just make a place to put it. If each value is unique, or we’re sorting numbers, or each value that is the same does not need further sorting, we can implement this in constant time per object being sorted. Also linear space, with space being O(m + n) where n is the number of objects being sorted and m being the range of values they can hold. This only works well if m is reasonable in size, not being too much larger than n. For instance, you might be sorting integers between 0 and 100, requiring only a hundred and one extra spots in memory, and be sorting 1,000,000,000 of them. (This is conceptually related to the bucket sort mentioned by Morpheus, but they are distinct, and this is truly linear).
Worst case is usually that you are sorting things whose values are an unknown 32bit integer, in which case you would need about 16GB of memory just for the possible values. This is normally unjustifiable, but if you were sorting a trillion numbers (and had a large amount or RAM), it’s nothing. (And the fact that it is linear would be quite a godsend.) Another positive is that this sort has virtually no branching, and is this much easier on a per instruction basis too. (Branching kills performance).
This can also be integrated as a sub-routine (that requires very little memory) in an overall sorting routine similar to (but better than) quicksort, and then it is more similar to bucket sort, but kind of reversed.
Interesting!
I feel I should mention an important implementation detail. If you do not know the range of possible values, it is often a good idea to check, as most generative processes only use a subset. This is also a linear time process, that only takes two values in memory, but does involve some branching. It is better to just know and put that in, but the ability to do this makes it a much more practical algorithm.
Has anyone ever suggested that the algorithm for intelligence might just be a tractable solution to an NP-complete problem, and that we live in a P=NP (or approximate) universe? I don’t believe this, because it doesn’t map to anything I understand about AI or the human brain, but it’s an interesting idea and I’m sure someone has pursued it before.
That would be shocking in a very fun sort of way…after breaking all cryptography on the internet, what sort of practical applications would such a discovery have?
Well, in this universe, NP and AI are secretly the same problem, so we’d get all the benefits of AI (singularity?) plus all the benefits of solving NP-class problems at the same time. I can’t think of an application of NP-class problem solving that would catch notice compared to AI, though. We already solved protein folding. Reduced carbon emissions from men who happen to be in sales and do a lot of travel, perhaps.
Protein folding is nothing compared with a general solution to finding proofs up to any given length if they exist or confirming that they don’t, or optimal solutions to virtually every scientific problem that can be formulated in any suitably verifiable way.
If solving NP-complete problems in polynomial time is the hallmark of intelligence, then humans are as dumb as rocks and any actual AI will blow us out of the water.
If you want to be pedantic, randomization doesn’t buy you anything for your complexity classes, because probably “P=BPP”[1](BPP is the class of problems where you can give probabilistic yes and no answers), so if you can solve it with randomness, it is already in P (if P = BPP), thus already easy to begin with. I’d say complexity classes are pretty important for realizing when your algorithm just isn’t going to work, and you need a different approach. For example, no matter how theoretically satisfying you will not get to AGI, if you just use lots of compute and implement naive inference with Bayes Nets (https://en.wikipedia.org/wiki/Bayesian_network).
ScottAaronson2006: “Over the last decade and a half, mounting evidence has convinced almost all of us that in fact P=BPP. In the remaining ten minutes of this lecture, we certainly won’t be able to review this evidence in any depth. But let me quote one theorem, just to give you a flavor of it: ”
This example contradicts the point he’s trying to make here. Quantum computers are thought to be asymptomatically faster at some tasks (including quantum simulation) but have much, much worse constant factors for individual operations.