That’s for linking the post! I quite liked it, and I agree that computational complexity doesn’t pose a challenge to general intelligence. I do want to dispute your notion that “if you hear that a problem is in a certain complexity class, that is approximately zero evidence of any conclusion drawn from it”. The world is filled with evidence, and it’s unlikely that closely related concepts give approximately zero evidence for each other unless they are uncorrelated or there are adversarial processes present. Hearing that list-sorting is O(n*log(n)) is pretty strong evidence that it’s easy to do, and hearing that simulating quantum mechanics is not in P is pretty strong evidence that it’s hard to do. Sure, there are lots of exceptions, but computational complexity is in fact a decent heuristic, especially if you go with average-case complexity of an approximation, rather than worst-case complexity of an exact answer.
Obviously, if computational complexity really forbid general intelligence...then we wouldn’t have it.
Technically, list sorting is O(n) if done a certain way...we just do it the O(n log n) way because it uses slightly less memory and is simpler for us (which is kind of related to Gwern’s rant, and he does technically mention a specialized one, though he doesn’t realize what it means, and puts it in a section about giving up generality for speed). I actually invented an example of that type of algorithm when I was teaching myself c++ using sorting algorithms and wanted to beat the built-in algorithms. In practice, sorting algorithms are often multiple different algorithms of different complexity classes mashed together...that then end up still being O(n log n) with better constant factors. We already optimize the hell out of constant factors for important problems.
I agree with you rather than Gwern. In my experience, complexity class really is a good indicator for how much compute something takes. It’s not a perfect proxy, and can definitely be goodharted, but it is still effective. Often, the constant factors are closely related to the complexity class too. Even describing an ‘NP’ problem is vastly more difficult than a ‘P’ problem, because there is just so much more to it.
I would claim that difficulty describing the process is actually a good proxy for how large the constant factors are likely to be, since they are what you have to keep in mind even for problem size 0, where we aren’t even trying to solve the issue, just keep it in mind.
How can list sorting be O(n)? There are n! ways to sort a list, which means that it’s impossible to have a list sorting algorithm faster than O(log(n!)) = O(n*log(n)).
He probably means bucket sort or basically same concept radix sort. You might call that cheating, but if the range of values you sort is small, then you can think of it as linear.
Morpheus is incorrect, I am actually referring to a generalization of ‘counting sort’ that I independently reinvented (where you are no longer limited to small positive numbers). I had never heard of sub n log n sorts at that time either. It’s actually quite simple. In fact, it’s how humans prefer to sort things. If you look at the object, and just know where it goes, put it there. The downside is just that there has to be a known place to put it. The O(n log n) lower bound is only for comparison sorts.
On a computer, we can just make a place to put it. If each value is unique, or we’re sorting numbers, or each value that is the same does not need further sorting, we can implement this in constant time per object being sorted. Also linear space, with space being O(m + n) where n is the number of objects being sorted and m being the range of values they can hold. This only works well if m is reasonable in size, not being too much larger than n. For instance, you might be sorting integers between 0 and 100, requiring only a hundred and one extra spots in memory, and be sorting 1,000,000,000 of them. (This is conceptually related to the bucket sort mentioned by Morpheus, but they are distinct, and this is truly linear).
Worst case is usually that you are sorting things whose values are an unknown 32bit integer, in which case you would need about 16GB of memory just for the possible values. This is normally unjustifiable, but if you were sorting a trillion numbers (and had a large amount or RAM), it’s nothing. (And the fact that it is linear would be quite a godsend.) Another positive is that this sort has virtually no branching, and is this much easier on a per instruction basis too. (Branching kills performance).
This can also be integrated as a sub-routine (that requires very little memory) in an overall sorting routine similar to (but better than) quicksort, and then it is more similar to bucket sort, but kind of reversed.
I feel I should mention an important implementation detail. If you do not know the range of possible values, it is often a good idea to check, as most generative processes only use a subset. This is also a linear time process, that only takes two values in memory, but does involve some branching. It is better to just know and put that in, but the ability to do this makes it a much more practical algorithm.
That’s for linking the post! I quite liked it, and I agree that computational complexity doesn’t pose a challenge to general intelligence. I do want to dispute your notion that “if you hear that a problem is in a certain complexity class, that is approximately zero evidence of any conclusion drawn from it”. The world is filled with evidence, and it’s unlikely that closely related concepts give approximately zero evidence for each other unless they are uncorrelated or there are adversarial processes present. Hearing that list-sorting is O(n*log(n)) is pretty strong evidence that it’s easy to do, and hearing that simulating quantum mechanics is not in P is pretty strong evidence that it’s hard to do. Sure, there are lots of exceptions, but computational complexity is in fact a decent heuristic, especially if you go with average-case complexity of an approximation, rather than worst-case complexity of an exact answer.
Obviously, if computational complexity really forbid general intelligence...then we wouldn’t have it.
Technically, list sorting is O(n) if done a certain way...we just do it the O(n log n) way because it uses slightly less memory and is simpler for us (which is kind of related to Gwern’s rant, and he does technically mention a specialized one, though he doesn’t realize what it means, and puts it in a section about giving up generality for speed). I actually invented an example of that type of algorithm when I was teaching myself c++ using sorting algorithms and wanted to beat the built-in algorithms. In practice, sorting algorithms are often multiple different algorithms of different complexity classes mashed together...that then end up still being O(n log n) with better constant factors. We already optimize the hell out of constant factors for important problems.
I agree with you rather than Gwern. In my experience, complexity class really is a good indicator for how much compute something takes. It’s not a perfect proxy, and can definitely be goodharted, but it is still effective. Often, the constant factors are closely related to the complexity class too. Even describing an ‘NP’ problem is vastly more difficult than a ‘P’ problem, because there is just so much more to it.
I would claim that difficulty describing the process is actually a good proxy for how large the constant factors are likely to be, since they are what you have to keep in mind even for problem size 0, where we aren’t even trying to solve the issue, just keep it in mind.
How can list sorting be O(n)? There are n! ways to sort a list, which means that it’s impossible to have a list sorting algorithm faster than O(log(n!)) = O(n*log(n)).
He probably means bucket sort or basically same concept radix sort. You might call that cheating, but if the range of values you sort is small, then you can think of it as linear.
Yes, technically it solves only a subproblem of the general problem of sorting: the case where the number of possible values is bounded.
Morpheus is incorrect, I am actually referring to a generalization of ‘counting sort’ that I independently reinvented (where you are no longer limited to small positive numbers). I had never heard of sub n log n sorts at that time either. It’s actually quite simple. In fact, it’s how humans prefer to sort things. If you look at the object, and just know where it goes, put it there. The downside is just that there has to be a known place to put it. The O(n log n) lower bound is only for comparison sorts.
On a computer, we can just make a place to put it. If each value is unique, or we’re sorting numbers, or each value that is the same does not need further sorting, we can implement this in constant time per object being sorted. Also linear space, with space being O(m + n) where n is the number of objects being sorted and m being the range of values they can hold. This only works well if m is reasonable in size, not being too much larger than n. For instance, you might be sorting integers between 0 and 100, requiring only a hundred and one extra spots in memory, and be sorting 1,000,000,000 of them. (This is conceptually related to the bucket sort mentioned by Morpheus, but they are distinct, and this is truly linear).
Worst case is usually that you are sorting things whose values are an unknown 32bit integer, in which case you would need about 16GB of memory just for the possible values. This is normally unjustifiable, but if you were sorting a trillion numbers (and had a large amount or RAM), it’s nothing. (And the fact that it is linear would be quite a godsend.) Another positive is that this sort has virtually no branching, and is this much easier on a per instruction basis too. (Branching kills performance).
This can also be integrated as a sub-routine (that requires very little memory) in an overall sorting routine similar to (but better than) quicksort, and then it is more similar to bucket sort, but kind of reversed.
Interesting!
I feel I should mention an important implementation detail. If you do not know the range of possible values, it is often a good idea to check, as most generative processes only use a subset. This is also a linear time process, that only takes two values in memory, but does involve some branching. It is better to just know and put that in, but the ability to do this makes it a much more practical algorithm.