Maybe it was too hard.
Here’s another problem that might be easier. Make an O(n log n) sorting algorithm that’s simple, stable, and in place. Today you can only get two out of three (merge sort isn’t in place, heap sort isn’t stable, and block sort isn’t simple).
I’ve read some papers (Trabb Pardo, Huang-Langston 1, Huang-Langston 2, Munro-Raman-Salowe, Katajainen-Pasanen 1, Katajainen-Pasanen 2) and there seems to be a “wall” at sqrt(n) extra space. If we have that much space, we can write a reasonable-looking merge sort or quicksort that’s stable, in place, and O(n log n). But if we have strictly O(1) space, the algorithms become much more complicated, using a sqrt(n) buffer inside the array to encode information about the rest. Breaking through that wall would be very interesting to me.
This claims to be a stable in-place sort and doesn’t seem outrageously complicated. I haven’t verified that it is stable, in-place, or in fact a sort at all.
It’s O(n log^2 n) because it merges subarrays with something like STL’s inplace_merge, which is O(n log n) when no extra buffer is available. Devising an O(n) in-place merge, and thus an O(n log n) merge sort, is much harder. GrailSort and WikiSort have working implementations; both are over 500 lines.
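For concreteness, here’s a minimal sketch of that style of sort: a bottom-up merge sort whose merge step is std::inplace_merge (the function name is mine). One caveat: typical STL implementations let inplace_merge allocate a temporary buffer when it can get one, so the strictly O(1)-space, O(n log n)-per-merge behavior is really the no-buffer fallback path.

```cpp
#include <algorithm>
#include <cstddef>

// Stable bottom-up merge sort; O(n log^2 n) comparisons when
// inplace_merge runs without scratch memory. Each pass doubles the
// width of the already-sorted runs and merges adjacent pairs.
template <typename T>
void inplace_mergesort(T* a, std::size_t n) {
    for (std::size_t w = 1; w < n; w *= 2)
        for (std::size_t lo = 0; lo + w < n; lo += 2 * w)
            std::inplace_merge(a + lo, a + lo + w,
                               a + std::min(lo + 2 * w, n));
}
```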
Ah, good catch.
I spent a day on this problem and now I understand why block sort has to be so complex. Here are two simpler problems that are still surprisingly hard:
1) Given an array [a1, a2, …, an, b1, b2, …, bn], rearrange it into [a1, b1, a2, b2, …, an, bn] using O(n) time and O(1) space.
2) Given an array containing only zeroes and ones, sort them stably in O(n log n) time and O(1) space.
Both are special cases of sorting, and I have beautiful solutions to both, but they are completely different!
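I don’t know if these are the same beautiful solutions, but here are sketches of the approaches I’m aware of (helper names are mine). For 1), there’s a cycle-leader trick which I believe is due to Peiyush Jain: a block of size 3^k − 1 can be shuffled in place by following the permutation p → 2p mod 3^k, whose cycles are led by the powers of 3, and one rotation reduces the general case to such blocks.

```cpp
#include <algorithm>
#include <cstddef>

// Cycle-leader pass for the permutation p -> 2p mod (len + 1), valid
// when len + 1 is a power of 3. The cycles are led by 1, 3, 9, ...
// because 2 generates the multiplicative group mod 3^k.
template <typename T>
void cycle_leader(T* a, std::size_t len) {
    for (std::size_t leader = 1; leader < len; leader *= 3) {
        std::size_t p = leader;
        T v = a[p - 1];
        do {
            p = 2 * p % (len + 1);
            std::swap(v, a[p - 1]);  // drop the carried value, pick up the next
        } while (p != leader);
    }
}

// [x1..xm, y1..ym] -> [y1, x1, y2, x2, ..., ym, xm] in O(m) time and
// O(1) space: repeatedly peel off a self-contained block of size 3^k - 1.
template <typename T>
void in_shuffle(T* a, std::size_t m) {
    while (m > 0) {
        std::size_t h = 1;  // largest h <= m such that 2h + 1 is a power of 3
        while ((2 * h + 1) * 3 <= 2 * m + 1)
            h = ((2 * h + 1) * 3 - 1) / 2;
        std::rotate(a + h, a + m, a + m + h);  // bring y1..yh next to x1..xh
        cycle_leader(a, 2 * h);                // shuffle the leading 2h elements
        a += 2 * h;                            // the rest is a smaller instance
        m -= h;
    }
}

// Problem 1 as stated: [a1..an, b1..bn] -> [a1, b1, a2, b2, ..., an, bn].
// a1 and bn are already in place, and the inner 2(n-1) elements form
// exactly an in-shuffle instance.
template <typename T>
void interleave(T* a, std::size_t n) {
    if (n > 1) in_shuffle(a + 1, n - 1);
}
```

For 2), a bottom-up structure works: after the pass for width w, every block of w elements is stably sorted (zeroes before ones), and two adjacent sorted blocks can be merged with a single rotation, giving O(n log n) time and O(1) space overall.

```cpp
#include <algorithm>
#include <cstddef>

// Stably sort an array of zeroes and ones in O(n log n) time and O(1)
// space. "Stable" is meaningful once you imagine satellite data on the
// elements: rotations preserve the relative order within the zeroes
// and within the ones.
void stable_binary_sort(int* a, std::size_t n) {
    for (std::size_t w = 1; w < n; w *= 2) {
        for (std::size_t lo = 0; lo + w < n; lo += 2 * w) {
            std::size_t mid = lo + w;
            std::size_t hi = std::min(lo + 2 * w, n);
            std::size_t p = lo;   // first one in the left block
            while (p < mid && a[p] == 0) ++p;
            std::size_t q = mid;  // one past the last zero in the right block
            while (q < hi && a[q] == 0) ++q;
            std::rotate(a + p, a + mid, a + q);  // slide the ones past the zeroes
        }
    }
}
```

And indeed the two sketches share almost nothing beyond std::rotate.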
If an array of length n contains k distinct values, I can do this:
1) Gather the first occurrence of each value into a contiguous buffer.
2) Divide the array into subarrays of length k and merge sort them, using the buffer as swap space.
3) Merge the subarrays with inplace_merge.
The first two steps can always be done in O(n log n). The third step seems to be okay when either k or n/k is bounded by a constant, but I’m not sure about the general case. Thinking some more.
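To spell out the merge primitive behind step 2 (a sketch; the helper name is mine): a Kronrod-style internal-buffer merge uses only swaps, so the buffer’s contents are permuted but never duplicated or destroyed, and sorting the buffer afterwards restores it exactly because its k values are pairwise distinct.

```cpp
#include <algorithm>
#include <cstddef>

// Stably merge the sorted runs [lo, mid) and [mid, hi) using only
// swaps, with buf a disjoint scratch region of length >= mid - lo.
template <typename T>
void buffered_merge(T* a, std::size_t lo, std::size_t mid, std::size_t hi,
                    T* buf) {
    std::size_t len = mid - lo;
    std::swap_ranges(a + lo, a + mid, buf);  // park the left run in the buffer
    std::size_t i = 0, j = mid, w = lo;
    while (i < len && j < hi) {
        if (buf[i] <= a[j])             // ties take the left element: stability
            std::swap(a[w++], buf[i++]);
        else
            std::swap(a[w++], a[j++]);  // w < j here, so nothing unread is clobbered
    }
    while (i < len)                     // right run exhausted: flush the rest
        std::swap(a[w++], buf[i++]);
    // If the left run ran out first, w == j and the tail of the right run
    // is already in place. Either way buf ends up holding the same multiset
    // of values it started with, in some order.
}
```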
This is an EXCELLENT problem. The industry needs this kind of algorithm very much. Three quarters of the industry is too stupid to use it, but one quarter would be delighted.
Unfortunately, the most likely resolution of this problem is a proof that this kind of sorting algorithm doesn’t exist.
What do you mean by “in place”? Can’t you make the merges be in place, at the cost of more writes?
What does “simple” mean here?
Just use any definition that feels reasonable to you. If you have two solutions that are simple under different definitions, I want to see both!
I’m not seeing how it’s an issue if an algorithm isn’t simple, so I’m interested in why you consider a certain simplicity to be desirable.
A simpler algorithm runs faster, and since these things occasionally get used for practical purposes, that might matter. And some people have an aesthetic preference for simpler models, maybe related to having learned Occam’s razor too early.
No, not necessarily.
This is going nowhere; I came to buy chocolate, not to argue its merits over vanilla. If you haven’t got chocolate, let’s wrap this up.