If an array of length n contains k distinct values, I can do this:
1) Gather the first occurrence of each value into a contiguous buffer.
2) Divide the array into subarrays of length k and merge sort them, using the buffer as swap space.
3) Merge the subarrays with inplace_merge.
The first two steps can always be done in O(n log n). The third step seems to be okay when either k or n/k are bounded by a constant, but I’m not sure about the general case. Thinking some more.
I spent a day on this problem and now I understand why block sort has to be so complex. Here’s two simpler problems that are still surprisingly hard:
1) Given an array [a1, a2, …, an, b1, b2, …, bn], rearrange it into [a1, b1, a2, b2, …, an, bn] using O(n) time and O(1) space.
2) Given an array containing only zeroes and ones, sort them stably in O(n log n) time and O(1) space.
Both are special cases of sorting, and I have beautiful solutions to both, but they are completely different!
If an array of length n contains k distinct values, I can do this:
1) Gather the first occurrence of each value into a contiguous buffer.
2) Divide the array into subarrays of length k and merge sort them, using the buffer as swap space.
3) Merge the subarrays with inplace_merge.
The first two steps can always be done in O(n log n). The third step seems to be okay when either k or n/k are bounded by a constant, but I’m not sure about the general case. Thinking some more.