Did you order this by a greedy-local algorithm that always takes the next state minimising the difference with the current one; or by a global minimisation of the total difference of all pairs? Presumably the latter is unique but the former changes the order depending on the starting state.
This is a traveling salesman problem, so it is unlikely that Thomas used an algorithm that guarantees optimality. If I understand your proposed greedy algorithm correctly, the distances at the beginning would be shorter than the distances at the end, which I do not observe in his list. A greedy heuristic that would not produce that effect would be to consider the state to be a bunch of lists and at every step concatenate the two lists whose endpoints are closest. This is a metric TSP, so the Christofides algorithm is no more than 1.5x optimal.
I have sorted 50 US states on such a way, that their Levenshtein string difference is minimal:
Massachusetts, Mississippi, Missouri, Wisconsin, Washington, Michigan, Maryland, Pennsylvania, Rhode Island, Louisiana, Indiana, Montana, Kentucky, Connecticut, Minnesota, Tennessee, New Jersey, New Mexico, New Hampshire, New York, Delaware, Hawaii, Iowa, Utah, Idaho, Ohio, Maine, Wyoming, Vermont, Oregon, Arizona, Arkansas, Kansas, Texas, Nevada, Nebraska, Alaska, Alabama, Oklahoma, Illinois, California, Colorado, Florida, Georgia, Virginia, West Virginia, South Carolina, North Carolina, North Dakota, South Dakota
http://protokol2020.wordpress.com/2013/09/13/order-by-string-proximity/
I don’t know, was it done before?
Did you order this by a greedy-local algorithm that always takes the next state minimising the difference with the current one; or by a global minimisation of the total difference of all pairs? Presumably the latter is unique but the former changes the order depending on the starting state.
This is a traveling salesman problem, so it is unlikely that Thomas used an algorithm that guarantees optimality. If I understand your proposed greedy algorithm correctly, the distances at the beginning would be shorter than the distances at the end, which I do not observe in his list. A greedy heuristic that would not produce that effect would be to consider the state to be a bunch of lists and at every step concatenate the two lists whose endpoints are closest. This is a metric TSP, so the Christofides algorithm is no more than 1.5x optimal.
You are right. I’ll call this sorting order Levenshtein-TSP ordering.
It is a global minimization. It takes 261 insert/delete operations from Massachusetts to South Dakota.
I got many different solutions with 261 insert/delete operations. Some 262 and more, but none 260 or less.
It’s a challenge to everybody interested to do better. I am not sure if it’s possible.
Not clear what the number of operations has to do with it; isn’t the challenge to find a smaller total Levenshtein difference?
Incidentally, does it make a difference if you consider the end of the string to wrap around to the beginning?
The Levenshtein difference IS the number of insert/delete operations necessary, to transform a string A to string B.
Wrapping around, a circular list, is another option, yes.
Ah! Well then, I learned something today, I can go to bed. :)