Thomas comments on Open thread, September 16-22, 2013

Thomas 16 Sep 2013 12:35 UTC
7 points
I have sorted 50 US states on such a way, that their Levenshtein string difference is minimal:

Massachusetts, Mississippi, Missouri, Wisconsin, Washington, Michigan, Maryland, Pennsylvania, Rhode Island, Louisiana, Indiana, Montana, Kentucky, Connecticut, Minnesota, Tennessee, New Jersey, New Mexico, New Hampshire, New York, Delaware, Hawaii, Iowa, Utah, Idaho, Ohio, Maine, Wyoming, Vermont, Oregon, Arizona, Arkansas, Kansas, Texas, Nevada, Nebraska, Alaska, Alabama, Oklahoma, Illinois, California, Colorado, Florida, Georgia, Virginia, West Virginia, South Carolina, North Carolina, North Dakota, South Dakota

http://protokol2020.wordpress.com/2013/09/13/order-by-string-proximity/

I don’t know, was it done before?
- RolfAndreassen 16 Sep 2013 17:52 UTC
  5 points
  Parent
  Did you order this by a greedy-local algorithm that always takes the next state minimising the difference with the current one; or by a global minimisation of the total difference of all pairs? Presumably the latter is unique but the former changes the order depending on the starting state.
  - Douglas_Knight 16 Sep 2013 19:22 UTC
    7 points
    Parent
    This is a traveling salesman problem, so it is unlikely that Thomas used an algorithm that guarantees optimality. If I understand your proposed greedy algorithm correctly, the distances at the beginning would be shorter than the distances at the end, which I do not observe in his list. A greedy heuristic that would not produce that effect would be to consider the state to be a bunch of lists and at every step concatenate the two lists whose endpoints are closest. This is a metric TSP, so the Christofides algorithm is no more than 1.5x optimal.
    - Thomas 17 Sep 2013 10:35 UTC
      0 points
      Parent
      You are right. I’ll call this sorting order Levenshtein-TSP ordering.
  - Thomas 16 Sep 2013 19:40 UTC
    2 points
    Parent
    It is a global minimization. It takes 261 insert/delete operations from Massachusetts to South Dakota.
    
    I got many different solutions with 261 insert/delete operations. Some 262 and more, but none 260 or less.
    
    It’s a challenge to everybody interested to do better. I am not sure if it’s possible.
    - RolfAndreassen 17 Sep 2013 3:30 UTC
      0 points
      Parent
      Not clear what the number of operations has to do with it; isn’t the challenge to find a smaller total Levenshtein difference?
      
      Incidentally, does it make a difference if you consider the end of the string to wrap around to the beginning?
      - Thomas 17 Sep 2013 5:29 UTC
        4 points
        Parent
        The Levenshtein difference IS the number of insert/delete operations necessary, to transform a string A to string B.
        
        Wrapping around, a circular list, is another option, yes.
        RolfAndreassen 17 Sep 2013 14:16 UTC
        1 point
        Parent
        Ah! Well then, I learned something today, I can go to bed. :)