itaibn0 comments on Does Checkers have simpler rules than Go?

itaibn0 13 Aug 2013 17:46 UTC
15 points
Code golfing is basically this, without the compression. You can see some examples here. The result matches very badly with our intuition of good code. Even with language-specific compression, which would presumably allow comments and reasonable variable names without any loss, I would still the optimal code to be bad from a conventional point of view.
- Daniel_Burfoot 14 Aug 2013 21:23 UTC
  3 points
  Parent
  What bad coding habits do you think trying to optimize for compressibility would promote? I can think of about 10 good habits it would promote, the most important being Don’t Repeat Yourself.
  
  Obviously you need to avoid penalizing for good variable names and comments.
  - Lumifer 15 Aug 2013 2:53 UTC
    10 points
    Parent
    
    What bad coding habits do you think trying to optimize for compressibility would promote?
    
    Being easily understandable by humans.
    
    Knuth: “The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly. A programmer is ideally an essayist who works with traditional aesthetic and literary forms as well as mathematical concepts, to communicate the way that an algorithm works and to convince a reader that the results will be correct.”
  - itaibn0 14 Aug 2013 23:48 UTC
    9 points
    Parent
    I’m thinking in particular of two bad features:
    
    Bad interfaces: Functions need only work on the inputs which are actually given to them in practice. They may behave unexpectedly for valid input, and they definitely won’t give useful error messages for invalid input. They may have ad-hoc alterations to their behavior because it is useful for the particular domain.
    
    Bad idioms: There is no disincentive to use unfamiliar features and unconventional patterns which will confusing anybody trying to read the code. The canonical way to do something is often not the most efficient one.
    
    You can see these features in code golfing.
    
    More abstractly, good code is defined by a broad range of criteria, and optimizing for only one feature is likely to do less well than intentionally setting out to write good code.
    
    On the other hand, STEPS is project designed to make a working personal computer ins as little lines of code as possible. According to their report, making shorter code also results in simpler design. This is evidence in favor of your conclusion.
    - Daniel_Burfoot 15 Aug 2013 3:59 UTC
      1 point
      Parent
      The point about good error messages seems right; I would deal with this by saying that the specification should require that the code give meaningful error messages when called on invalid input. I’m not sure what you mean by ad-hoc alterations; it seems to me that the compression principle discourages the use of ad-hoc alterations and that’s exactly the point.
      
      Here are some good cases where the compression principle would agree with standard good coding practices:
      
      Don’t Repeat Yourself
      Use the least public access modifiers possible (e.g. prefer protected to public and private to protected in Java)
      Avoid allowing lots of variables to hang around in scope; if you find yourself with more than 5 or 6 variables in a method, pull some of them out into another method.
      Avoid use of string or integer literals, especially if they are reused.
      Avoid “old school” for loops like (for(int i =0; i < 10; i++)); prefer for(int i : range(10))
      Prefer object variables to static variables and method-scope variables to object variables
      Use the weakest possible assumptions about method argument type: don’t define a method to take a TreeSet argument if it only needs to know that the argument is a Collection
      Import only the external libraries you need. Don’t import java.util.*; use import java.util.List and java.util.Map
      Reduce dependence on third party libraries
      Used mixed type method signatures. If a method takes a String, Integer, and FooBar arguments, and you happen to have a String, an Integer, and a FooBar in scope when you call the method, the compressor can infer the order in which the arguments are placed. If the method takes four String arguments, you are out of luck.
      - Viliam_Bur 18 Aug 2013 20:16 UTC
        3 points
        Parent
        
        Don’t import java.util.*; use import java.util.List and java.util.Map
        
        I am curious about the compression algorithm that would compress “import java.util.List; import java.util.Map” to a shorter string than “import java.util.*;”.
        
        Unless you would artificially make it compress the latter very inefficiently—but, of course, using that kind of strategy you could “prove” almost anything.
        Daniel_Burfoot 19 Aug 2013 15:30 UTC
        1 point
        Parent
        It’s not the import statements that get compressed, it’s the code beneath the imports that gets better compression because you limited the number of classes you imported. The compressor can compress references to class names to log(N) bits, where N is the number of class names that are in scope/imported.
        pengvado 19 Aug 2013 4:38 UTC
        1 point
        Parent
        I interpret Daniel_Burfoot’s idea as: “import java.util.*” makes subsequent mentions of List longer, since there are more symbols in scope that it has to be distinguished from.
        
        But I don’t think that idea actually works. You can decompose the probability of a conjunction into a product of conditional probabilities, and you get the same number regardless of the order of said decomposition. Whatever probability (and corresponding total compressed size) you assign to a certain sequence of imports and symbols, you could just as well record the symbols first and then the imports. In which case by the time you get to the imports you already know that the program didn’t invoke any utils other than List and Map, so even being able to represent the distinction between the two forms of import is counterproductive for compression.
        
        The intuition of “there are more symbols in scope that it has to be distinguished from” goes wrong because it fails to account for updating a probability distribution over what symbol comes next based on what symbols you’ve seen. Such an update can include knowledge of which symbols come in the same header, if that’s correlated with which symbols are likely to appear in the same program.
        Daniel_Burfoot 19 Aug 2013 15:40 UTC
        1 point
        Parent
        You’re totally right about the fact that the compressor could change the order of the raw text in the encoding format, and could infer the import list from the main code body, and encode class name references based on the number of previous occurrences of the class name in the code body. It’s not clear to me a priori that this will actually give better compression in practice, but it’s possible.
        
        But even if that’s true, the main point still holds: limiting the number of external classes you use allows the compressor to same bits.
      - Eugine_Nier 16 Aug 2013 5:17 UTC
        −1 points
        Parent
        Well, only three of your practices have anything to do with compression.
        itaibn0 16 Aug 2013 14:46 UTC
        7 points
        Parent
        I believe his imagined compression scheme compresses variables so that when there are N variables in scope, it takes log(N) bits to refer to a variable, and similarly for methods.
        Daniel_Burfoot 16 Aug 2013 22:34 UTC
        3 points
        Parent
        If the source code compressor is well-written (and specialized to the language in question), then the programmer can achieve compression by using the practices mentioned above. itaibn0 has the right idea.
  - ShardPhoenix 15 Aug 2013 4:55 UTC
    7 points
    Parent
    Code golfing encourages factoring out coincidental (as opposed to fundamental) redundancy, which is bad for understandability/maintainability. For example, there are C golf programs that read (coincidentally useful) characters from their own source to avoid having to include string literals. Basically, golfed code tends to have fragile interdependence between parts (high coupling), which makes it hard to extend.