A thing I did recently while working on my compiler was write a “macroexpand” function that wasn’t recursive. It macroexpanded once, but didn’t macroexpand the result. I think I was like “whatever, I’ll come back later and handle that”, and maybe I would have done. But shortly afterwards I made the types more specific, using mildly advanced features of Haskell, and now the macroexpand function claimed to return a “thing that has been fully macroexpanded” while its code was returning a “thing containing something that hasn’t been macroexpanded”, so the type checker complained. I might have caught this easily anyway, but at minimum I think it would have been harder to track down without that type error.
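To give the flavour, here’s a made-up sketch of that kind of type refinement, not the real compiler code: index the expression type by whether macros can still occur in it, and have macroexpand promise the fully-expanded index. All names here are invented for the example.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Which "phase" an expression is in: may it still contain macro uses?
data Phase = Expanded | Unexpanded

data Expr (p :: Phase) where
  Lit      :: Int -> Expr p
  App      :: Expr p -> Expr p -> Expr p
  MacroUse :: String -> [Expr 'Unexpanded] -> Expr 'Unexpanded

-- The signature promises a fully expanded result. A version that only
-- expanded the top level would be stuck returning Expr 'Unexpanded
-- children where Expr 'Expanded ones are required, and wouldn't compile.
macroexpand :: Expr 'Unexpanded -> Expr 'Expanded
macroexpand (Lit n)            = Lit n
macroexpand (App f x)          = App (macroexpand f) (macroexpand x)
macroexpand (MacroUse name es) = macroexpand (expandOnce name es)

-- Stand-in for whatever a macro use actually rewrites to.
expandOnce :: String -> [Expr 'Unexpanded] -> Expr 'Unexpanded
expandOnce _ es = foldr App (Lit 0) es
```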
So when the author says “I carry around invariants (which is a fancy name for properties about things in my program) in my head all the time. Only one of those invariants is its type”, my reaction is that he probably could put more invariants in the type. Examples of that: you can put “this can’t be null” into the type, or, more precisely, the thing can’t be null by default but you can choose to relax that invariant in the type. You can put “this function may only be called from within a transaction of isolation level no weaker than repeatable read” into a type; we’ve done that at work. You can put “this array has a known length” into a type, and “this function returns an array of the same length it was passed”, though I haven’t found those useful myself. Similarly “this hash table definitely has a value for key X” and “this function returns a hash table with exactly the same keys”. “This list is nonempty” is a type-level invariant I use a lot. You could even put “this is an increasing list of floats with mean X and standard deviation Y” into a type, modulo floating-point comparison problems, though I’m not convinced that’s a realistic example.
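To make a couple of those concrete, the nullability and nonempty-list ones look roughly like this in Haskell (a sketch, not code from any particular codebase):

```haskell
import Data.List.NonEmpty (NonEmpty)
import qualified Data.List.NonEmpty as NE

-- "This can't be null" is the default: a name is always present, and
-- absence has to be asked for explicitly with Maybe.
data User = User
  { userName  :: String
  , userEmail :: Maybe String  -- the one field allowed to be missing
  }

-- "This list is nonempty": the empty case isn't representable, so the
-- caller has to establish nonemptiness before calling, not the callee.
latestEvent :: NonEmpty event -> event
latestEvent = NE.head
```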
I’ve also had a lot of cases where I have a type with a limited set of possible values, then I add a new possible value, and all the places that accept the type but don’t handle the new value complain. Strictly speaking this isn’t type checking, it’s exhaustiveness checking, but I’m not sure you can get it without type checking. It’s maybe easy to catch through testing for functions that take the type as input (as long as you have, e.g., a random generator for the type and remember to add the new case to it), but I think it’s harder to catch in places where you call functions that return the type and branch on the result.
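Concretely, with non-exhaustive-match warnings turned on (GHC’s -Wincomplete-patterns, say, which is part of -Wall), adding a value looks like this (names made up):

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Suppose GiftVoucher is the value added later.
data PaymentMethod = Card | Invoice | GiftVoucher

describe :: PaymentMethod -> String
describe Card    = "pay by card"
describe Invoice = "pay by invoice"
-- GHC now warns, roughly: "Pattern match(es) are non-exhaustive ...
-- Patterns of type 'PaymentMethod' not matched: GiftVoucher"
```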
So I speculate static typing is more useful the more your domain types can be usefully modeled as tagged unions. If all your types are just “a field for this value, a field for that value, …”, where none of the values depend on the others, the kind of thing modeled by C structs or Python classes or Haskell records, it will mostly catch obvious errors. But if you find yourself writing types like that but with lots of invariants like “if foo is true, then bar will be null and baz won’t be”, then that’s a good fit for a tagged union, and you can move that invariant out of your head and into the type checker and exhaustiveness checker.
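A sketch of that move, using the foo/bar/baz example (all names invented):

```haskell
-- Placeholder payload types for the sketch.
data Bar = Bar
data Baz = Baz

-- Record version: nothing stops foo = True from coexisting with a
-- present bar and a missing baz; the invariant lives in your head.
data Thing = Thing
  { foo :: Bool
  , bar :: Maybe Bar
  , baz :: Maybe Baz
  }

-- Tagged-union version: each case carries exactly the fields that make
-- sense for it, so the invariant can't be violated in the first place.
data Thing'
  = FooCase    Baz  -- the "foo is true" case: no bar, baz is present
  | NotFooCase Bar  -- the "foo is false" case, with whatever it needs
```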
(There’s a related thing that we don’t have, which I guess I’d call co-exhaustiveness checking. Consider: “I have a random generator for values of this type; if I add a new possibility to the type, how do I make sure the generator stays in sync?” Or “how do I check that the function parsing it from strings has a way to parse the new value?”)
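One partial workaround, sketched here with QuickCheck: derive the enumeration and build the generator from it, so a new constructor is picked up automatically. It only helps for simple enumeration-like types, and it’s still not a check.

```haskell
import Test.QuickCheck (Gen, elements)

data Colour = Red | Green | Blue
  deriving (Show, Enum, Bounded)

-- Built from Bounded/Enum, so a new constructor is included for free;
-- a hand-written `elements [Red, Green, Blue]` would silently go stale.
genColour :: Gen Colour
genColour = elements [minBound .. maxBound]
```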
(Also, if the author’s never had to spend a long time tracking down “this function should never be called with a null input but somehow it is, wtf is going on”… more power to him, I guess.)
Great points. I see how making your types more specific/complicated helps you catch bugs, and the example with your compiler really helped me to see that. However, making types more complex also has a larger upfront cost. It requires more thought from you. I don’t have a good intuition for which side of the tradeoff is stronger though.