If you generalize this from naming to interfaces, I think it’s one of the most important aspects of
how to code well. Thank you for sticking such a clear metaphor to it! Here’s my thinking:
Useful programs are often large (say >100,000 LOC), and large programs are spectacularly complex.
The majority of those lines are essential, and if you changed one of them, the program would break
in a small or big way. No one can keep all of this in their head. Now add in a dozen or more
programmers, all of whom modify this code base daily, while trying to add features and fix bugs. This
framing should make it obvious that managing complexity is one of the primary tasks of a programmer,
for anyone who didn’t already have that perspective.
Or in the words of Bill Gates, “Measuring programming progress by lines of code is like measuring
aircraft building progress by weight.” (The reason more lines are bad isn’t on the computers’ side: computers can handle millions of lines just fine. The reason is on the humans’ side: it’s the complexity they bring.)
I really only know one major approach to managing complexity: you split the big complicated thing
into smaller pieces, recursively, and make it possible to understand each piece without
understanding its implementation. So that you don’t have to open the box.
In this post you talk about naming functions. If a function is a box, then a good name on the box
lets you use the box without opening it. But there’s more on the box than the function’s name, and
you should make use of all of it, for exactly the reasoning in this post!
Sometimes you can’t fit all the salient information about what a function does in a short name;
the rest should go in its doc string.
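A minimal sketch of that split, assuming Python; `normalize_scores` is a hypothetical function made up for illustration. The name carries the gist, and the doc string carries the edge cases that would never fit in a name:

```python
def normalize_scores(scores: list[float]) -> list[float]:
    """Rescale scores linearly so the smallest becomes 0.0 and the largest 1.0.

    An empty list returns an empty list. If every score is equal, all
    results are 0.0, since there is no range to scale by.
    """
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        # Avoid dividing by zero when there is no spread in the data.
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]
```

A caller who reads only the name and the doc string already knows what happens for empty or constant input, without opening the box.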
In a typed language, a function’s type signature also serves as documentation. It tells you
exactly what kinds of things it expects as arguments, and exactly what it produces, and, depending
on the language, what kinds of errors it might throw. The best part of this “type
documentation” is that it can never get out of date, because the type checker validates it!
There’s a principle called “make illegal states unrepresentable”, which means that you arrange
your data types such that you cannot construct invalid data; this helps here by making the type signature convey more information.
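Here is a rough sketch of "make illegal states unrepresentable", assuming Python with type hints. All the names (`Disconnected`, `Connected`, `ConnectionState`, `describe`) are hypothetical. The tempting alternative is a single class with `connected: bool` and `session_id: Optional[str]`, which allows the nonsensical combination "connected, but no session id". Splitting the states into separate types removes that combination:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Disconnected:
    """No session exists, so there is no session id field to misuse."""


@dataclass(frozen=True)
class Connected:
    """A live connection always carries its session id."""
    session_id: str


# A connection is exactly one of these two shapes, never a mix of both.
ConnectionState = Union[Disconnected, Connected]


def describe(state: ConnectionState) -> str:
    """Return a one-line, human-readable summary of the state."""
    if isinstance(state, Connected):
        return f"connected (session {state.session_id})"
    return "disconnected"
```

One caveat: Python does not enforce these hints at runtime, so the guarantee only holds if you run an external type checker such as mypy. In a statically typed language the compiler does this for you.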
Functions/methods are the smallest pieces, and their boundary is their (i) name, (ii) doc string,
and (iii) type signature. What the larger pieces are depends on the language and program, but I
clump them all as “modules” in my head: interfaces, classes, modules, packages, APIs, etc. The
common shape tends to be a set of named functions.
The primary way I organize my code is to split it into “modules” (generally construed), such that
each module “does one thing and does it well”. How can you tell if it “does one thing”? Write the
module’s docs, which should include a high-level overview of the whole module, plus shorter docs for
each function in the module. The rule is that your docs have to fully describe how to use the
module and what its behavior will be under any use case. This tends to make it really obvious when
things are poorly organized. I’ve often realized that it will literally be less work to re-organize
the code than to properly document it as is, because of all the horrible edge cases I would have to
talk about.
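As an illustration of the "write the docs first" test, here is a sketch of a tiny Python module. It is a hypothetical slug helper, and every name in it is invented for this example. The overview fits in a few lines and each function's doc is short, which is a decent sign that the module does one thing:

```python
"""slugs: turn arbitrary titles into URL-safe slugs, and nothing else.

`slugify` is the only entry point most callers need. `is_slug` checks
whether a string is already in slug form. The module does no I/O and
keeps no state.
"""

import re

# Any run of characters outside a-z and 0-9 is treated as one separator.
_NON_SLUG_CHARS = re.compile(r"[^a-z0-9]+")


def slugify(title: str) -> str:
    """Lowercase `title` and join its a-z/0-9 runs with single hyphens.

    Everything else acts as a separator. Returns "" if nothing is left.
    """
    return _NON_SLUG_CHARS.sub("-", title.lower()).strip("-")


def is_slug(text: str) -> bool:
    """Return True if `text` is non-empty and `slugify` leaves it unchanged."""
    return text != "" and slugify(text) == text
```

If the overview had needed a paragraph of exceptions ("except when the title is a date, in which case..."), that would be the signal to reorganize rather than keep documenting.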
On the other hand, I find that many other people don’t even want to invest a few seconds in
[brainstorming for a good name for something].
I’m sorry you don’t have a good naming buddy! Everyone should have a naming buddy; it’s so hard to
come up with good names on your own.
Thanks for this! It’s helpful to hear things framed from a different person’s perspective. In particular, I like the way you explained “complex systems have to be broken into parts, and parts have to be understandable without opening the box”.
But there’s more on the box than the function’s name, and you should make use of all of it, for exactly the reasoning in this post!
Great point! I have to admit, I didn’t know that docstrings existed until now. Kinda funny that I wrote this post without knowing what docstrings are. I’m really excited to use them in my next project now.
and their boundary is their (i) name, (ii) doc string, and (iii) type signature.
Actually, one of my crazy ideas is to extend this boundary even further with visuals. (Well, in that post I wasn’t necessarily talking about it as part of the “hover over a line of code in a text editor interface”, but it could fit there.)
How can you tell if it “does one thing”? Write the module’s docs, which should include a high-level overview of the whole module, plus shorter docs for each function in the module.
Ah that makes sense. Sounds like a good forcing function.
I’m sorry you don’t have a good naming buddy! Everyone should have a naming buddy; it’s so hard to come up with good names on your own.
Yeah. In a perfect world I’d actually do something along the lines of low-fi usability testing with people. But instead of testing whether they understand a UI, testing whether they understand my code.