Without focusing on the first problem, is there a path to get better at naming things?
Do you have experiences to share where you think you improved on the skill?
Exercises to recommend?
Books and articles to recommend?
I have usually seen that quotation in the modified form: “only two hard things: cache invalidation, naming things, and off-by-one errors”. (It appears that this modification was introduced by someone called Leon Bambrick.)
I like the modified version because (1) it’s funny and (2) off-by-one errors are indeed a common source of trouble (though, I think, in a rather different way from cache invalidation and naming things). I do wish Karlton had said “software development” rather than “computer science”, though.
But that joke distracts from the original joke that caching is subsumed by “naming things.”
At least one of us is confused. It never occurred to me that the original comment was intended as a joke (except insofar as it’s a deliberate drastic oversimplification), and I don’t think I understand what you mean about caching being subsumed by naming (especially as the alleged hard problem is not caching but cache invalidation, which seems to me to have very little to do with naming).
I’m probably missing something here; could you explain your interpretation of the original comment a bit more? (With of course the understanding that explaining jokes tends to ruin them.)
I don’t agree with Douglas_Knight’s claim about the intent of the quote, but a cache is a kind of (application of a) key-value data structure. Keys are names. What information is in the names affects how long the cache entries remain correct and useful for.
(Correct: the value is still the right answer for the key. Useful: the entry will not be unused in the future, i.e. is not garbage in the sense of garbage-collection.)
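To make the point about keys concrete, here is a minimal Python sketch. Everything in it (the `summarize` function, the `docs` mapping, the two keying schemes) is a made-up illustration, not something taken from the thread. It compares a cache keyed by name alone with one whose key also carries the contents the value was computed from.

```python
# Hypothetical example: two ways to key a cache of an "expensive" computation
# over named documents. All names here are invented for illustration.

def summarize(text: str) -> str:
    """Stand-in for an expensive computation."""
    return text.upper()

docs = {"readme": "hello"}      # name -> current contents
by_name = {}                    # key: name only
by_name_and_content = {}        # key: (name, contents); a real system would
                                # more likely use a version number or a hash

def cached_by_name(name: str) -> str:
    if name not in by_name:
        by_name[name] = summarize(docs[name])
    return by_name[name]

def cached_by_name_and_content(name: str) -> str:
    key = (name, docs[name])
    if key not in by_name_and_content:
        by_name_and_content[key] = summarize(docs[name])
    return by_name_and_content[key]

# Warm both caches, then change the underlying data.
cached_by_name("readme")
cached_by_name_and_content("readme")
docs["readme"] = "goodbye"

stale = cached_by_name("readme")               # old answer: no longer correct
fresh = cached_by_name_and_content("readme")   # recomputed under a new key
leftover = len(by_name_and_content)            # old entry remains as garbage
```

With the name-only key the entry stops being correct and has to be invalidated. With the richer key every entry stays correct indefinitely, but the superseded entry stops being useful and has to be collected somehow.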
I agree that a cache can be thought of as involving names, but even if—as you suggest, and it’s a good point that I hadn’t considered in this context—you sometimes have some scope to choose how much information goes into the keys and hence make different tradeoffs between cache size, how long things are valid for, etc., it seems pretty strange to think of that as being about naming.
Well, as iceman mentioned on a different subthread, a content-addressable store (key = hash of value) is fairly clearly a sort of naming scheme. But the thing about the names in a content-addressable store is that unlike meaningful names, they say nothing about why this value is worth naming; only that someone has bothered to compute it in the past. Therefore a content-addressable store either grows without bound, or has a policy for deleting entries. In that way, it is like a cache.
For example, Git (the version control system) uses a content-addressable store, and has a policy that objects are kept only if they are referenced (transitively through other objects) by the human-managed arbitrary mutable namespace of “refs” (HEAD, branches, tags, reflog).
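The "kept only if referenced" policy can be sketched as a mark-and-sweep over a toy content-addressable store. This is an illustration under simplifying assumptions, not Git's actual object format or collection algorithm: the `Store` class, its fields, and the way child hashes are mixed into the key are all hypothetical.

```python
import hashlib

class Store:
    """Toy content-addressable store; NOT Git's real object model."""

    def __init__(self):
        self.objects = {}   # content hash -> (data, tuple of child hashes)
        self.refs = {}      # mutable, human-chosen name -> content hash

    def put(self, data: bytes, children=()) -> str:
        # The key is derived from the data and the hashes of its children.
        h = hashlib.sha256()
        h.update(data)
        for child in children:
            h.update(child.encode())
        key = h.hexdigest()
        self.objects[key] = (data, tuple(children))
        return key

    def collect_garbage(self) -> int:
        """Delete every object not reachable from a ref; return how many."""
        reachable = set()
        stack = list(self.refs.values())
        while stack:
            key = stack.pop()
            if key in reachable:
                continue
            reachable.add(key)
            stack.extend(self.objects[key][1])   # follow child references
        dead = [k for k in self.objects if k not in reachable]
        for k in dead:
            del self.objects[k]
        return len(dead)

store = Store()
blob = store.put(b"file contents")
commit = store.put(b"first commit", children=[blob])
orphan = store.put(b"abandoned experiment")
store.refs["main"] = commit      # only the commit (and its blob) is named

removed = store.collect_garbage()
```

The hashes say nothing about which objects matter; only the small mutable `refs` namespace does, and everything unreachable from it is treated as garbage.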
Tahoe-LAFS, a distributed filesystem which is partially content-addressable but in any case uses high-entropy names, requires that clients periodically “renew the lease” on files they are interested in keeping, which they do by recursive traversal from whatever roots the user chooses.
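Lease renewal can be sketched the same way. The following is a hypothetical model of the general idea only; `LeasedStore`, `renew_from_roots`, and the integer clock are assumptions for illustration and do not reflect Tahoe-LAFS's real API or lease parameters.

```python
class LeasedStore:
    """Toy store whose entries expire unless a client renews their lease."""

    def __init__(self, lease_duration: int):
        self.lease_duration = lease_duration
        self.entries = {}   # key -> (value, expiry time)

    def put(self, key, value, now: int) -> None:
        self.entries[key] = (value, now + self.lease_duration)

    def renew(self, key, now: int) -> None:
        value, _ = self.entries[key]
        self.entries[key] = (value, now + self.lease_duration)

    def expire(self, now: int) -> None:
        # Drop every entry whose lease has run out.
        expired = [k for k, (_, exp) in self.entries.items() if exp <= now]
        for k in expired:
            del self.entries[k]

def renew_from_roots(store, roots, children_of, now: int) -> None:
    """Renew leases on everything reachable from the client's chosen roots."""
    seen = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        store.renew(key, now)
        stack.extend(children_of.get(key, ()))

store = LeasedStore(lease_duration=10)
for key in ("dir", "file", "old"):
    store.put(key, b"...", now=0)
children_of = {"dir": ["file"]}   # "old" is not reachable from any root

renew_from_roots(store, ["dir"], children_of, now=8)
store.expire(now=12)
surviving = sorted(store.entries)
```

Here the store never needs to understand the names: whatever the client stops traversing to simply ages out.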
Why do you believe that the problem of naming doesn’t fall into computer science? Because people in that field find the question too low-status to work on?
Nothing to do with status (did I actually say something that suggested a status link?), and my claim isn’t that computer science doesn’t have a problem with naming things (everything has a problem with naming things) but that when Karlton said “computer science” he probably meant “software development”.
[EDITED to remove a remark that was maybe unproductively cynical.]
The question isn’t whether computer science has a problem with naming things but whether naming information structures is a computer science problem.
It’s not a problem of algorithms, but it is a problem of how to relate to information. Given how central names are to human reasoning and human intelligence, caring about names seems to be relevant for building artificial intelligence.
When I read your post, my initial thought was of Kernighan and Pike’s The Practice of Programming. Fortunately, I had to spend some time looking it up because I’d forgotten the name of the book; when I did, I was somewhat disappointed.
The first chapter is on programming style, but very little of it is about naming things, as is relevant to your question. About half of that chapter is inaccurate or useless if you’re using a language other than C, which you probably are.
Nevertheless, if you have the opportunity to read that 28-page chapter, I recommend it.
The end of that chapter makes the following reading recommendations related to programming style:
Kernighan and Plauger, The Elements of Programming Style
Maguire, Writing Solid Code
McConnell, Code Complete
van der Linden, Expert C Programming: Deep C Secrets
...and Strunk and White, The Elements of Style
Code Complete has a section on this. But we don’t have a precise understanding of what a “good name” is, for the same reason that we don’t have a precise understanding of what a “good song” is: the goodness of a name is measured by its effect on its reader.
So I think the high-level principle, if you want to do a good job naming things in your program, is to model your intended reader as precisely as you can. What do they know about the problem domain? What programming conventions are they familiar with? Why are they reading your program—what matters to them? These concerns will inform your formatting and commenting style as well.
When you draw these distinctions you will exclude some people. That’s normal. You shouldn’t feel bad about that, any more than Thomas Mann felt bad that Chinese speakers had to learn German before they could read Der Zauberberg. If your work is influential enough, someone will translate or annotate it. And unlike a novel, most programs are read only by a small circle anyway.
If you want concrete advice instead of philosophy, this c2 page includes some useful tips.
I’m not sure whether I buy that argument. It would be quite possible to go out and study naming in the real world and study problems that arise and what goes well.
Yes, I agree. That’s why I like the analogy to composition: most of the songs you might write, if you were sampling at random from song-space, are terrible. So we don’t sample randomly: our search through song-space is guided by our own reactions and a great body of accumulated theory and lore. But despite that, the consensus on which songs are the best, and on how to write them, is very loose.
(Actually it’s worse: I think composition is somewhat anti-inductive, but that’s outside the scope of this thread.)
My experience is that naming is similar. There are some concrete tricks you can learn—do read the C2 wiki if you don’t already—and there’s a little bit of theory, some of which I tried to share insofar as I understand it. But naming is communication, communication requires empathy, and empathy is a two-place word: you can’t have empathy in the abstract, you can only have empathy for someone.
It might help to see a concrete example of this tension. I don’t endorse everything in this essay. But it’s a long-form example of a man grappling with the problem I’ve tried to describe.
To speak to the second problem, naming things: I’m a big fan of content-addressable everything. Addressing all content by hash_function() has major advantages. This may require another naming layer to give human-recognizable names to hashes, but I think this still goes a long way towards making things better.
You might find Joe Armstrong’s The Mess We’re In interesting; it provides some simple strawman algorithms for deduplication, though they probably aren’t sophisticated enough to run in practice.
(My roommate walked in while I was watching that lecture with headphones on, and just saw the final conclusion slide:
We’ve made a mess
We need to reverse entropy
Quantum mechanics sets limits to the ultimate speed of computation
We need Math
Abolish names and places
Build the condenser
Make low-power computers—no net environmental damage
And just did that smile and nod thing. The above makes it sound like Armstrong is a crank, but it all makes sense in context, and I’ve deliberately copied just this last slide without any other context to try to get you to watch it. If you like theoretical computer science, I highly recommend watching the lecture.)
It also requires (different) attention to versioning. That is, if you have arbitrary names, you can change the referent of the name to a new version, but you can’t do that with a hash. You can’t use just-a-hash in any case where you might want to upgrade/substitute the part but not the whole.
Conversely, er, contrapositively, if you need referents to not change ever, hashes are great.
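A small Python sketch of the trade-off, assuming a toy store with an immutable hash-keyed layer and a mutable naming layer on top. The `blobs` and `names` dictionaries and the `put` helper are hypothetical, chosen only to illustrate the point.

```python
import hashlib

blobs = {}   # content hash -> bytes; an entry never changes once written
names = {}   # human-readable name -> content hash; can be repointed

def put(data: bytes) -> str:
    """Store data under the SHA-256 hash of its contents; return the hash."""
    key = hashlib.sha256(data).hexdigest()
    blobs[key] = data
    return key

v1 = put(b"library version 1")
names["mylib"] = v1
pinned = names["mylib"]          # a consumer records the hash, not the name

v2 = put(b"library version 2")
names["mylib"] = v2              # upgrading means repointing the name

via_name = blobs[names["mylib"]]   # follows the upgrade
via_hash = blobs[pinned]           # still exactly what was pinned
```

A reference by name picks up the new version, while a reference by hash keeps pointing at the original bytes. Which behaviour you want decides which kind of reference to hand out.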