Namespace pollution and name collision are two great concepts in computer programming. The way they are handled in many academic environments seems quite naive to me.
Programs can get quite large, so naming things well is surprisingly important. Many of my code reviews are primarily about coming up with good names for things. In a large codebase, every time symbolicGenerator() is mentioned, it refers to exactly the same thing. Suppose one part of the codebase has been using symbolicGenerator for a reasonable set of functions, and later another part comes along whose programmer realizes that symbolicGenerator is also the best name for that piece. They have a tough decision to make: either refactor the codebase to change all previous mentions of symbolicGenerator to an alternative name, or come up with an alternative name for the new piece. They can’t have it both ways.
Therefore, naming becomes a political process. Names touch many programmers who have different intuitions and preferences. A large renaming refactor in a section of the codebase that others use would often be received quite hesitantly by that group.
This makes it all the more important that good names are used initially. As such, reviewers care a lot about names being good. Ideally they are generic enough that their components can be expanded while the name remains meaningful, yet specific enough to be easy to remember. Names that get submitted via pull requests represent much of the human part of the interface/API; they’re harder to change later on, so they require extra work to get right the first time.
To be clear, a name collision is when two unrelated variables have the same name, and namespace pollution refers to when code is initially submitted in ways that are likely to create unnecessary conflicts later on.
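As a rough illustration of the two definitions above, here is a minimal Python sketch. Only the name symbolicGenerator comes from the post; the `register` helper, the `algebra` and `audio` namespaces, and the returned strings are all invented for the example. In a single flat namespace the second binding silently replaces the first, while separate namespaces let both parts of a codebase keep the name they wanted.

```python
from types import SimpleNamespace

# A flat namespace: one shared mapping of names, like a single global scope.
flat = {}

def register(namespace, name, value):
    """Bind name in namespace and report whether an older binding was overwritten."""
    collided = name in namespace
    namespace[name] = value
    return collided

# Two unrelated parts of the codebase pick the same name (hypothetical bodies).
first = register(flat, "symbolicGenerator", lambda: "algebra symbols")
second = register(flat, "symbolicGenerator", lambda: "audio symbols")
# second is True: the name collided and the first meaning is no longer reachable.

# Separate namespaces: each part keeps its preferred name without conflict.
algebra = SimpleNamespace(symbolicGenerator=lambda: "algebra symbols")
audio = SimpleNamespace(symbolicGenerator=lambda: "audio symbols")
```

The analogy to terminology is loose: a generic name claimed early in the flat mapping is what later contributors have to work around.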
Academia
My impression is that in much of academia, there are few formal processes for groups of experts to agree on the names for things. There are specific clusters with carefully thought-out terminology, particularly around very large sets of related terms; for instance, biological taxonomies, the metric system, and various aspects of medicine and biology.
But in many other parts, it seems like a free-for-all among the elite. My model of the process is something like: “Someone coming up with a new theory will propose a name for it and put it in their paper. If the paper is accepted (which is typically decided on details unrelated to the name), and if others find that theory useful, then they will generally call it by the same name as the one used in the proposal. In some cases a few researchers will come up with a few variations for the same idea, in which case one will be selected by what future researchers decide to use, on an individual basis. Often ideas are named after those who came up with them in some capacity; this makes a lot of sense to other experts who worked in these areas, but it’s not at all obvious that this is optimal for other people.”
The result is that naming is something that happens almost accidentally, as the result of a process that isn’t paying particular attention to making sure the names are right.
When there is little or no naming process, actors are incentivized to choose bold names. They don’t have to pay the cost of any namespace pollution they create. Two names that have come to mind recently are “The Orthogonality Thesis” and “The Simulation Hypothesis”. These are two rather specific things with very generic names. They come to mind because they are related to our field, but many academic topics seem similar. Information theory is mostly about encoding schemes, which are now not that important. Systems theory is typically about a subset of dynamical systems. But of course, it would be really awkward for anyone else with a more sensible “Systems theory” to use that name for the new thing.
I feel like AI has had some noticeably bad examples; it’s hard to look at all the existing naming and think that this was the result of a systematic and robust naming approach. The Table of Contents of AI: A Modern Approach seems quite good to me; that seems very much a case of a few people refactoring things to come up with one high-level overview that is optimized for being such. But the individual parts are obviously messy: A* search, alpha-beta pruning, k-consistency, Gibbs sampling, Dempster-Shafer theory, etc.
LessWrong
One of my issues with LessWrong is the naming system. There’s by now quite a bit of terminology to understand; the LessWrong wiki seems useful here. But there’s no strong process, from what I understand. People suggest names in their posts, and these either become popular or don’t. There’s rarely any refactoring.
I’m not sure if it’s good or bad, but I find the way species get named interesting.
The general rule is “first published name wins”, and this is true even if the first published name is “wrong” in some way, for example by implying a relationship that doesn’t exist, since that implication is not officially semantically meaningful. But there are ways to get around this: if a name was based on a disproved phylogeny, a new name can be taken up that fits the new phylogenetic relationship. This means existing names get to stick, at least up until the time that they are proven so wrong that they must be replaced. Alas, there’s no official registry of these things, so it’s up to working researchers to do literature reviews and get the names right. Sometimes people get it wrong by accident, and sometimes on purpose, because they think an earlier naming is “invalid” for one reason or another and so only recognize a later naming. The result is pretty confusing, and it requires knowing a lot or doing a lot of research to realize that, for example, two species names might refer to the same species in different papers.
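The priority rule described above can be sketched as a small lookup. This is only an illustration under loud assumptions: the `NameRecord` type, the `valid_name` and `synonyms` helpers, the `taxon_id` field, and the example names and years are all made up, and the sketch assumes the records have been gathered in one place, which the comment says is not the case in practice. Real nomenclature codes have many more rules than this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NameRecord:
    name: str               # published species name (invented examples below)
    year: int               # year of publication
    taxon_id: str           # hypothetical identifier for the underlying species
    rejected: bool = False  # e.g. the name rested on a disproved phylogeny

def valid_name(records, taxon_id):
    """Return the earliest published, non-rejected name for a taxon, or None."""
    candidates = [r for r in records if r.taxon_id == taxon_id and not r.rejected]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.year).name

def synonyms(records, name):
    """Return the other names recorded for the same taxon as the given name."""
    taxa = {r.taxon_id for r in records if r.name == name}
    return sorted({r.name for r in records if r.taxon_id in taxa and r.name != name})

records = [
    NameRecord("Examplea prima", 1850, "taxon-1"),
    NameRecord("Examplea secunda", 1902, "taxon-1"),
    NameRecord("Otherus vetus", 1840, "taxon-2", rejected=True),
    NameRecord("Otherus novus", 1990, "taxon-2"),
]
```

With these records, the older name wins for taxon-1, while for taxon-2 the rejected older name gives way to the later one. The `synonyms` lookup is the part that working researchers currently have to reconstruct by literature review.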
Thanks, I didn’t know. That matches what I expect from similar fields, though it is a bit disheartening. There’s an entire field of library science and taxonomy, but they seem rather isolated to specific things.
Another quick note on the LessWrong wiki:
I’m skeptical of single definitions without disclaimers. I think it’s misleading (to some) that “Truth is the correspondence between one’s beliefs about reality and reality.”[1] Rather, it’s fair to say that this is one specific definition of truth that has been used in many cases; I’m sure that others, including others on LessWrong, have used it differently.
Most dictionaries have multiple definitions for words. This seems more like what we should aim for.
In fairness, when I searched for “Rationality”, the result states, “Rationality is something of fundamental importance to LessWrong that is defined in many ways”, which I of course agree with.
[1] https://wiki.lesswrong.com/wiki/Truth
At the meta-level it isn’t clear what value other definitions might offer (in this case). (“Truth” seems like a basic concept that is understood prior to reading that article—it’s easier to imagine such an argument for other concepts without such wide understanding.)
Perhaps more definitions should be brought in (as necessary), with the same level of attention to detail, when they are used (extensively). It’s possible that relevant posts have already been made and just haven’t been integrated into the wiki. Is the wiki up to date as of 2012, but not after that?
Footnote not found. The refactoring sounds like a good idea, though the main difficulty would be propagating the new names.
Thanks for pointing that out! I forgot the specific note, so I removed the [1].
I definitely would agree that refactoring would be difficult, especially if we haven’t figured out a great refactoring process.
One of the issues with this in both an academic and LW context is that changing the name of something in a single-source-of-truth codebase is much cheaper than changing the name of something in a community. The more popular an idea, the more it costs to change its name. Similarly, when you’re working with a single organization, creating a process that everyone follows is relatively cheap compared to doing so in a loosely tied together community with various blogs, individuals, and organizations coining their own terms.
Yep, I’d definitely agree that it’s harder. That said, this doesn’t mean that it’s not high-EV to improve on. One outcome could be that we should be more careful introducing names, as it is difficult to change them. Another would be to work on formal ways of changing them afterwards, even though it is difficult (it would be worthwhile in some cases, I assume).
In a recent thread about changing the name of Solstice to Solstice Advent, Oliver Habryka estimated it would cost at least $100,000 to make that happen. This seems like a reasonable estimate to me, and a good lower bound for how much value you would need to get from a name change to make it worth it.
The idea of lowering this cost is quite appealing, but I’m not sure how to make a significant difference there.
I think it’s also worth thinking about the counterfactual cost of discouraging naming things.
As an example, here’s a post with an important concept that hasn’t really spread because it doesn’t have a snappy name: https://www.lesswrong.com/posts/K4eDzqS2rbcBDsCLZ/unrolling-social-metacognition-three-levels-of-meta-are-not