an interesting thing I notice is that in domains where there are a lot of objects in consideration, those objects have some structure so that they can be classified, and how often those objects occur follows something like a power law, there are two very different frames that get used to think about that domain:
a bucket of atomic, structureless objects with unique properties where facts about one object don’t really generalize at all to any other object
a systematized hierarchy or composition of properties, a “periodic table” or full grid of objects defined by the properties they have in some framework
and a lot of interesting things happen when these collide or cooccur, or when shifting from one to the other
I know my description above is really abstract, so here are a bunch of concrete examples that all gesture at the same vibe:
basically all languages have systematic rules in general but special cases around the words that people use very often. this happens in too many unrelated languages to be a coincidence, and as a native/fluent speaker it always feels very natural, but as a language learner it’s very confusing. for example, for languages with conjugations, a few of the most common verbs are almost always irregular, e.g. [to be, am, is, are, was, were] (english), [sein, bin, ist, war, sind] (german), [être, suis, est, était, sont] (french); small counting numbers are often irregular too: [first, second, third] (english), [两个] (chinese, where 两 replaces 二 before a measure word), [premier] (french), [ひとつ、ふたつ、みっつ] (japanese native counting). my theory for why this makes sense to natives but not to language learners is that language learners learn things systematically from the beginning, and in particular don’t deal with the true distribution of language usage but rather an artificially flat one designed to capture all the language features roughly equally.
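to gesture at why the exposure gap is so big: under a Zipf-like distribution (a standard rough model of word frequencies; the numbers below are my own back-of-the-envelope with assumed parameters, not from any corpus), the handful of top words that tend to be irregular account for a huge share of what a native actually hears, while a textbook weights them about the same as everything else.

```python
# rough back-of-the-envelope sketch (my own illustration, assumed parameters):
# what fraction of tokens do the top-k word types account for under Zipf's law?
def zipf_share(top_k: int, vocab_size: int = 50_000, s: float = 1.0) -> float:
    weights = [1 / rank**s for rank in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)

print(f"top 10 word types:  {zipf_share(10):.0%} of all tokens")   # roughly a quarter
print(f"top 100 word types: {zipf_share(100):.0%} of all tokens")  # roughly half
```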
often, when there is a systematic way of naming things, the most common things will have special names/nicknames (eg IUPAC names vs common names). sometimes this happens because those things were discovered before the systematization happened, and once the systematization happens everyone is still used to the old names for some things. but even if you start with the systematized thing, often people will create nicknames after the fact.
it often happens that we write software tools for a specific problem, and then later realize that the problem is a special case of a more general problem. often going more general is good because it means we can use the same code to do a wider range of things (which means fewer bugs, more code reuse, more elegant code). however, the more general/abstract code is often slightly clunkier to use for the common case, so it often makes sense to drop down a level of abstraction if the goal is to quickly hack something together.
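a toy sketch of what I mean (my own made-up example, not any particular codebase): the specific tool is trivial to call for the common case, while its generalization handles many more problems at the cost of a bit more ceremony.

```python
from functools import reduce

# specific tool: does exactly one job, and the common case is a one-liner to call
def total_price(prices):
    return sum(prices)

# general tool: a fold covers sums, products, maxima, string joins, ...
# but the caller has to spell out the combine function and the initial value
def fold(combine, initial, items):
    return reduce(combine, items, initial)

# same answer, more ceremony for the common case
assert total_price([1, 2, 3]) == fold(lambda acc, x: acc + x, 0, [1, 2, 3])
```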
when compressing some distribution of strings, the vast majority of possible-but-unlikely strings can be stored basically verbatim behind a flag, and it is very easy to tell properties of such a string by looking at its compressed representation; whereas the most common strings have to map to short codes that preserve none of the structure of the data unless you run the decompressor. (though note that not all the examples here can be described as instances of compression exactly.)
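a minimal sketch of the kind of scheme I have in mind (the codebook entries are hypothetical): the few most common strings get opaque one-byte codes, and everything else is stored verbatim behind an escape flag, so its length, prefix, etc. are still readable straight off the compressed bytes.

```python
# toy dictionary coder (my own illustration; the codebook is made up)
CODEBOOK = {"the": b"\x00", "and": b"\x01", "of": b"\x02"}  # the few most common strings
DECODEBOOK = {code: s for s, code in CODEBOOK.items()}
ESCAPE = b"\xff"  # flag byte marking a verbatim-stored string

def compress(s: str) -> bytes:
    if s in CODEBOOK:
        # common case: a short opaque code; all surface structure is gone
        return CODEBOOK[s]
    # rare case: stored essentially verbatim, so properties of the string
    # (length, prefix, ...) are visible without running the decompressor
    return ESCAPE + s.encode("utf-8")

def decompress(b: bytes) -> str:
    if b[:1] == ESCAPE:
        return b[1:].decode("utf-8")
    return DECODEBOOK[b]

assert decompress(compress("the")) == "the"
assert decompress(compress("zygomorphic")) == "zygomorphic"
```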
sometimes there’s friction between people who are doing the systematizing thing and people who are doing the atomic-concepts thing. the systematizer comes off as nitpicky, pedantic, and removed from reality to the atomic-concepts person, and the atomic-concepts person comes off as unrigorous, uncosmopolitan, and missing the big picture to the systematizer.
I think the concept of zero only being invented long after the other numbers is also an instance of this: in some sense, for basic everyday usage in counting things, the existence of zero is a weird technicality, and I could imagine someone saying “well sure, there is a number that comes before one, but it’s not useful for anything, so it’s not worth considering”. I think a lot of math (eg abstract algebra) is the result of applying truly enormous amounts of this kind of systematizing.
I think this also sort of has some handwavy analogies to superposition vs composition.
if there is an existing name for the thing I’m pointing at, I would be interested in knowing.