When Are Circular Definitions A Problem?
Disclaimer: if you are using a definition in a nonmathematical piece of writing, you are probably making a mistake; you should just get rid of the definition and instead use a few examples. This applies double to people who think they are being “rigorous” by defining things but are not actually doing any math. Nonetheless, definitions are still useful and necessary when one is ready to do math, and some pre-formal conceptual work is often needed to figure out which mathematical definitions to use; thus the usefulness of this post.
Suppose I’m negotiating with a landlord about a pet, and in the process I ask the landlord what counts as a “big dog”. The landlord replies “Well, any dog that’s not small”. I ask what counts as a “small dog”. The landlord replies “Any dog that’s not big”.
Obviously this is “not a proper definition”, in some sense. If that actually happened in real life, presumably the landlord would say it somewhat tongue-in-cheek. But what exactly is wrong with defining big dogs as not small, and small dogs as not big?
One might be tempted to say “It’s a circular definition!”, with the understanding that circular definitions are always problematic in some way.
But then consider another example, this time mathematical:
Define x as a real number equal to y-1: x = y-1
Define y as a real number equal to x/2: y = x/2
These definitions are circular! I’ve defined x in terms of y, and y in terms of x. And yet, it’s totally fine; a little algebra shows that we’ve defined x = −2 and y = −1. We do this sort of thing all the time when using math, and it works great in practice.
So clearly circular definitions are not inherently problematic. When are they problematic?
We could easily modify the math example to make a problematic definition:
Define x as a real number equal to y-1: x = y-1
Define y as a real number equal to x+1: y = x+1
What’s wrong with this definition? Well, the two equations—the two definitions—are redundant; they both tell us the same thing. So together, they’re insufficient to fully specify x and y. Given the two (really one) definitions, x and y remain extremely underdetermined; either one could be any real number!
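To make the contrast between the two math examples concrete, here is a minimal sketch that treats each pair of definitions as a 2x2 linear system and checks its determinant. The function name solve_pair and the rewriting of each definition into the form a*x + b*y = c are my own illustrative choices, not something from the original discussion. Note that a zero determinant flags both redundant and contradictory pairs; the second example in this post is the redundant kind.

```python
from fractions import Fraction


def solve_pair(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule.

    Returns (x, y) when the two equations are independent, or None when the
    determinant is zero (the equations are redundant or contradictory).
    solve_pair is a hypothetical helper written for this illustration.
    """
    det = Fraction(a11) * a22 - Fraction(a12) * a21
    if det == 0:
        return None
    x = (Fraction(b1) * a22 - Fraction(a12) * b2) / det
    y = (Fraction(a11) * b2 - Fraction(b1) * a21) / det
    return x, y


# First pair: x = y - 1 and y = x/2, rewritten as x - y = -1 and x - 2y = 0.
first = solve_pair(1, -1, -1, 1, -2, 0)

# Second pair: x = y - 1 and y = x + 1, both of which rewrite to x - y = -1.
second = solve_pair(1, -1, -1, 1, -1, -1)

print(first)   # the unique solution of the first pair
print(second)  # None: the second pair does not pin down x and y
```

The circularity is identical in both calls; the only thing that differs is whether the second equation adds information the first one lacked.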
And that’s the same problem we see in the big dog/small dog example: if I define a big dog as not small, and a small dog as not big, then my two definitions are redundant. Together, they’re insufficient to tell me which dogs are or are not big. Given the two (really one) definitions, big dog and small dog remain extremely underdetermined; any dog could be big or small!
Application: Clustering
This post was originally motivated by a comment thread about circular definitions in clustering:
Define the points in cluster i as those which statistically look like they’re generated from the parameters of cluster i
Define the parameters of cluster i as an average of <some features> of points in cluster i
These definitions are circular: we define cluster-membership of points based on cluster parameters, and cluster parameters based on cluster-membership of points.
And yet, widely-used EM clustering algorithms are essentially iterative solvers for equations which express basically the two definitions above. They work great in practice. While they don’t necessarily fully specify one unique solution, for almost all data sets they at least give locally unique solutions, which is often all we need (underdetermination between a small finite set of possibilities is often fine; it’s when definitions allow for a whole continuum that we’re really in trouble).
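As a rough illustration of solving the two circular definitions by iteration, here is a sketch using one-dimensional k-means. This is a simplification I am swapping in, not full EM: "statistically looks like it was generated from cluster i" is replaced by "is nearest to the mean of cluster i", and the only cluster parameter is that mean. The function name kmeans_1d, the toy data, and the starting guesses are all made up for the example.

```python
def kmeans_1d(points, centers, max_iters=100):
    """Alternate the two circular definitions until neither one changes.

    points: list of floats; centers: initial guesses for the cluster means.
    Returns (labels, centers) at a fixed point of the two update rules.
    kmeans_1d is a hypothetical helper; it is hard-assignment k-means, not EM.
    """
    centers = list(centers)
    labels = None
    for _ in range(max_iters):
        # Definition 1: a point belongs to the cluster whose mean is nearest.
        new_labels = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships and parameters now agree with each other
        labels = new_labels
        # Definition 2: a cluster's mean is the average of its member points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's center where it was
                centers[k] = sum(members) / len(members)
    return labels, centers


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]  # toy data, invented for this sketch

labels, centers = kmeans_1d(data, centers=[0.0, 6.0])
swapped_labels, swapped_centers = kmeans_1d(data, centers=[6.0, 0.0])

print(labels, centers)
print(swapped_labels, swapped_centers)
```

The two runs find the same grouping of points with the cluster names swapped. That is the harmless, finite kind of underdetermination: each answer is locally unique, and there is no continuum of equally good answers in between.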
Circularity in clustering is particularly important, insofar as we buy that words point to clusters in thingspace. If words typically point to clusters in thingspace, and clusters are naturally defined circularly, then the most natural definitions will typically involve some circularity. The key is to make sure that the circular definitions used are nondegenerate, i.e. if we were to turn the definitions into equations, the equations would not be redundant. So long as the definitions are nondegenerate, and there’s a definition for each of the “unknowns” involved (e.g. parameters and cluster labels, in the clustering case), the equations will typically have at least locally unique solutions (since the number of equations matches the number of unknowns). That’s what we really care about: definitions which aren’t too underdetermined.
A dictionary defines all words circularly, but of course nobody learns all words from a dictionary—the assumption is you’re looking up a small number of words you don’t know.
Humans learn their first few words by seeing how they’re used in relation to objects, and the rest can be derived from there without needing circularity.
However, the dictionary provides very tight constraints on what words can mean. Whatever the words “wood”, “is”, “made”, “from”, and “trees” mean, the sentence “wood is made from trees” must be true. The vast majority of all possible meanings fail this. Using only circular definitions, is it possible to constrain words’ meanings so tightly that there’s only one possible model which fits those constraints?
LLMs seem to provide a resounding yes to that question. Whilst first-generation LLMs only ever saw text and had no hard-coded knowledge, and so could only possibly figure out what words meant based on how they’re used in relation to other words, they understood the meaning of words sufficiently well to reason about the physical properties of the objects they represented.
Isn’t this sort-of what all formal mathematical systems do? You start with some axioms that define how your atoms must relate to each other, and (in a good system) those axioms pin the concepts down well enough that you can start proving a bunch of theorems about them.
The intended question, I think, is: if you were to find a dictionary for some alien language (not a translator’s dictionary, but a dictionary for people who speak that language to look up definitions of words), could you translate most of the dictionary to English? What if you additionally had access to large amounts of conversations in that language, without any indication of what the aliens were looking at or doing at the time of the conversation?
“wood is made from trees”
“trees are made of wood”
A new circularity!
In both examples: two degrees of freedom, two pieces of information. The information is sufficient to restrict one of the degrees of freedom (to within some bound, rather than precisely, in the second clustering example).
I agree with almost everything in this post, except that (ironically) I think it draws too narrow a boundary around the concept of “mathematics.” I do very little formal mathematics, but use mathematical styles of reasoning very often, to good effect. To my understanding, math is the study of patterns, and to point other people at useful patterns, definitions can be a valuable starting point. This is especially true if you explicitly point out that the definition is approximate or fuzzy. If you’re trying to inform or educate or advise people, then you need to do it (in part) with words, and you’ll need to give enough definition (with examples, yes, but not only with examples) to get the process started.
What you shouldn’t do is use definitions to debate someone else when they have a good underlying point.
That said, there have also been a few times in my career where the most valuable thing I’ve been able to observe is, “this word shouldn’t exist because it doesn’t refer to a natural category,” or “people stop using this word to describe a thing when the thing starts working properly, so they always think things in the category don’t work.” My main personal examples of these are smart materials, metamaterials, and nanotech. There is a useful underlying concept in each case, but real-world usage can be so inconsistent that it needs definition at the start of any conversation for the conversation to be useful.
Circular definitions are a problem if the set of problems contains circular definitions.
This is kind of tangential but:
I think one problem with using mathematical definitions as an analogy is that first-order logic is complete, so giving a unique definition is sufficient to tell you the relevant properties. This doesn’t hold for informal definitions, and so this makes unique description less helpful as a proxy.
(Or well, realistically you could also have counterproductive mathematical definitions which only turn out to be related to the central properties you’re trying to get at through a long string of logic, but you don’t see that as often as you do for informal definitions.)
In contrast, consider my definition of a table here. I focus less on uniquely characterizing what is or is not a table than on bringing out the central point of the concept of a “table”.
That inference seems questionable, though I’m not sure what you mean by “relevant properties”. (Actually, in first-order logic many concepts can’t be defined, e.g. “natural number”, because we can’t express “… and nothing else is a natural number.” Another example is “power set”. Yudkowsky has written about this.)
Forgot to say, for first-order logic it doesn’t matter what properties are considered relevant because Gödel’s completeness theorem tells you that it allows you to infer all the true properties.
What do you mean by “all the true properties”?
The properties that hold in all models of the theory.
That is, in logic, propositions are usually interpreted to be about some object, called the model. To pin down a model, you take some known facts about that model as axioms.
Logic then allows you to derive additional propositions which are true of all the objects satisfying the initial axioms, and first-order logic is complete in the sense that if some proposition is true for all models of the axioms then it is provable in the logic.
I’m still not sure what you want to say. It’s a necessary property of natural numbers that they can be reached from iterating the successor function. That condition can’t be expressed in first-order logic, so it can’t be proved and it holds in some models and in others it doesn’t. It’s like trying to define “cat” by stating that it’s an animal. This is not a sufficient definition.
You’re the one who brought up the natural numbers, I’m just saying they’re not relevant to the discussion because they don’t satisfy the uniqueness thing that OP was talking about.
In these examples, the issue is that you can’t get a computable set of axioms which uniquely pin down what you mean by natural numbers/power set, rather than permitting multiple inequivalent objects.