When Are Circular Definitions A Problem?

Disclaimer: if you are using a definition in a nonmathematical piece of writing, you are probably making a mistake; you should just get rid of the definition and instead use a few examples. This applies doubly to people who think they are being “rigorous” by defining things but are not actually doing any math. Nonetheless, definitions are still useful and necessary when one is ready to do math, and some pre-formal conceptual work is often needed to figure out which mathematical definitions to use; hence this post.

Suppose I’m negotiating with a landlord about a pet, and in the process I ask the landlord what counts as a “big dog”. The landlord replies “Well, any dog that’s not small”. I ask what counts as a “small dog”. The landlord replies “Any dog that’s not big”.

Obviously this is “not a proper definition”, in some sense. If that actually happened in real life, presumably the landlord would say it somewhat tongue-in-cheek. But what exactly is wrong with defining big dogs as not small, and small dogs as not big?

One might be tempted to say “It’s a circular definition!”, with the understanding that circular definitions are always problematic in some way.

But then consider another example, this time mathematical:

  • Define x as a real number equal to y-1: x = y-1

  • Define y as a real number equal to x/2: y = x/2

These definitions are circular! I’ve defined x in terms of y, and y in terms of x. And yet, it’s totally fine; a little algebra shows that we’ve defined x = −2 and y = −1. We do this kind of thing all the time when using math, and it works great in practice.
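To make the "little algebra" concrete, here is a minimal sketch in Python that treats the two definitions as a 2x2 linear system and solves it with Cramer's rule. The helper name `solve_2x2` is my own, introduced only for illustration; exact fractions are used so there is no rounding to worry about.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule.

    Hypothetical helper for illustration only.
    """
    det = a11 * a22 - a12 * a21
    if det == 0:
        # The two equations are redundant or contradictory.
        raise ValueError("degenerate system: no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# Rewrite the circular definitions in standard form:
#   x = y - 1   ->      1*x - 1*y = -1
#   y = x / 2   ->  (-1/2)*x + 1*y =  0
x, y = solve_2x2(Fraction(1), Fraction(-1), Fraction(-1),
                 Fraction(-1, 2), Fraction(1), Fraction(0))
print(x, y)  # -2 -1
```

The determinant here is 1/2, which is nonzero, and that is exactly why the circular pair of definitions pins down a single (x, y).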

So clearly circular definitions are not inherently problematic. When are they problematic?

We could easily modify the math example to make a problematic definition:

  • Define x as a real number equal to y-1: x = y-1

  • Define y as a real number equal to x+1: y = x+1

What’s wrong with this definition? Well, the two equations—the two definitions—are redundant; they both tell us the same thing. So together, they’re insufficient to fully specify x and y. Given the two (really one) definitions, x and y remain extremely underdetermined; either one could be any real number!
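One way to see the redundancy mechanically: write the two definitions as a linear system and check the determinant of its coefficients. The sketch below is only an illustration (the names `determinant` and `satisfies_both` are mine); it shows that the determinant is zero and that a whole family of (x, y) pairs satisfies both definitions at once.

```python
def determinant(a11, a12, a21, a22):
    """Determinant of the 2x2 coefficient matrix [[a11, a12], [a21, a22]]."""
    return a11 * a22 - a12 * a21

# Rewrite the definitions in standard form:
#   x = y - 1   ->   1*x - 1*y = -1
#   y = x + 1   ->  -1*x + 1*y =  1
det = determinant(1, -1, -1, 1)

def satisfies_both(x, y):
    """True if (x, y) is consistent with both definitions."""
    return x == y - 1 and y == x + 1

# Every pair of the form (t, t + 1) works, for any t we care to try.
candidates = [(t, t + 1) for t in (-3, 0, 2, 10)]
all_ok = all(satisfies_both(x, y) for x, y in candidates)

print(det, all_ok)  # 0 True
```

A zero determinant is the linear-algebra version of "really one definition": the second equation adds no information beyond the first.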

And that’s the same problem we see in the big dog/small dog example: if I define a big dog as not small, and a small dog as not big, then my two definitions are redundant. Together, they’re insufficient to tell me which dogs are or are not big. Given the two (really one) definitions, big dog and small dog remain extremely underdetermined; any dog could be big or small!
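The same check can be run on the dog definitions by treating "big" and "small" as two boolean unknowns for a single dog and listing every assignment the landlord's definitions allow. This is a toy sketch; the function name `consistent` is made up for the example.

```python
from itertools import product

def consistent(big, small):
    """The landlord's two definitions, read as constraints on one dog."""
    # "big" means "not small", and "small" means "not big".
    return big == (not small) and small == (not big)

# Try all four combinations of (big, small) for one dog.
solutions = [(big, small)
             for big, small in product([True, False], repeat=2)
             if consistent(big, small)]

print(solutions)  # [(True, False), (False, True)]
```

The definitions do rule something out (a dog cannot be both, or neither), but they leave both remaining options open for every single dog, so they never tell us which dogs are big.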

Application: Clustering

This post was originally motivated by a comment thread about circular definitions in clustering:

  • Define the points in cluster i as those which statistically look like they’re generated from the parameters of cluster i

  • Define the parameters of cluster i as an average of <some features> of points in cluster i

These definitions are circular: we define cluster-membership of points based on cluster parameters, and cluster parameters based on cluster-membership of points.

Picture a typical EM-style clustering plot: points colored blue or red, with a circle drawn for each cluster. The point colors might be assigned based on which circle each point fits best, and the circles might be calculated to best fit the points of the same color. Note the circularity: cluster assignments (color) are a function of data and parameters (the circles), while parameters are a function of data and cluster assignments.

And yet, widely-used EM clustering algorithms are essentially iterative solvers for equations which express basically the two definitions above. They work great in practice. While they don’t necessarily fully specify one unique solution, for almost all data sets they at least give locally unique solutions, which is often all we need (underdetermination between a small finite set of possibilities is often fine; it’s when definitions allow for a whole continuum that we’re really in trouble).
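As a rough illustration of "iterative solver for circular definitions", here is a sketch using k-means-style hard assignments on one-dimensional data. This is a simplification I am swapping in for readability, not full EM: real EM uses soft (probabilistic) assignments and richer parameters. The function names and the toy data are assumptions made up for the example.

```python
def assign(points, centers):
    """Definition 1: a point belongs to the cluster whose center is nearest."""
    return [min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            for p in points]

def update(points, labels, centers):
    """Definition 2: a cluster's center is the mean of its member points."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        # Keep the old center if a cluster ends up empty.
        new_centers.append(sum(members) / len(members) if members else old)
    return new_centers

def solve(points, centers, max_iters=100):
    """Alternate the two definitions until applying them changes nothing."""
    labels = assign(points, centers)
    for _ in range(max_iters):
        new_centers = update(points, labels, centers)
        new_labels = assign(points, new_centers)
        if new_centers == centers and new_labels == labels:
            break  # fixed point: both definitions hold simultaneously
        centers, labels = new_centers, new_labels
    return labels, centers

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels, centers = solve(points, centers=[0.0, 5.0])
print(labels, centers)  # [0, 0, 0, 1, 1, 1] [2.0, 11.0]
```

At the returned fixed point, `assign(points, centers)` reproduces `labels` and `update(points, labels, centers)` reproduces `centers`, so both circular definitions are satisfied together. A different starting guess can land on a different fixed point, which is the "locally unique" caveat in action.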

Circularity in clustering is particularly important, insofar as we buy that words point to clusters in thingspace. If words typically point to clusters in thingspace, and clusters are naturally defined circularly, then the most natural definitions will typically involve some circularity. The key is to make sure that the circular definitions used are nondegenerate, i.e. if we were to turn the definitions into equations, the equations would not be redundant. So long as the definitions are nondegenerate, and there’s a definition for each of the “unknowns” involved (e.g. parameters and cluster labels, in the clustering case), the equations will typically have at least locally unique solutions (since the number of equations matches the number of unknowns). That’s what we really care about: definitions which aren’t too underdetermined.