Warning: this post is moderately technical and very theoretical.
Problem 1: there are five apples on a plate. If you and your friend each eat an apple from the plate, how many apples are left?
Problem 2: what is 5 − 1 − 1?
Word problems are harder than number problems. (Source: every elementary school ever.) This isn’t just true of elementary school students; writing a program which can solve problem 2 is much easier than writing a program which can solve problem 1. Why?
Applying math in the real world requires a recognition/pattern-matching step. We have to notice that some structure in the real world can be represented by mathematical objects, like numbers or functions. In many (most?) cases, that’s the hardest step—figuring out how to translate our own internal world-models into math is more difficult than the math itself.
Unfortunately, the way math is done today often makes the problem harder rather than easier. We usually use definitions and axioms optimized for brevity and elegant proofs, not optimized for ease of recognition in the real world. This post asks: what if our mathematical foundations were optimized from the start for ease of recognition in the hands of AI? In particular, we’ll use the definition of numbers as an example.
Example: Natural Numbers
The usual way mathematicians construct numbers from first principles goes something like this. We have a number 0, and every number x has a “successor” S(x). So, we have numbers 0, S(0), S(S(0)), S(S(S(0))), … and we usually use the shorthand notation 1 = S(0), 2 = S(S(0)), 3 = S(S(S(0))), etc. A few more axioms guarantee that these numbers are all distinct (i.e. we don’t have S(S(S(0))) = 0) and there aren’t any “extra” numbers (i.e. numbers which can’t be reached by counting up from 0). There’s various different flavors of this construction in different areas—e.g. we can perform the construction starting from set theory, or from lambda calculus, or from category theory—but they all do basically the same thing.
In some sense, this construction seems to capture the essence of the principle of induction: if a statement like “x is bamfoozled” is true for x = 0, and “x is bamfoozled” implies “S(x) is bamfoozled” (a.k.a. “x+1 is bamfoozled”), then all numbers are bamfoozled. The successor construction provides the minimum framework needed for the principle of induction to work. From a pattern-recognition standpoint: if our goal were to recognize anywhere the principle of induction can be applied, then the successor construction provides a pretty good set of criteria to look for.
Alas, most applications of numbers do not involve induction in any obvious way—not even implicitly. If I say “there are three apples on a plate”, how does that involve induction? Sure, we could formulate it in inductive terms: I could imagine that I started with an empty plate, then put an apple on the plate, then put another apple on the plate, then put another apple on the plate, and wound up with my actual three-apple plate. But I don’t think humans actually think about “three apples on a plate” that way, and I don’t think that would be a very useful design for an AI thinking about “three apples on a plate”.
More generally: any application of natural numbers can be formulated in successor terms. There’s always some isomorphism between any-other-number-formulation and the successor formulation. But when we use the successor formulation, most of the work is in finding those isomorphisms. Instead, we’d like to load more of the work into the definition of numbers. Make the definition do most of the pattern-matching work, so we don’t need complicated case-by-case isomorphisms.
What would that look like?
An Instructive Wrong Answer
First pass: our “definition” should be some kind of pattern-template—conceptually similar to a regex. The user (whether human or AI) would match that template against their world-model to recognize numbers. Exactly what it looks like would depend on the kind of world-model we want to match the template against.
As a very toy example, imagine that our “world-model” is just a string, and we want to define the number 3 as a (python) regex. A natural candidate would be
(?P<object>.+)(?P=object)(?P=object)
This regex will match any string which contains (at least) three nonoverlapping copies of the same substring. Each different way of picking three identical “objects” (i.e. nonoverlapping substrings) out of the “world-model” will produce a different match—i.e. a different “three copies of an object”.
That’s a decent definition of “three” in the context of simple string-worlds. But to define “numbers”—i.e. all the numbers—we’d either have to specify each number individually, or use something like the successor construction to specify how-to-construct each number. Neither of those quite seems to capture the underlying idea of “numbers”. In any particular application of numbers (i.e. numbers in the abstract, not just a particular number like three), we’d likely need complicated isomorphisms from our-definition-of-numbers to something-in-our-world-model. For instance, what if we want to talk about general theorems about prime numbers? To do that, we need a general definition of “numbers”, not just case-by-case patterns for each individual number.
To put it differently: the question of “does the number three appear in this situation?” is different from “are numbers relevant in this situation?”. Our regex above would be an ok (though not spectacular) first pass at the former question, but it doesn’t really do any good for the latter question.
A Better (But Still Not Great) Answer
Let’s change up the question a bit. The key problem is not to recognize any particular number in the wild, but rather to recognize situations where “using numbers” is potentially useful at all.
Rather than pattern-matching against a world model, we’ll pattern-match against a query on a world-model. Rather than looking at apples on a plate and recognizing that there’s three of them, we take some question about apples on a plate (e.g. “will I need to buy more apples before next Thursday?”) and recognize that the number of apples is useful for answering that question.
Here’s one approach: we have some query-function f which depends on some variables x, y, z, … in the world model, and is invariant under reordering the variables—i.e. f(x, y, z, …) = f(z, y, x, …) = f(y, x, …, z) and so forth. That’s the “pattern” which we look for: our query-function does not care about the order of the variables.
How does this lead to numbers? Well, imagine for a moment that the variables are all binary, so our query is something like f(1,1,0,0,0,0,0,1). But we can re-order the inputs without changing the result—so, in particular, we can sort the inputs: f(1,1,0,0,0,0,0,1) = f(0,0,0,0,0,1,1,1). Any inputs which are the same when sorted produce the same answer. In other words, we get the same answer whenever the number of 0’s and number of 1’s is the same. And we can extend this beyond binary: pick any sort-order for the possible variable values, then sort them, and we find that the query-result depends only on how many of each value appears.
Formally: we can define a function COUNTS which counts how many times each value appears in its input. For instance, COUNTS(“yellow”, “yellow”, 0, “yellow”, “cat”) = {0: 1, “yellow”: 3, “cat”: 1}. Then, given any function f(x, y, z, …) which is invariant under reordering its inputs, we can write f(x, y, z, …) = g(COUNTS(x, y, z, …)) for some function g.
To use this to construct numbers, we flip the whole thing around: we define COUNTS as:
a function for which any f(x, y, z, …) invariant under reordering inputs can be written as f(x, y, z, …) = g(COUNTS(x, y, z, …)) for some g
… with the constraint that COUNTS itself should be invariant under reordering its inputs
The second piece guarantees that COUNTS is unique up to isomorphism. (If you’ve studied category theory, this definition should set off some pattern-matchers.)
At this point I’m going to hand-wave. There’s still some nontrivial steps to go from COUNTS to individual numbers, but I think we’ve passed the insights relevant to the point I’m trying to make here. So: “numbers” are <handwave> constructed from equivalence classes induced by COUNTS </handwave>.
What Does This Buy Us?
This definition immediately buys us a very general, powerful pattern-matcher for recognizing situations where numbers might be useful. Intuitively: numbers are useful when we can “swap around” the inputs without changing the answer.
In the apples-on-a-plate example, we might ask a question like “Will I need to buy more apples before next Thursday?”. The answer to this question probably does not change if we swap one of the apples with another, or move them around on the plate so that an empty space is swapped with an apple. So, we expect that the answer depends only on the number of apples on the plate, not on the details of each apple or how they’re arranged. (Or, more generally, the number of each identical-for-purposes-of-the-question apple-type on the plate; e.g. we might want to distinguish small apples from large.)
As we construct more mathematical operations on top of this idea, we gain additional useful pattern-matchers. For instance, here’s a natural way to think about addition: if a function is invariant not only to reordering its inputs, but also to switching any inputs with value “cat” to 2 (or vice versa), then the function depends only on the sum of the number-of-”cat”-inputs and the number-of-2-inputs. Then, when we prove theorems about addition (e.g. x+y = y+x), those theorems automatically apply to a broad range of “queries” without needing any further work to figure out exactly how our definitions map to the situation at hand.
That said, I still think this construction leaves a lot to be desired. In the apples-on-a-plate example, we can think about swapping an apple with a space-where-an-apple-could-be, but what about moving the apples around more generally? It’s not like plates come with designated apple-slots. It feels like there is a still-more-general pattern for recognizing situations where numbers apply, which would give a still-more-general definition of numbers.
Ideally, we’d find some pattern which provably captures all the possible situations where numbers can show up, without any complicated isomorphisms needed. On the other hand, that’s the sort of problem which sounds very likely to run into uncomputability/undecidability problems.
Summary
Mathematical definitions are usually not optimized for ease of recognizing real-world situations where the concepts apply. Word problems are harder than number problems.
This post talked about defining numbers in a way which would make it easier to automatically recognize situations where numbers apply. In particular, we saw that numbers apply whenever the answer to some question does not depend on swapping around input variables. There are likely more general patterns which could provide even better definitions, but this already buys us a far more widely applicable pattern-matcher than the usual successor construction of numbers.
One final note: this post talked mainly about the kinds of pattern-matchers one could automate, with an eye toward AI. There’s also plenty to be said about formulating mathematical definitions to make it easy for humans to recognize when the concepts apply. For instance, Category Theory Without The Baggage is my own attempt to do this for some building blocks of category theory. More generally, I think broader adoption of a lot of higher math today is limited by unnecessarily abstract, non-intuitive definitions. (Zooming out even further, this is an instance of Interfaces as a Scarce Resource.) There’s probably gains to be had from reformulating these things in ways which make it easier to recognize applications. Even within pure math, the ability to recognize novel ways to use existing theory in different subfields is quite powerful.
Thankyou to Abram for starting a conversation which led to me writing this post.
Recognizing Numbers
Warning: this post is moderately technical and very theoretical.
Problem 1: there are five apples on a plate. If you and your friend each eat an apple from the plate, how many apples are left?
Problem 2: what is 5 − 1 − 1?
Word problems are harder than number problems. (Source: every elementary school ever.) This isn’t just true of elementary school students; writing a program which can solve problem 2 is much easier than writing a program which can solve problem 1. Why?
Applying math in the real world requires a recognition/pattern-matching step. We have to notice that some structure in the real world can be represented by mathematical objects, like numbers or functions. In many (most?) cases, that’s the hardest step—figuring out how to translate our own internal world-models into math is more difficult than the math itself.
Unfortunately, the way math is done today often makes the problem harder rather than easier. We usually use definitions and axioms optimized for brevity and elegant proofs, not optimized for ease of recognition in the real world. This post asks: what if our mathematical foundations were optimized from the start for ease of recognition in the hands of AI? In particular, we’ll use the definition of numbers as an example.
Example: Natural Numbers
The usual way mathematicians construct numbers from first principles goes something like this. We have a number 0, and every number x has a “successor” S(x). So, we have numbers 0, S(0), S(S(0)), S(S(S(0))), … and we usually use the shorthand notation 1 = S(0), 2 = S(S(0)), 3 = S(S(S(0))), etc. A few more axioms guarantee that these numbers are all distinct (i.e. we don’t have S(S(S(0))) = 0) and there aren’t any “extra” numbers (i.e. numbers which can’t be reached by counting up from 0). There’s various different flavors of this construction in different areas—e.g. we can perform the construction starting from set theory, or from lambda calculus, or from category theory—but they all do basically the same thing.
In some sense, this construction seems to capture the essence of the principle of induction: if a statement like “x is bamfoozled” is true for x = 0, and “x is bamfoozled” implies “S(x) is bamfoozled” (a.k.a. “x+1 is bamfoozled”), then all numbers are bamfoozled. The successor construction provides the minimum framework needed for the principle of induction to work. From a pattern-recognition standpoint: if our goal were to recognize anywhere the principle of induction can be applied, then the successor construction provides a pretty good set of criteria to look for.
Alas, most applications of numbers do not involve induction in any obvious way—not even implicitly. If I say “there are three apples on a plate”, how does that involve induction? Sure, we could formulate it in inductive terms: I could imagine that I started with an empty plate, then put an apple on the plate, then put another apple on the plate, then put another apple on the plate, and wound up with my actual three-apple plate. But I don’t think humans actually think about “three apples on a plate” that way, and I don’t think that would be a very useful design for an AI thinking about “three apples on a plate”.
More generally: any application of natural numbers can be formulated in successor terms. There’s always some isomorphism between any-other-number-formulation and the successor formulation. But when we use the successor formulation, most of the work is in finding those isomorphisms. Instead, we’d like to load more of the work into the definition of numbers. Make the definition do most of the pattern-matching work, so we don’t need complicated case-by-case isomorphisms.
What would that look like?
An Instructive Wrong Answer
First pass: our “definition” should be some kind of pattern-template—conceptually similar to a regex. The user (whether human or AI) would match that template against their world-model to recognize numbers. Exactly what it looks like would depend on the kind of world-model we want to match the template against.
As a very toy example, imagine that our “world-model” is just a string, and we want to define the number 3 as a (python) regex. A natural candidate would be
This regex will match any string which contains (at least) three nonoverlapping copies of the same substring. Each different way of picking three identical “objects” (i.e. nonoverlapping substrings) out of the “world-model” will produce a different match—i.e. a different “three copies of an object”.
That’s a decent definition of “three” in the context of simple string-worlds. But to define “numbers”—i.e. all the numbers—we’d either have to specify each number individually, or use something like the successor construction to specify how-to-construct each number. Neither of those quite seems to capture the underlying idea of “numbers”. In any particular application of numbers (i.e. numbers in the abstract, not just a particular number like three), we’d likely need complicated isomorphisms from our-definition-of-numbers to something-in-our-world-model. For instance, what if we want to talk about general theorems about prime numbers? To do that, we need a general definition of “numbers”, not just case-by-case patterns for each individual number.
To put it differently: the question of “does the number three appear in this situation?” is different from “are numbers relevant in this situation?”. Our regex above would be an ok (though not spectacular) first pass at the former question, but it doesn’t really do any good for the latter question.
A Better (But Still Not Great) Answer
Let’s change up the question a bit. The key problem is not to recognize any particular number in the wild, but rather to recognize situations where “using numbers” is potentially useful at all.
Rather than pattern-matching against a world model, we’ll pattern-match against a query on a world-model. Rather than looking at apples on a plate and recognizing that there’s three of them, we take some question about apples on a plate (e.g. “will I need to buy more apples before next Thursday?”) and recognize that the number of apples is useful for answering that question.
Here’s one approach: we have some query-function f which depends on some variables x, y, z, … in the world model, and is invariant under reordering the variables—i.e. f(x, y, z, …) = f(z, y, x, …) = f(y, x, …, z) and so forth. That’s the “pattern” which we look for: our query-function does not care about the order of the variables.
How does this lead to numbers? Well, imagine for a moment that the variables are all binary, so our query is something like f(1,1,0,0,0,0,0,1). But we can re-order the inputs without changing the result—so, in particular, we can sort the inputs: f(1,1,0,0,0,0,0,1) = f(0,0,0,0,0,1,1,1). Any inputs which are the same when sorted produce the same answer. In other words, we get the same answer whenever the number of 0’s and number of 1’s is the same. And we can extend this beyond binary: pick any sort-order for the possible variable values, then sort them, and we find that the query-result depends only on how many of each value appears.
Formally: we can define a function COUNTS which counts how many times each value appears in its input. For instance, COUNTS(“yellow”, “yellow”, 0, “yellow”, “cat”) = {0: 1, “yellow”: 3, “cat”: 1}. Then, given any function f(x, y, z, …) which is invariant under reordering its inputs, we can write f(x, y, z, …) = g(COUNTS(x, y, z, …)) for some function g.
To use this to construct numbers, we flip the whole thing around: we define COUNTS as:
a function for which any f(x, y, z, …) invariant under reordering inputs can be written as f(x, y, z, …) = g(COUNTS(x, y, z, …)) for some g
… with the constraint that COUNTS itself should be invariant under reordering its inputs
The second piece guarantees that COUNTS is unique up to isomorphism. (If you’ve studied category theory, this definition should set off some pattern-matchers.)
At this point I’m going to hand-wave. There’s still some nontrivial steps to go from COUNTS to individual numbers, but I think we’ve passed the insights relevant to the point I’m trying to make here. So: “numbers” are <handwave> constructed from equivalence classes induced by COUNTS </handwave>.
What Does This Buy Us?
This definition immediately buys us a very general, powerful pattern-matcher for recognizing situations where numbers might be useful. Intuitively: numbers are useful when we can “swap around” the inputs without changing the answer.
In the apples-on-a-plate example, we might ask a question like “Will I need to buy more apples before next Thursday?”. The answer to this question probably does not change if we swap one of the apples with another, or move them around on the plate so that an empty space is swapped with an apple. So, we expect that the answer depends only on the number of apples on the plate, not on the details of each apple or how they’re arranged. (Or, more generally, the number of each identical-for-purposes-of-the-question apple-type on the plate; e.g. we might want to distinguish small apples from large.)
As we construct more mathematical operations on top of this idea, we gain additional useful pattern-matchers. For instance, here’s a natural way to think about addition: if a function is invariant not only to reordering its inputs, but also to switching any inputs with value “cat” to 2 (or vice versa), then the function depends only on the sum of the number-of-”cat”-inputs and the number-of-2-inputs. Then, when we prove theorems about addition (e.g. x+y = y+x), those theorems automatically apply to a broad range of “queries” without needing any further work to figure out exactly how our definitions map to the situation at hand.
That said, I still think this construction leaves a lot to be desired. In the apples-on-a-plate example, we can think about swapping an apple with a space-where-an-apple-could-be, but what about moving the apples around more generally? It’s not like plates come with designated apple-slots. It feels like there is a still-more-general pattern for recognizing situations where numbers apply, which would give a still-more-general definition of numbers.
Ideally, we’d find some pattern which provably captures all the possible situations where numbers can show up, without any complicated isomorphisms needed. On the other hand, that’s the sort of problem which sounds very likely to run into uncomputability/undecidability problems.
Summary
Mathematical definitions are usually not optimized for ease of recognizing real-world situations where the concepts apply. Word problems are harder than number problems.
This post talked about defining numbers in a way which would make it easier to automatically recognize situations where numbers apply. In particular, we saw that numbers apply whenever the answer to some question does not depend on swapping around input variables. There are likely more general patterns which could provide even better definitions, but this already buys us a far more widely applicable pattern-matcher than the usual successor construction of numbers.
One final note: this post talked mainly about the kinds of pattern-matchers one could automate, with an eye toward AI. There’s also plenty to be said about formulating mathematical definitions to make it easy for humans to recognize when the concepts apply. For instance, Category Theory Without The Baggage is my own attempt to do this for some building blocks of category theory. More generally, I think broader adoption of a lot of higher math today is limited by unnecessarily abstract, non-intuitive definitions. (Zooming out even further, this is an instance of Interfaces as a Scarce Resource.) There’s probably gains to be had from reformulating these things in ways which make it easier to recognize applications. Even within pure math, the ability to recognize novel ways to use existing theory in different subfields is quite powerful.
Thankyou to Abram for starting a conversation which led to me writing this post.