This is the third post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me.
My strategy for achieving a formalisation of goal-directed behaviour is to equate it with “behaviour which is well-explained in terms of goals”. So far, I have explored criteria for judging explanations in general and a structured way to decompose explanations of agent behaviour specifically.
One of those criteria was simplicity, or its dual, complexity. In order to determine whether my initial proposal for characterizing goal-directedness holds water, I need to get my hands dirty and learn some complexity theory. This post is a record of me doing that, while keeping my objective in sight. I have learned a lot of complexity theory/theories since I wrote my original post, so the present post is a major update from the somewhat naïve intuition I expressed there.
See the end for a few references not directly linked in the main text.
Setting the scene
In my initial breakdown of explanations, I explicitly assumed that an individual explanation could be assigned a value representing its complexity. Deeper investigation revealed at least four points where complexity might be introduced into an explanation. At the time, I wondered what measure of complexity I should use, expecting to be able to pick one (or several) up out of the box, and get down to the business of applying already-developed theory.
In attempting to apply the various complexity measures to my explanations as I was learning about them, I found each of them to be inadequate in one way or another. For example, Kolmogorov complexity and various branches of computational complexity explicitly involve a model of computation, such as a (choice of) Turing machine, but the explanations involved in goal-directedness have no obvious ties to computation, and any choice seems arbitrary. Moreover, a lot of the problems discussed in the various branches of complexity theory seem to be about sequences of problems or structures, whereas I am trying to attach complexity to individual explanations!
While I may end up adapting some existing theory, my advisor Adam Shimi pointed out that rather than trying to jumble existing measures together in an ad hoc way, I should try to understand the underlying principles of complexity to the point of building a measure which captures what I actually care about. That’s what I’m really aiming to do in this post.
Complexity, from the bottom up
Complexity is a word which carries multiple competing, often conflicting, intuitions and formalizations. It has a lot of baggage. The aim of this post is to extract a core technical notion of complexity which is inclusive enough to cover everything contained in the various branches of complexity theory (plus some other domains besides), in an intuitive enough way to allow me to design a notion of complexity that captures what I need for evaluating explanations. In a single sentence, the concept of complexity I’ll describe is a quantity assigned to a structure[1] that measures how many simple pieces are needed to construct it.
A word of caution to begin with: if you have a casual acquaintance with computational complexity theory, you’ll be aware of complexity classes, whose members are parameterized families of problems. I will get around to those later, but the crucial first step, which was previously missing in my understanding of complexity, is an understanding of the complexity of (solutions to) individual problems.
I should also stress that I’m examining complexity in a much broader context than is covered by any single discipline of complexity theory, but I’ll discuss some specific branches of the domain later on.
Motivating examples
Since I expect computer science to be more accessible to my audience than abstract maths, I’ll start with two motivating examples in computational terms.
Why is multiplying big numbers more difficult than multiplying small numbers? When we actually sit down and compute a product of large numbers, our go-to method is long multiplication, where we perform a series of pair-wise multiplications of individual digits (using values that are engraved into our memories by early school experiences which were traumatic for some...) and then add up the results. The more digits there are, the more digit multiplications (and additions) we must perform, and the longer the calculation takes.
It may surprise you to discover that there are algorithms for multiplication which are more efficient than long multiplication for sufficiently large numbers.[2] We don’t have to just multiply the digits of the numbers themselves; we can sometimes add and/or subtract digits before multiplying to reduce the number of basic multiplications we need to perform.
Finding better algorithms is difficult enough that the existence of a better algorithm than long multiplication was a big surprise even to the likes of Kolmogorov. One thing is clear, however: there is a smallest number of single digit multiplications and additions required to multiply an arbitrary n-digit number and an m-digit number; there is a most efficient algorithm. The size of that algorithm, in terms of the number of operations involved, is the computational complexity of the multiplication operation for numbers of these sizes.
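To make the flavour of such algorithms concrete, here is a minimal Python sketch (mine, not taken from the references) of Karatsuba's method, one well-known example of this kind of improvement: it replaces four half-size multiplications with three, at the price of extra additions and subtractions.

```python
def karatsuba(x, y):
    """Multiply non-negative integers, trading multiplications for additions."""
    if x < 10 or y < 10:                      # single-digit factor: multiply directly
        return x * y
    m = max(len(str(x)), len(str(y))) // 2    # split each number into a high and a low part
    x_hi, x_lo = divmod(x, 10 ** m)
    y_hi, y_lo = divmod(y, 10 ** m)
    lo = karatsuba(x_lo, y_lo)
    hi = karatsuba(x_hi, y_hi)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - lo - hi   # three recursive products instead of four
    return hi * 10 ** (2 * m) + mid * 10 ** m + lo

assert karatsuba(12345678, 87654321) == 12345678 * 87654321
```

For small numbers the bookkeeping makes this slower than long multiplication; the saving in basic operations only shows up once the numbers are large enough.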
If we found that adding digits is easier/less costly than multiplying them, then we might adjust our measure of complexity to reflect this, and this may even change which algorithm is considered the most efficient. More on that later.
Now let’s take an example of a different flavour. What is the complexity of a piece of text? For simplicity, let’s restrict ourselves to a string s consisting of just the digits 0 and 1. Here are two possible examples:
0101010101010101010101010101010101010101010101
00101011110011010001010000111010001
Before we get started, I want to stress that there is not just one answer here. For example, I could consider the basic operations for writing strings to be “write 0”[3] and “write 1”. In that context, there is no more efficient way to produce these strings than to write them out one digit at a time, and the measure of complexity that we end up with is simply string length, for which the former is the more “complex” of the two. On the other hand, if I have the comparative might of a universal Turing machine U at my disposal[4], I could consider the permissible string-generating procedures to be the inputs for that Turing machine (that is, the programs), and take the complexity to be the length of the program instead of the length of the string. The result is Kolmogorov complexity, K_U(s). The value computed depends on the specific choice of U, but for many choices it’s a lot easier to program “write 01 twenty-three times” than to find a short program that produces the second string above. I could also choose any programming language as a substitute for a Turing machine[5], which might make that last claim more credible.
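As a toy illustration of the difference (using Python expressions as stand-in “programs”, which is only an analogy for genuine Kolmogorov complexity), compare the length of each string with the length of a short description that generates it:

```python
s1 = "01" * 23                                  # a repetitive string of the same shape as the first example
s2 = "00101011110011010001010000111010001"      # the second example, written out literally

# Complexity as string length: we pay for every digit.
print(len(s1), len(s2))

# Complexity as description length: the length of an expression that evaluates
# to the string, standing in for a program fed to the universal machine U.
description_1 = '"01" * 23'
description_2 = '"' + s2 + '"'
assert eval(description_1) == s1 and eval(description_2) == s2
print(len(description_1), len(description_2))   # the repetitive string has by far the shorter description
```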
So what is complexity?
I can extract from these examples a general idea of what complexity should be.
I begin by selecting a class of structures whose complexity I want to measure. In the examples above, these were “functions defined on pairs of numbers of a given size” and “strings (of 0s and 1s)”, respectively.
I select a generation procedure. In the first example, this was the class of algorithms which I was allowed to use (those involving just calculations in terms of individual digits). In the second, the original generating procedure was just “writing down digits”, while the second was “input into a Turing machine and consider the output”.
I select a handful of basic operations or building blocks, considering these to require a constant small cost to perform. In the first example, these were additions and multiplications of single digits. In the second, we had individual digits (or ‘the operation of writing a single digit’) in the first instance and inputting single digits into a Turing machine in the second instance. Note that for the time being, the basic operations are assumed to be sufficiently varied to be capable of exactly generating the desired output.
Given these ingredients, the corresponding complexity of a given structure is the minimum total cost of all choices of basic operations/components which generate it.
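Here is a minimal sketch of that recipe in code, for a toy context: the structures are binary strings, the generation procedure is any sequence of the operations “append 0”, “append 1” and “double” (s ↦ ss), and the costs assigned to the operations are hypothetical. The complexity of a string is then the cost of its cheapest construction, found here by a uniform-cost search.

```python
import heapq

def construction_complexity(target, append_cost=1, double_cost=1):
    """Minimum total cost of building `target` from the empty string using
    'append 0', 'append 1' and 'double' (s -> s + s)."""
    # Each operation keeps the current string as a prefix of its result, so any
    # useful intermediate state must itself be a prefix of the target.
    best = {"": 0}
    frontier = [(0, "")]
    while frontier:
        cost, s = heapq.heappop(frontier)
        if s == target:
            return cost
        successors = [(s + "0", append_cost), (s + "1", append_cost), (s + s, double_cost)]
        for t, step in successors:
            if target.startswith(t) and cost + step < best.get(t, float("inf")):
                best[t] = cost + step
                heapq.heappush(frontier, (cost + step, t))
    return float("inf")

print(construction_complexity("01" * 23))    # well below 46: doubling exploits the repetition
print(construction_complexity("00101011110011010001010000111010001"))   # close to its length
```

With only the two “append” operations allowed, the same search recovers plain string length; adding “double” changes which strings count as simple.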
This turns out to cover all of the usual notions of complexity, of which we’ll see many in a later section. As we’ve already seen, it also includes some simpler measures, such as size. As a general rule of thumb, I’ve found that the more powerful the basic building blocks are, the coarser the notion of complexity one gets. For example, if all algorithms are assigned the same constant cost, then multiplication of two numbers always has a complexity of 1, and this notion of complexity doesn’t tell us much.
A point I’ll come back to in a little while is the fact that any specific construction of a structure produces an upper bound on the complexity of that structure. The challenge, for any definition of complexity, is always finding a construction which minimizes the complexity measure and proving that it really is minimal.
Contexts for complexity
Implicit in the above informal definition is that all notions of complexity depend on choices, the most fundamental of which is a choice of context: we consider the complexity of a structure as a member of a class of similar or related structures. Crucially, that context is itself structured! To understand what structure a context might have, I’m going to take you on a detour through some more abstract maths. To keep the spotlight on complexity, I’ll examine the notions of complexity which arise in these contexts.
Complexity in topology
Topology studies various types of space. Often, a class of spaces can be generated by gluing together a class of “simple” spaces in a predetermined way. I’ll give some technical examples, but the details don’t matter: I just need you to pattern-match, since the complexity measures I’ll describe for these classes don’t rely on the details.
A simplicial complex is a space obtained by gluing together simplices, or the analogues of triangles in each dimension, along their faces/edges. A 0-simplex is a point, a 1-simplex is an edge or interval, a 2-simplex is a triangle (including its interior), a 3-simplex is a tetrahedron, “and so on...”
A CW-complex is a space obtained by gluing disks together along maps on their boundaries. Here, “disk” means the ball of points at a distance of at most 1 from the origin in n-dimensional space, and the boundary of an n-disk is an (n−1)-dimensional sphere. For example, the 1-disk is an interval, and its boundary is the pair of end-points. Gluing along a map on the boundary means that I identify the sphere on the boundary of a disk with its image along a function.
A (smooth) manifold is a space which can be constructed by gluing together patches that look like open subsets of R^n along subsets of a similar form. These spaces are the subject of differential topology (and differential geometry).
A scheme is a space constructed by gluing together algebraic varieties. These are the subject of algebraic geometry.
In all of these cases, we are given a class of “simple” spaces, which usually means a space whose structure we can easily study directly and explicitly. These are used to construct more interesting spaces, often with emergent global structure that the simple spaces lack.
There are numerous notions of equivalence for spaces. For manifolds, say, we could consider invertible maps which preserve all of the local R^n-like structure (diffeomorphisms), invertible continuous maps with continuous inverses (homeomorphisms) or, most coarsely, the maps with “quasi-inverses” where the respective composites can be “continuously deformed” back to the identity (homotopy equivalences).
A choice of recipe for constructing spaces, plus a choice of equivalence for the resulting spaces, results in a notion of complexity: the complexity of a given space is the smallest number of simple spaces required to build a space equivalent to the given one.
This isn’t the most nuanced form of complexity at our disposal in this setting. In the case of CW-complexes, for example, this combinatorial notion of complexity doesn’t take into account the potential complexity of the gluing maps (I’ll get onto defining complexity of maps later). One might also want the dimension of the constituent spaces to contribute to the calculation: it’s not outrageous to decide that an 8208-dimensional sphere is a little more complex than the familiar 1-sphere (circle).
Complexity in algebra
Abstract algebra studies various types of mathematical object. Each class of objects is determined by the operations that it comes equipped with (its signature) and the equations which these operations are required to satisfy. Here is a list of examples of what that can mean, with some formal details deliberately omitted for conciseness.
A semigroup is a collection of elements equipped with a multiplication operation, a way of combining any pair of elements to produce some other element.
A group is a semigroup in which we can also divide by/cancel elements (and there is a special element which is neutral in the sense that multiplying by it does nothing).
A ring is a collection of elements which we can add, subtract and multiply. A ring also has special elements called 0 and 1 which behave in the way you would expect with respect to addition and multiplication.
A vector space over a field[6] K is a collection of elements (called vectors), equipped with both addition and scalar multiplication, where the latter is a bunch of operations indexed by the elements of the field.
If you aren’t familiar with these, it’s not important; all I want you to observe is the pattern of their descriptions: elements and operations (plus some rules about how the operations interact with one another, which I have omitted).
In any of these examples, if I take a pair of elements (possibly the same element twice), I can apply a binary operation such as multiplication or addition to them to produce a further element. I can further perform operations involving the result to produce more elements, and so on. In other words, it may be the case that I can construct all of the elements of the algebraic structure by applying the operations successively to a handful of generators. For example, the positive natural numbers form a semigroup under addition, and every number can be obtained by adding 1 to itself some number of times, so 1 generates the semigroup of positive natural numbers.[7]
If I take an unstructured collection of elements, a set, I can generate a structure of any of the above types by iteratively applying the formal operations to the members of this set. This produces the free structure generated by this set. For example, the free semigroup on the set {a,b} has elements {a,b,aa,ab,ba,bb,aaa,…}, where I have written the letters next to one another to represent multiplication.[8]
If my general presentation of complexity from earlier made any sense, I’m hoping that you can already see that the number of generators provides a notion of complexity for algebraic structures.
Beyond free structures, a presentation of an algebraic structure consists of a collection of generators combined with a collection of pairs of elements of the free structure they generate which are to be identified (typically presented as equations). For example, the following is a presentation of a semigroup: ⟨a, b ∣ aa = a, bb = b, ab = ba⟩. I claim that the semigroup it produces has just three elements, {a, b, ab}; work out what the multiplication operation does if you like. Again, if it’s not clear to you how one produces an algebraic object from a presentation, it doesn’t matter too much: the key observation is that the generators and relations in a presentation, supplemented with the generation procedure[9] for the given class of algebraic objects, provide all of the ingredients we need to view the size of a presentation as a measure of complexity of these objects. Just as in the topological case above, we must choose a notion of equivalence of structures in order to complete the picture, but the choice is much narrower for algebraic structures: we typically consider these to be equivalent if they are isomorphic, which is to say that there exists a structure-preserving bijection between their sets of elements.
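As a sanity check on that claim, here is a small Python sketch which enumerates the semigroup by orienting the relations as rewrite rules and collecting normal forms. (This particular rewriting system happens to terminate and be confluent, so the normal forms really do correspond one-to-one to elements; in general, as discussed next, no such shortcut is available.)

```python
from itertools import product

REWRITES = [("aa", "a"), ("bb", "b"), ("ba", "ab")]   # the relations, oriented left to right

def normal_form(word):
    """Apply the rewrites until none of them changes the word."""
    while True:
        reduced = word
        for lhs, rhs in REWRITES:
            reduced = reduced.replace(lhs, rhs)
        if reduced == word:
            return word
        word = reduced

elements = {normal_form("".join(w))
            for length in range(1, 7)
            for w in product("ab", repeat=length)}
print(sorted(elements))   # ['a', 'ab', 'b']: the three elements claimed above
```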
Determining the structure generated from an arbitrary presentation is famously difficult, in a formal sense: it’s uncomputable in general, as are many of the measures of complexity which I’ll construct here. On the other hand, any finite structure has a “trivial” finite presentation, where the set of generators is the set of all of its elements, and the equations express the result of applying the operations to the elements. If the class of structures being considered has k binary operations then the trivial presentation of a structure with N elements has N generators and kN^2 relations. As a consequence, there must exist a most efficient presentation of any finite structure, and this situation is intuitively comparable to the notions of complexity we saw in previous sections. More generally, as long as we know that there exists a finite presentation of an algebraic structure (even if that structure is infinite), then there must be a smallest one, so this concept of complexity[10] makes sense for all finitely presentable algebraic structures.[11]
Between algebra and spaces
Combining the above two sections, we can consider all sorts of constructions extending our categories of algebraic and/or topological objects. This section is deep into abstraction, so I’ll keep it to a single example, but it foreshadows an approach to dealing with infinite structures which we’ll see again later on.
A profinite group is a topological group constructed as a projective limit of finite groups. Even though a given profinite group may not admit a finite construction (and may not even have a “minimal” construction in a meaningful sense) we can still extract a nuanced measure of complexity for these structures. Indeed, having determined a notion of complexity for finite groups such as the presentation length, we may define the complexity of a profinite group to be the maximum complexity of finite groups appearing in its construction, minimized across possible constructions. A particularly simple class of groups are the cyclic groups, with presentations of the form ⟨a ∣ a^k = 1⟩, where k is a positive integer and 1 represents the identity element of the group; with the above definition of complexity, it follows that the simplest (lowest complexity) class of nontrivial profinite groups are the procyclic groups, constructed as projective limits of cyclic groups.
Complexity within a structure
Why did I bother with the abstract detour above? The answer is that the structures encountered in complexity theory can often be thought of as functions defined on spaces or as elements of algebraic structures.
Piecewise maps
What does it mean for a space to be constructed by gluing together simpler spaces? One consequence is that when we define a map on the space, we can restrict that map to the simpler pieces, and conversely that if we are given mappings on the simpler pieces which are compatible with the gluing construction in a precise sense, then these mappings can themselves be glued together to define a mapping on the full space.
As such, as soon as we define a notion of complexity for functions on the simple spaces, we have numerous ways to assign complexities to the glued space: we could take a maximum or a weighted sum, say. I’ll explain some ways of assigning complexities to functions later on.
Elements of algebras
A choice of presentation of an algebraic structure A determines a natural notion of complexity for the elements of A, namely the length of the shortest expression of that element in terms of the generators in the presentation. Here the generators are playing the role of the basic building blocks, and the algebraic operations are the ones we allow when constructing the elements.
Observe that the trivial presentation corresponds to the trivial complexity measure I mentioned earlier, where everything has the same complexity. On the other hand, for a small generating set, the complexity becomes a lot more interesting.
It’s also worth noting that we can vary the complexity measure significantly by varying the weights assigned to the different generators and operations in the structure. In a real vector space with a given basis, for example, we could assign the basis vectors a weight of 1, scalar multiplication by r a weight of |r|^2 and addition a weight of 0, in which case the “complexity” of vectors is the sum of the squares of the coefficients determining the vector in that basis, the “squared length” of the vector when the basis vectors are considered to have length 1. Contrast this with what happens if we assign scalar multiplication a weight of 0 and addition a weight of 1, in which case the complexity just counts the number of basis vectors required to present a given vector (equivalently, the minimum dimension of a subspace generated by basis vectors containing the given vector). Both of these are clearly highly dependent on the choice of basis.
Fragility of complexity measures
In our examples, we have seen that the complexity measure we end up with depends on all of the choices we made along the way, including the context, the basic building blocks, the allowed operations for combining them and the weights assigned to each of those ingredients. Changing any of these choices not only affects the numerical value we end up with, it can completely change which elements or objects are considered more or less complex than others. This is not a profoundly surprising result, although the degree of sensitivity may be surprising; see the end of the section on computational complexity below for an example.
Until recently, the fragility of complexity measures seemed to me to be an insurmountable problem. I don’t want to be using an evaluation that depends heavily on apparently arbitrary choices; those choices shouldn’t be influencing my results!
Fortunately, we can more often than not excise some arbitrary choices while salvaging a meaningful measure of complexity, through a variety of mechanisms. This is inherently a trade-off: the more we try to make our complexity classes independent of choices, the coarser-grained the notion of complexity becomes. For my own purposes, I’m hoping that I will nonetheless reach a complexity measure which captures the intuitions I’m seeking to formalize.
Minimal presentations
First of all, we can shift the choices up a level, where they can seem less arbitrary. If our context is a (finitely presentable) algebraic structure, and we have chosen a sensible notion of complexity for presentations, we can consider the generators of a minimal presentation as a natural choice of basic building blocks (and we already know that this will at least give a more refined notion of complexity than a non-minimal presentation).
If there is a unique minimal presentation for the context, the above is an entirely natural choice. However, there may instead be several (even many) minimal presentations for the algebraic structure: think of choosing different bases for a vector space. In that case, we can construct a more robust notion of complexity as one which is invariant under change of minimal presentation. This can be achieved, for example, by taking the minimal complexity value across possible choices of minimal presentation. The vector space example turns out to be a bad one on its own, since any nonzero vector is a member of some basis (so the complexity measures we saw earlier can never be basis-invariant), but if we instead consider linear endomorphisms of a vector space, choosing a basis for the space enables us to present the endomorphism as a matrix, and there are various non-trivial invariants including the rank which are basis-invariant, and can be considered as notions of complexity.
If we take a vector space equipped with an inner product, then while there are still just as many choices of basis available, some of those choices have nicer properties: the best-behaved ones are orthonormal bases. (As ever, if those terms are unfamiliar, read around them. The key take-away is that in some contexts the minimal presentation in an algebraic sense is not automatically the best possible.) We could modify the complexity measure on presentations to take the inner product structure into account, or just impose the constraint on choices anyway. Either way, having selected some extra criteria which the minimal presentations of the context must satisfy, we reduce the number of constraints on a complexity measure when we demand it be invariant under change of presentation. The “length squared” measure that I described in the last section is invariant under change of orthonormal basis, for example.
A compelling example of constrained bases in the definition of a complexity measure is that of tensor rank. One way of constructing a vector space is as a tensor product of a pair of vector spaces, say V⊗W; an element of such a space is typically called a tensor. The natural bases for V⊗W are those constructed from bases of V and W. Unlike for vector spaces more generally, not every tensor is a member of such a basis, and we therefore get a meaningful notion of tensor rank as the minimal number of basis vectors (across all possible constrained bases) required to present a given tensor.
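For order-2 tensors, the notion in question is just matrix rank, which is easy to compute and also illustrates the basis-invariance point made above for endomorphisms; a small sketch using numpy:

```python
import numpy as np

# An element of V (x) W, written as a matrix once bases of V and W have been chosen.
T = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(T))            # 2: two rank-one ("simple") tensors suffice

# Changing the bases of V and W amounts to multiplying by invertible matrices;
# the individual coordinates change, but the rank does not.
P = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]])
Q = np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(np.linalg.matrix_rank(P @ T @ Q))    # still 2
```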
Minimal decompositions
Much of the discussion above can similarly be applied to spaces: while there may be many decompositions of a space into simpler spaces, minimal decompositions provide natural basic components for complexity measures. Both here and above, if the minimum can be concretely identified, it will also necessarily be the simplest presentation of the space or algebra to work with from a computational point of view!
Consider the circle, or 1-sphere. When presenting this as a manifold, the minimal decomposition involves two open line segments (open subsets of R), glued along a pair of sub-segments.
Clearly there are several choices to be made here about the four segment lengths (the segments being glued together and the lengths of the overlaps); although the end result as a space is always a circle, the choice of decomposition will affect our definition of complexity of functions on the pieces, and we can only obtain an invariant complexity measure by taking all of the minimal decompositions into account, which amounts to examining symmetries of the resulting space. The study of symmetries moves us beyond topology into geometry. Finding complexity measures for functions which are invariant under these re-parametrizing symmetries is an interesting challenge, which is beyond the scope of this post.
Invariance at the cost of individuality
In some cases, there will be no useful choice-invariant notion of complexity for individual structures. We saw earlier that Kolmogorov complexity requires a choice of universal Turing machine. However, given any string, we can design a universal Turing machine which makes that particular string have arbitrarily small or large complexity, essentially by manipulating how the programs are encoded.
On the other hand, Kolmogorov proved that his measure of complexity is choice-invariant up to a constant, as follows.[12] If I take any pair of universal Turing machines U and U′, each can be compiled into the other, in the sense that there are strings u and u′ (not depending on the inputs) such that for all input strings x and y we have U(u′;x) ≡ U′(x) and U′(u;y) ≡ U(y), where the semicolon denotes concatenation and the triple-line means “produces the same behaviour (in terms of halting and outputs)”. As such, for any string s, we have K_{U′}(s) ≤ K_U(s) + |u| and K_U(s) ≤ K_{U′}(s) + |u′|. An immediate consequence of this fact is that, while we cannot produce a measure of the Kolmogorov complexity of an individual string which is independent of the choice of Turing machine, if we take a sequence of strings, we can extract coarser-grained complexity values which are independent of the choice.
For instance, the asymptotic growth rate of the Kolmogorov complexity of a sequence of strings is a choice-invariant notion. Asymptotic here roughly means “on average in the limit of large indices”, the idea being that while we have little control over the exact lengths of programs required to generate individual strings on different Turing machines, the constant bound on the difference between the complexities assigned by different Turing machines will merely be an “error” whose contribution can be made as small as we like by averaging across a sufficiently large number of strings in the sequence. An infinite stream S of digits also has a well-defined notion of complexity, by identifying S with the sequence (s_n)_{n∈N} whose nth term s_n is the string consisting of the first n digits of the stream. An essentially random stream will produce a complexity that grows linearly, K_U(s_n) = O(n), whereas an infinite string of 0s has complexity which grows at most logarithmically, K_U(s_n) = O(log(n)); the invariance means that we can drop the subscript U in these statements.[13]
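Kolmogorov complexity itself is uncomputable, but the length of a compressed encoding is a cheap stand-in (it is only an upper bound, and zlib is far weaker than a universal machine), and it is enough to see the two growth rates just mentioned:

```python
import random
import zlib

random.seed(0)
random_stream = "".join(random.choice("01") for _ in range(50_000))
zero_stream = "0" * 50_000

for n in (1_000, 10_000, 50_000):
    print(n,
          len(zlib.compress(random_stream[:n].encode())),   # grows roughly linearly in n
          len(zlib.compress(zero_stream[:n].encode())))     # stays tiny
```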
One must be careful with the kind of shift of context we performed above. We haven’t actually resolved the problem of having a well-defined complexity for individual strings: their natural substitutes as sequences, the constant sequences, all have the same complexity of “no growth”, K_U(s_n) = O(1). Rather, we have merely observed that sequences of structures (such as strings) form a context which is rich enough to leave interesting complexity measures after imposing invariance with respect to symmetries which made complexity measures on the individual structures trivial.
Note that the asymptotic growth rate would not work as a measure of complexity of the profinite groups we saw earlier, even if we restrict to those which are expressible as the (projective) limit of a sequence of finite groups. This is because any subsequence or re-indexed sequence will produce the same group in the limit, and so we can make the complexity of the sequence grow as fast or as slow as we like. On the other hand, if we put a constraint on the form of the sequence, like imposing a relation between the index n and a bound on the size of the nth group in the sequence, we might be able to salvage an alternative to the “maximum complexity” measure we proposed earlier.
I should also stress that the asymptotic growth rate classes I have mentioned above are characterized by upper bounds, so that the class of O(f(n)) sequences also includes all of the classes of O(g(n)) sequences where g(n) grows more slowly than f(n). A more precise statement would be one of the form K_U(s_n) = Θ(f(n)), meaning that the growth rate of the Kolmogorov complexity of (s_n)_{n∈N} is bounded above and below by constant multiples of f(n), which excludes slower growth rates.
Probabilistic contexts
Rather than sequences, we might consider subsets of the family of finite strings. For sufficiently uniform subsets, one can make statements about the Kolmogorov complexity which are independent of the choice of Turing machine. For example, an argument based on the pigeon-hole principle shows that if we take the collection of strings of length n, even if we construct a universal Turing machine especially to compute these strings (so that all of the shortest possible programs produce strings of length up to n), since there are at most 2^n − 1 programs of length less than n, at least one string in this set must have Kolmogorov complexity n (or higher) with respect to every universal Turing machine. By putting a uniform distribution on the strings of a certain length, we can express this as a probabilistic statement: for any universal Turing machine U, the probability that a randomly chosen string of length n has K_U-complexity at least c is at least 1 + 2^{−n} − 2^{c−n+1}.
Relatedly, in place of individual (deterministic) streams of digits, I could consider streams of digits which are generated according to a random process, and then extract statistical statements about the asymptotic behaviour of such streams, such as the expected asymptotic growth rate. More generally, probabilistic variations on the context of structures under consideration can represent an alternative way to eliminate the dependence on choices, while retaining the same fundamental objects. The cost is that the resulting evaluations of complexity will be probabilistic or statistical in nature.
Values for complexity measures
Up until the last subsection, the only types of complexity value which I had presented to you were numerical, either whole numbers (counting a number of operations or pieces) or real numbers (measuring lengths or weighted sums). However, we just saw that we can attach values to complexity which are more subtle, such as growth rates attached to sequences of structures. What other kinds of values can complexity measures take?
Number vs Order
In other areas of maths, we see number systems extended in various ways. The name “complex numbers” might seem suggestive at this point as a potential target for valuing complexity. Unfortunately, complex numbers lack a key requirement for complexity measures, namely order. When defining the complexity of a structure, we want it to identify the minimum amongst possible values associated to that structure, and with complex numbers there is no natural ordering to minimize along. There are artificial ways to attach an ordering, like looking for a complex value with minimal modulus or considering only complex values with positive real and imaginary part and associating some ordering on these, but both strategies amount to lifting the ordering on the (positive) real numbers (resp. possible orderings on R+×R+), at which point we might as well use those, since the extra structure on the complex numbers is essentially being discarded.[14]
On the other hand, the ordinals, a generalization of numbers indexing (possibly infinite) discrete lists, have sufficient structure to act as complexity values, since by definition any set of ordinals has a least element. While this may seem rather abstract, ordinals are used to construct hierarchies of functions such as the ones we encountered for classifying growth rates in the above. I suspect that more intricate number systems, like a countable fragment of the surreal numbers, could equally be used to construct hierarchies of this kind. For example, we can produce functions which tend to infinity arbitrarily slowly as well as arbitrarily fast, and these can be included as “infinitesimal factors” to growth rate functions defining complexity classes.
Total orders are convenient for defining complexity classes, but they are not necessary. As long as our collection of values has arbitrary meets (greatest lower bounds), that’s sufficient to define a minimum value, “the” complexity of the structure[15]. Incidentally, we have already seen some situations where our collection of complexity values is insufficient to express the complexity of some structures: the presentation complexity of an algebraic structure with no finite presentation is not defined. An alternative to delineating different sizes or qualities of infinity is to just give up and say that any structure which doesn’t admit a value in our chosen collection has infinite complexity (the greatest lower bound of the empty collection, or top element of the order). If our complexity measure is supposed to express the difficulty of a problem or construction, this kind of collapse makes a lot of sense: there’s little immediate benefit to delineating between different degrees of “impossible”!
For the remainder of this post, I’ll talk about a valuation structure for a choice of values for complexity measures. It can be useful for a valuation structure to have further algebraic structure, as the positive reals and the natural numbers do. A formal reason for this is the subject of the next subsection.
Symbolic complexity values
Warning: this is the most abstract subsection of the post, proceed with caution.
Once we have determined the generating operations and the building blocks for our measure of complexity, there remain choices to make about what values to assign to each of them, especially which features should have non-zero cost, and how these costs are combined in computing a complexity bound from a construction. Putting aside the last subsection for the time being and considering just the simplest case where the complexity values are taken to be positive real numbers, the precise values needed in practice may depend on implementation details. In the most concrete examples, these values might be determined by the relative speeds of the operations for a given (fixed) implementation on my hard drive.
Rather than making the judgement about what these values should be straight away, we could interpret each basic building block as a (formal) variable and interpret each construction operation as a (formal) function, so as to view the construction itself as an algebraic expression for the complexity bound it determines for the structure it defines. A complexity measure then amounts to an assignment of elements of our valuation structure to each variable and an interpretation of the formal functions as operations on the elements of the valuation structure. The complexity of a structure is the minimum (in the ordering on the valuation structure) of the resulting value over its possible constructions.
A reader with experience in logic might prefer to think of it like this: a construction of a structure, such as a presentation, is a purely syntactic expression in a particular language which comes equipped with predetermined semantics in the domain of interest, such as the class of algebraic structures. A complexity measure is a second choice of semantics for the same language, this time in the valuation structure; I might as well call the second semantics an evaluation. The complexity of a structure is the minimum evaluation of constructions of equivalent structures. (Beware that this syntax-semantics presentation might cause some confusion when we get onto implicit complexity theory later, since the constructions there are proofs but the structure generated by a proof is just its conclusion, which is also a syntactic object!)
This abstraction reveals the higher-order choices involved in defining complexity measures, especially the ones determining how we combine the costs associated to the building blocks when they are put together with an operation.[16]
It is not clear to me whether this formulation of complexity is exhaustive. A priori there may be other contributing factors to complexity that a given evaluation fails to capture, but any examples in this vein that I can come up with seem to be accounted for by including extra operations in the constructions. In my original example of multiplying large numbers by composing multiplications and additions of individual digits, for instance, we could also include the cost of shifting digits around in these algorithms, to account for the fact that in practical implementations the time taken to multiply larger numbers grows faster than the number of single-digit operations involved.
Combinatorial complexity measures and projective parameter spaces
I’ll call a complexity measure combinatorial if the basic building blocks are assigned constant (positive real) values and all of the combination operations are additive on the complexity (or possibly add constants; I’ll leave that possibility implicit here). Our measures which counted the number of basic digit addition and multiplication operations, the string length and the presentation complexity are all combinatorial. This is arguably the simplest kind of complexity measure, if such a notion makes sense,[16] and is amenable to some specialised analysis techniques, which I’ll sketch here.
Observe that if we scale all of the values assigned to the basic building blocks in a combinatorial complexity measure by some positive real number, the complexity values of all constructions are scaled by the same number. Since complexity theory is mostly concerned with comparative evaluations of which presentations are more complex, it may therefore be useful to consider the bounds on complexity determined by given constructions in terms of projective geometry.
Consider the basic operations of addition and multiplication of digits we considered earlier. We can define a projective space[17] of parameter values for these operations, whose elements are the possible ratios of costs of performing them; the dimension of this space will be 1 (one less than the number of basic building blocks for the combinatorial complexity measure). Given a list of algorithms for multiplication of numbers of a given size in terms of these basic operations, we get a mapping from this projective space to another projective space, whose dimension is one less than the number of algorithms. The mapping sends each ratio of parameter values to the corresponding ratios of complexity bounds determined by the algorithms. This mapping characterizes, amongst other things, the break-even points for which of the different algorithms under consideration give the best upper bound on the complexity, as a function of the cost ratio of the basic operations.
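Here is a minimal sketch of the one-dimensional case with made-up operation counts: since only the ratio of the two costs matters (that ratio is the point of the one-dimensional projective parameter space), comparing two algorithms reduces to computing a single break-even ratio.

```python
def break_even_ratio(alg_a, alg_b):
    """Cost ratio t = (cost of one addition) / (cost of one multiplication) at which
    two algorithms, given as (additions, multiplications) counts, tie."""
    adds_a, muls_a = alg_a
    adds_b, muls_b = alg_b
    # adds_a * t + muls_a = adds_b * t + muls_b   =>   t = (muls_b - muls_a) / (adds_a - adds_b)
    return (muls_b - muls_a) / (adds_a - adds_b)

# Hypothetical operation counts for multiplying two numbers of some fixed size:
long_multiplication = (900, 1024)   # many single-digit multiplications, fewer additions
clever_algorithm = (2100, 729)      # fewer multiplications, bought with extra additions
print(break_even_ratio(long_multiplication, clever_algorithm))
# Below this ratio (additions relatively cheap), the clever algorithm gives the
# better bound; above it, long multiplication does.
```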
The same basic argument can be extended to any combinatorial complexity measure. I don’t have the experience with projective geometry to take this analysis further than the above curious observation at the moment (I have not come across any literature on this direction so far). I might as well mention that I expect combinatorial complexity measures to be entirely adequate for my purposes in goal-directedness, so this section could get developed further by me in future.
The broad takeaway of this section is that one more way of avoiding arbitrary choices, this time in the defining values of a complexity measure, is to consider all possible choices together at once, or at least to consider classes of choices. A specific choice can either be made later (if a specific implementation will determine the values assigned) or coarser-grained conclusions can be drawn which are a function of the possible choices. In the latter case, the conclusions might be judgements about upper bounds on the complexity, as a function of the complexity parameters.
Computational Complexity
I have discussed the example of Kolmogorov complexity at some length. Let’s see some domains of computational complexity theory, for balance (although there are surely plenty I haven’t read about).
Classical computational complexity
Computational complexity theory asks, “how much of a given resource (time, space/memory, etc) is required to compute a value/solution?” Just like Kolmogorov complexity, the answer to such questions depends on the details of the model of computation being considered. Even though “computation” is in the name, the mathematically interesting details of computational complexity are considered to be those which are independent of specific implementation. For this reason, it is typical to consider an implementation-independent version, which just as above is most easily achieved by instead considering sequences of problems with a sensible indexing parameter. A large number of different models of computation can simulate one another with at worst polynomial (even quadratic!) overhead. The complexity classes we end up with are measures of asymptotic growth rates, in the sense I described earlier, but these classes are coarser than those above, being defined up to a polynomial change of indexing of the sequences.
This coarse-graining also resolves another issue, related to the implementation of structures as inputs to models of computation. For example, if the input is a simple, undirected graph, should the input size measure the number of vertices or the number of edges? The latter grows at most quadratically in the former (a simple graph on n vertices has at most (n^2 − n)/2 edges). There are other possible choices too, such as the sum of the number of vertices and edges; all in all, selecting the indexing amounts to choosing a complexity measure for graphs. Since the complexity classes are indifferent to the exact complexity measure employed for the structures, on the condition that these are related to one another by polynomial bounds, these choices can be safely ignored, as is the case for my graph example.
The question then becomes, “how quickly does the quantity of a given resource required to compute a value/solution grow with the size of the input?”
Typical examples of complexity classes are:
Deterministic time complexity classes, such as P (polynomial time), DLOGTIME (deterministic logarithmic time with random access[18]) and EXPTIME (exponential time). These measure how quickly decision problems in a sequence increase in running time, or number of operations performed, with the size of their inputs.
Non-deterministic time complexity classes, such as NP (non-deterministic polynomial time) and coNP, which have several equivalent definitions. One of these is that they measure the growth rate of time taken to check a proposed solution to a problem is valid (resp. invalid).
Deterministic space complexity classes, such as L (logarithmic space), PSPACE and EXPSPACE. These measure the growth rate of the amount of memory/working space required for a decision problem in terms of the size of the inputs.
Nondeterministic space complexity classes like NL, NPSPACE and NEXPSPACE. Savitch’s theorem shows that the latter two of these are equal to their deterministic counterparts.
Interpolating classes such as SC (Steve’s Class) which place simultaneous demands on time and space complexity of an algorithm.
All of these are defined for (variations on) Turing machines, but many of them have closure properties which allow them to be characterized in other ways.
Instead of decision problems, we could consider computable functions, and determine the complexity classes for such functions by the same criteria as above. The analogue of P for functions is sometimes called FP. The class FP is closed under composition, and was characterized by Cobham as the smallest class of functions with certain closure properties containing the functions “append 0”, s_0(w) = w0; “append 1”, s_1(w) = w1, and “log-exponential”, f(x,y) := x^{l(y)} (where l(y) is the length of y in bits). In other words, the resulting class of functions has an implementation-independent characterization! Obtaining and manipulating intrinsic characterizations such as this is the subject of implicit complexity theory.
Many containments are known between the various classes. A great number of the open problems in complexity theory ask whether such containments are strict. Most famously, it’s easy to show that any problem which can be decided in polynomial time can be checked in polynomial time, so P⊆NP, but it’s not known whether some problems checkable in polynomial time are not decidable in polynomial time; this is the famous P vs NP problem.
Why should the answer be difficult to determine? An oracle is a hypothetical component which can be added to a Turing machine and which responds to certain queries (even uncomputable ones) efficiently; in the language of the present post, this amounts to including those queries or functions as basic operations. Baker, Gill and Solovay proved in the 1970s that there exist oracles relative to which P = NP, and others relative to which P ≠ NP. Computational complexity is very sensitive indeed to which operations are allowed and their respective costs. Several proof methods in computational complexity theory up to that point could be relativised, meaning that adding an oracle made no difference to their arguments. As such, the Baker-Gill-Solovay result demonstrates that such methods cannot distinguish P from NP; this is the relativization barrier.
Circuit complexity
A more specialized branch of computational complexity deals with circuits, which are a more rigid notion of algorithm closer to the algebraic contexts I presented above, where the operations are logic gates. The complexity is measured by various notions of circuit size, including width and depth, and the classes are defined in terms of these and a number of other factors, such as the number of inputs that each gate is allowed to accept and the order in which logical operations are allowed to appear. I don’t know much about these yet, so I’ll just mention the AC (alternating class) hierarchy and NC (Nick’s class) as interesting examples.
Descriptive complexity
By definition, a decision problem is the problem of determining whether a certain proposition (logical formula) is true for a structure. For a given class of finite structures, encoded in such a way as to allow their being input into a Turing machine, say, we can identify a proposition interpretable on these structures with its extension, which is to say the collection of structures which validate the proposition.
Since propositions are formal statements with lots of structure, we have various notions of complexity for them, including length. The type of complexity that is considered in descriptive complexity theory is that of the smallest fragment of logic (the smallest family of logical operators or constructors) within which the proposition can be expressed. This is interesting in its own right in the context of this post, since it’s a collection of complexity classes which is not a coarse-graining of a numerical complexity measure in any obvious sense. Some examples of fragments of logic include:
First order logic (FOL) includes the usual negation (NOT), (finitary) conjunction (AND) and disjunction (OR) operations applied to the basic relations on the structures, as well as existential (∃) and universal (∀) quantification applied to variables ranging over the elements of the structures.
FOL with a transitive closure augments the above with the ability to take the transitive closure of a relation.
Second-order logic (SOL) allows for quantification over the family of relations defined on the structures.
Conversely, each fragment of logic defines a class of decision problems. Surprisingly, many of these classes coincide with complexity classes listed above, which is another example of classes having characterizations independent of their relation to Turing machines or circuits.
I could observe that the complexity values in descriptive complexity theory aren’t closed under intersections in any obvious way. We have equivalence between propositions in different fragments of logic if they have the same extension, which means that a language can be in multiple classes, but a priori the existence of equivalent propositions in two incomparable fragments doesn’t imply the existence of an equivalent proposition in a smaller mutual fragment. The complexity of a family of structures is thus only formally well-defined if we can prove that this is the case, or if we allow “intersection classes” expressing the fact that a formula lies in multiple incomparable fragments of logic.
Descriptive complexity theorists are not concerned with this problem, to my knowledge, because the class of fragments of logic is entirely open: we can always extend our logics with new connectives and constructions, so there is little hope of being exhaustive about which precise complexity class a class of structures lies in unless we impose some (potentially arbitrary) extra constraints.
Complexity measures for functions
I can now go about constructing some of the complexity measures I’ve been promising. The starting point is analysis, yet another branch of maths..!
Approximation complexity
Suppose I want to assign a measure of complexity to a function. If the function is computable, we fall into the realm of computational complexity. But a function may be formally uncomputable in several different ways.
One obstacle is that the function may take values which are not accessible in our chosen model of computation. A machine which can only output natural numbers will struggle to express negative numbers; a machine which can express integers may not explicitly be able to output rational numbers, and a machine which can represent rational numbers may be unable to represent irrational algebraic numbers or arbitrary real numbers. Another obstruction that could arise is that the calculation must check an infinite number of cases to arrive at a conclusion, and so on some inputs never produces an answer.
Although calculating an exact value is hopeless in such situations, in the first case we can at least gauge the complexity of approximations to these functions.
For the cases where the machine outputs integers, we can compute the complexity of the floor or ceiling of a given function, as a function of the input. If the inputs of the functions are also natural numbers, this reduces to ordinary computational complexity, but more generally we may have to choose a sequence of inputs from which to obtain a complexity class as the asymptotic worst-case upper bound on the complexity of the approximation.
The rational-to-real case is more interesting. Here, we can examine the resources required to achieve an approximation of a certain quality, which is to say produce an output within ϵ of the true value. In this situation we can fix the input value and consider the resources required to achieve ever better approximations of the function value; a natural choice of sequence here to measure the complexity along is to take ϵ=1/n with n=1,2,3,… The asymptotic time complexity in this situation is sometimes called convergence rate or iteration complexity. As we saw earlier with Kolmogorov complexity, we must fix a machine against which to measure the asymptotic complexity, since otherwise we could choose a machine at each stage which just outputs a pre-computed sufficiently good approximation; for reasonable complexity measures, the result should depend only on the function, rather than the choice of machine.
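For instance, here is a sketch of the iteration complexity of approximating √2 (a value that a machine restricted to rational arithmetic can never output exactly) by bisection: the number of steps needed to get within ϵ = 1/n grows like log(n).

```python
def bisection_steps(eps, lo=1.0, hi=2.0):
    """Number of bisection steps needed to pin down sqrt(2) to within eps."""
    steps = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
        steps += 1
    return steps

for n in (10, 100, 10_000, 10**6):
    print(n, bisection_steps(1 / n))   # grows roughly like log2(n)
```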
There is another implicit parameter choice being made here. I am using the absolute difference to judge the quality of the approximations, but other metrics are possible: for functions to the rationals, we could measure the p-adic distance to the true value for some p. This choice of metric will be problem-dependent.
Simple functions
Rather than approximating individual values of the function (and possibly having to make a choice about which machine to use in the calculation and which input values to assess the complexity at), we can ask how difficult a given function is to approximate by a class of simple functions. This has a specific meaning in the theory of Lebesgue integration, namely a function obtained as a finite sum of (real multiples of) indicator functions for measurable subsets. That coincidence is entirely intentional, since the first example I want you to picture is that of approximating a function from (a bounded interval in) the reals to the reals by piecewise constant functions. In this situation, I can ask how many constant pieces I need to use to obtain an approximation to the function within ϵ=1/n of its true value, and hence measure the complexity of the function as the growth rate of that number as n increases. You could call this integration complexity, since the complexity determines how quickly the function’s Lebesgue integral converges.
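A rough numerical sketch of this count for the function f(x) = x^2 on [0,1]: a greedy scan starts a new constant piece whenever the function has drifted more than 2ϵ from the start of the current piece (so that the mid-value constant stays within ϵ), and the number of pieces grows roughly linearly in n.

```python
def pieces_needed(f, a, b, eps, grid=100_000):
    """Greedily count the constant pieces needed to approximate a monotone f on [a, b] to within eps."""
    pieces, piece_start_value = 1, f(a)
    for i in range(1, grid + 1):
        x = a + (b - a) * i / grid
        if abs(f(x) - piece_start_value) > 2 * eps:   # one constant can no longer cover this stretch
            pieces += 1
            piece_start_value = f(x)
    return pieces

for n in (10, 100, 1000):
    print(n, pieces_needed(lambda x: x * x, 0.0, 1.0, 1 / n))   # roughly n/2 pieces
```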
Even for this notion of simple function, there are choices to make. Since measurable subsets are generated from open (or closed) intervals, we could insist that the simple functions be constructed from indicator functions of intervals, or we could allow arbitrary measurable subsets. Another consideration is whether we require that the function is approximated at all input values or merely “almost everywhere”, the latter being a notion more compatible with the tools of measure theory; for continuous functions there is no difference.
As another consideration, we might wonder if the integration complexity of a function is invariant under affine transformations (that is, under translations and rescaling), following the geometric intuition that congruent shapes in the plane should behave in similar ways. The complexity is already invariant at the level of individual approximations under horizontal translation and scaling, since we can translate and rescale an optimal sum of simple functions to obtain an optimal sum with the same number of terms for a function so transformed. For vertical translation, we at worst need to add a single simple function (a multiple of the indicator function for the domain of definition, assuming it is sufficiently nice), and since growth rates are insensitive to constant changes, this makes no difference. Vertical scaling is more subtle: a simple function s with |s(x)−f(x)|<1/n rescales to a worse approximation when the scale factor is larger than 1: |ks(x)−kf(x)|=|k||s(x)−f(x)|<|k|/n. However, growth rates are coarse enough to absorb this kind of discrepancy as a contribution to the constant bounding factor, with the result that integration complexity is invariant under affine transformations! The cost of this invariance is that all simple functions have a complexity of O(1), although this too justifies the choice of name, since simple functions are those whose Lebesgue integral can be computed exactly with a finite sum.
Power series, Fourier series
Depending on the nature of the problem, it may be more interesting to use other classes of functions to approximate the function and thereby measure its complexity. Power series, including Taylor series, are a very well-studied example of this. Given a function expressed as a power series, I can measure its complexity in terms of the growth rate of the coefficients, where now the types of functions which I encounter as growth rates will typically be sub-constant. Alternatively, I could transform these coefficients into a measure of the rate of convergence, defining the complexity of a series ∑_{j=0}^∞ a_j x^j, defined over some domain D around 0, to be the asymptotic growth rate of the function n↦min{N ∣ |∑_{j=N}^∞ a_j x^j| < 1/n on D}.
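As a sketch of this definition, here is a small computation of (an upper bound on) that minimal N as a function of n, for the exponential series on D=[−1,1]; the geometric-series bound on the tail is my own shortcut rather than part of the definition:

```python
# A numerical sketch of power-series complexity for exp(x) on D = [-1, 1].
# N(n) is the first truncation point beyond which the tail is below 1/n
# everywhere on D; the tail is bounded via a geometric series.

from math import factorial

def tail_bound(N):
    # For |x| <= 1: |sum_{j>=N} x^j / j!| <= (1/N!) * sum_{k>=0} (1/(N+1))^k
    return (1.0 / factorial(N)) * (1.0 / (1.0 - 1.0 / (N + 1)))

def N_of(n):
    N = 1
    while tail_bound(N) >= 1.0 / n:
        N += 1
    return N

for n in (10, 100, 1000, 10**6):
    print(n, N_of(n))
# N(n) grows slower than any power of n (roughly like log n / log log n),
# reflecting the very low power-series complexity of exp.
```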
As ever, there are alternative choices available here. I could measure the growth rate of the absolute convergence (replacing the summands with their absolute values in the last definition), for example. A natural feature of these complexity measures is that they are invariant under changing a finite number of coefficients, or equivalently under adding a polynomial to the power series. We also have invariance under affine transformations, comparably to the last subsection, as long as we also translate and scale the domain appropriately.
Of course, power series are not the only way to decompose a function. We can also look at Fourier series, decomposing a function into sine and/or cosine functions, and construct similar definitions of complexity from the coefficients of such an expansion. In this case, the complexity being measured will be geometric in flavour: a function which is low in “Fourier complexity” is well-described by its low-frequency components, whereas higher complexity means that the high-frequency terms play an increasingly important role in the function’s behaviour. Any finite sum of (discrete frequency) sine waves will have constant complexity. There is also a relationship between the convergence rate of the Fourier decomposition of a function over an interval and the differentiability of the function.
It is interesting to contrast the notions of complexity arising from the two decompositions above. They are not directly comparable: a straight line has minimal power-series complexity but non-trivial Fourier complexity, and dually a simple sine wave has interesting power-series complexity but trivial Fourier complexity. This illustrates once again how complexity measures depend on our choices of basic building blocks or operations. There are, of course, yet other decompositions of functions that one might consider!
Functions on spaces
I have talked extensively about functions on the reals. Much of the above can be extended directly to functions on the complex numbers. Only a little extra work is needed to define complexity measures for (real- or complex-valued) functions on higher dimensional spaces, more specifically on domains in Rn, since we can define higher-dimensional simple functions, polynomials and harmonic waves easily, typically as products of the lower-dimensional ones. Some decisions need to be made, of course: for simple functions, are only rectangles allowed, or are indicator functions for more general measurable subsets acceptable? I won’t examine the consequences of such choices here.
With these choices established, we can now construct complexity measures for functions defined on more exotic spaces. For example, if we are given a manifold M which has a finite decomposition as a gluing of subspaces of Rn, a function M→R decomposes into functions defined on those subspaces, whose complexity we know how to measure! In keeping with the methods discussed earlier in the post, we therefore define the complexity of such a function as the minimum, across minimal decompositions of M, of the sum of the complexities of the restricted functions. We can do something similar for CW-complexes, where the rigid simplicity of the components involved (disks) may afford a finer-grained notion of complexity.
It’s surely possible to go further and define the complexity of functions between manifolds in terms of decompositions, but the higher you go in the definitions, the more work you have to do to ensure that your lower-level measures of complexity are invariant enough to combine together nicely into a well-defined (or interesting) higher-level measure of complexity.
Conclusions
My hope is that this post opens the door to complexity for others. Here on LessWrong especially, people often make reference to some established notions of complexity including Kolmogorov complexity, but it sometimes seems to me that a more tailored measure of complexity would better reflect the intuition they are trying to formalize.
A more ambitious hope of mine is that people will take some of these ideas and run with them. The most exciting possibility as I see it is a formal exploration of the relationships between complexity measures in different settings and their relationships with other invariants. Such work might form the foundation for new approaches to attack the hard open problems in computational complexity theory.
Coming soon...
For the purposes of my goal-directedness project, the next step is to use the ideas I’ve explored here in order to build a measure of complexity for goal-directedness. This will require me finally making some choices which I’ve been putting off for the whole project, but it will also get me pretty close to some concrete answers.
I would also like to examine relative complexity, a topic which will be directly relevant to measuring the complexity of explanations, and which I think will be clarified by the view I’ve taken on complexity in this post.
Further References
The primary inspiration for this post was J. M. Landsberg’s book Geometry and Complexity Theory, which was gifted to me by my postdoc advisor Thomas Seiller (thanks Thomas!). The book is about algebraic complexity theory, a domain which applies algebraic geometry to computing (or at least bounding) the complexity of algebraic problems. While it doesn’t discuss complexity theory at the level of generality that I have attempted to here, it was enough of a departure from the computational complexity notions that I had encountered previously to give me a new perspective on what complexity can (or should) mean.
For computational complexity, I used some fragments of Sanjeev Arora and Boaz Barak’s book Computational Complexity: A Modern Approach, although I could benefit from a deeper reading of that.
The word “structure” here is intentionally quite generic. The structure can be a mathematical object (a space, an algebra, a function...), or a computational object (an algorithm, a solution to a problem...).
See the right-most column in the Multiplication entry of the table; long multiplication takes O(n²) operations, meaning that the number of operations is approximately proportional to the square of the number of digits in the numbers being multiplied (assuming here that the numbers are of the same order of magnitude), but we can do much better: the first improvement was discovered by Karatsuba and reduces the number of basic multiplications needed to multiply 1000-digit numbers by a factor of 17.
A similar story is the surprising existence of more efficient algorithms for matrix multiplication, which is one focus of the domain of algebraic complexity theory.
A Turing machine is a pretty famous abstract model of computation consisting of some tapes, which are unbounded collections of discrete cells which are either empty or contain a symbol (usually 0 or 1), a device called the head that can read from and write to the tapes, and a program instructing the head how to interpret the symbols it encounters. If this isn’t familiar to you, almost any programming language will do for the purposes of understanding Kolmogorov complexity.[19] I can be informal about this, and about the obviously underspecified parameters in the description of the Turing machine, such as the number of tapes and their shape (does it go off to infinity in both directions or just one?), the language/symbols which the machine can read and write, and so on, thanks to the result I quote later on.
Amusingly, finding the shortest program which produces a given output in various programming languages is a pastime with a significant community around it.
A field is a ring where we can also divide by elements other than 0. I didn’t include it in the list of algebraic structures, because the “other than zero” part turns out to prevent the existence of free fields!
The more famous version of this example would be the inductive definition of the natural numbers in Peano arithmetic: any natural number can be expressed by applying the successor (“add 1”) function to 0 the corresponding number of times. However, the variety of algebras with just a single unary operation isn’t as widely studied, to my knowledge, so I have stuck with the semigroup example.
Formally speaking, (aa)a and a(aa) could be considered distinct elements were it not for one of the axioms of semigroups which I didn’t mention, namely associativity. I mention this here to highlight how many of the formal details are missing from this account. I’m trying to paint you an intuitive picture, but if you’re curious about what’s under the hood, I would be more than happy to discuss it in the comments!
Note that the generation procedure here is very sensitive to the type of structure we are considering: a ring generated from the presentation ⟨a,b∣aa=a,bb=b,ab=ba⟩ would be infinite, since we have imposed no restrictions on sums of elements.
I don’t know whether this notion is typically studied under the banner of complexity theory. As I mentioned in the introduction, I’m deliberately seeking an inclusive definition of complexity here.
Of course, it can also be extended to algebras which only admit infinite presentations, but infinite cardinals are a lot coarser than finite ones, and one has to start worrying about set-theoretic foundations, whereas the finite notions are a lot easier to treat constructively.
I implicitly assume in this argument that I’m working with Turing machines taking inputs in binary, which is what’s needed to make the “up-to-constant” claim hold; if the Turing machines accept different alphabets, then I need to encode each into the other for the input to make sense, and this can modify the length of programs by a scale factor. For the conclusions regarding asymptotic growth rates that I mention, this subtlety is unimportant.
I’ve used Big Oh and Little Oh notation here. This is a compact way to express asymptotic bounds on functions; I also use Big Theta notation later on.
The point of this paragraph is to disillusion the reader about the existence of any strong connection between “complexity” and “complex numbers”. Just as elsewhere in maths and science, it’s entirely possible that a clever application of complex numbers[17] could solve some problems in complexity theory, but I maintain that complex numbers do not constitute good candidates for complexity values.
Up until this point, it has been the case that the complexity value directly measures a most efficient construction, presentation or decomposition of an object. In allowing more general structures, there arises the possibility of situations in which there is no minimizer: we could have a sequence of increasingly efficient constructions with no minimum, or a collection of minimal constructions which are incomparable in the ordering. This is not intrinsically problematic, but may require some care in general arguments about complexity.
Amusingly, the syntax-semantics presentation of complexity measures is sufficiently precise for us to begin asking questions about the complexity of complexity measures… I’ll leave to some other brave soul the pleasure of untangling that ouroboros.
Although I have dismissed complex-valued complexity measures, when studying maps between projective spaces, complex values can help achieve a more complete global picture. This train of thought leads down a rabbit hole which I shall not explore here, but there is precedent for taking this idea seriously: recall that complex values were useful to electrical engineers in uniting resistance and reactance into a single complex impedance, for example.
For the logarithmic time class to contain any interesting problems, we need to impose the assumption that moving along the input (such as moving any distance along the input tape of a Turing machine before reading) takes constant time, since otherwise the time taken to move is already linear in the input length. For example, assuming the input n>0 is provided in binary with a leading 1, the function n↦⌊log₂(log₂(n)+1)⌋ is a member of DLOGTIME, since we can determine its value by initialising the output at 0, then checking for the presence of a digit at position 2^k of the input for increasing k≥1, adding one to the output each time a digit is found and halting as soon as one is not.
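For concreteness, here is a sketch of that procedure in Python; the constant-time random access is part of the machine model being assumed, not something the Python loop actually enjoys:

```python
# A sketch of the DLOGTIME example above: with random access to the binary
# input (leading 1 first), probe positions 2^1, 2^2, ... and count how many
# of them carry a digit. The right-hand column checks the closed form.

from math import floor, log2

def f(n):
    bits = bin(n)[2:]            # binary representation with a leading 1
    count, k = 0, 1
    while 2 ** k <= len(bits):   # "is there a digit at position 2^k?"
        count += 1
        k += 1
    return count

for n in (1, 2, 5, 16, 1000, 10**9):
    print(n, f(n), floor(log2(log2(n) + 1)))
```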
Arora and Barak list Turing machines, lambda calculus, cellular automata, pointer machines, bouncing billiard balls and Conway’s Game of Life as abstract models of computation which can simulate one another and hence are Turing complete.
I should also stress that I’m examining complexity in a much broader context than is covered by any single discipline of complexity theory, but I’ll discuss some specific branches of the domain later on.
Motivating examples
Since I expect computer science to be more accessible to my audience than abstract maths, I’ll start with two motivating examples in computational terms.
Why is multiplying big numbers more difficult than multiplying small numbers? When we actually sit down and compute a product of large numbers, our go-to method is long multiplication, where we perform a series of pair-wise multiplications of individual digits (using values that are engraved into our memories by early school experiences which were traumatic for some...) and then add up the results. The more digits there are, the more digit multiplications (and additions) we must perform, and the longer the calculation takes.
It may surprise you to discover that there are algorithms for multiplication which are more efficient than long multiplication for sufficiently large numbers.[2] We don’t have to just multiply the digits of the numbers themselves; we can sometimes add and/or subtract digits before multiplying to reduce the number of basic multiplications we need to perform.
Finding better algorithms is difficult enough that the existence of a better algorithm than long multiplication was a big surprise even to the likes of Kolmogorov. One thing is clear, however: there is a smallest number of single digit multiplications and additions required to multiply an arbitrary n-digit number and an m-digit number; there is a most efficient algorithm. The size of that algorithm, in terms of the number of operations involved, is the computational complexity of the multiplication operation for numbers of these sizes.
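To give a feel for the numbers, here is a small sketch comparing operation counts; the recursive count for the Karatsuba-style algorithm ignores carries in the middle product, so it is indicative rather than exact, and it is my own bookkeeping rather than anything from the original references:

```python
# Compare the number of single-digit multiplications used by long
# multiplication (n * n of them) with the count used by Karatsuba's trick,
# which trades some of them for extra additions and subtractions.

def long_mults(n):
    return n * n

def karatsuba_mults(n):
    if n <= 1:
        return 1
    half = (n + 1) // 2
    # three recursive multiplications on (roughly) half-length numbers;
    # carries in the middle product are ignored, so this slightly undercounts
    return 2 * karatsuba_mults(half) + karatsuba_mults(n - half)

for digits in (10, 100, 1000):
    print(digits, long_mults(digits), karatsuba_mults(digits))
# For 1000-digit inputs the count of single-digit multiplications drops by
# roughly a factor of 17.
```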
If we found that adding digits is easier/less costly than multiplying them, then we might adjust our measure of complexity to reflect this, and this may even change which algorithm is considered the most efficient. More on that later.
Now let’s take an example of a different flavour. What is the complexity of a piece of text? For simplicity, let’s restrict ourselves to a string s consisting of just the digits 0 and 1. Here are two possible examples:
0101010101010101010101010101010101010101010101
00101011110011010001010000111010001
Before we get started, I want to stress that there is not just one answer here. For example, I could consider the basic operations for writing strings to be “write 0”[3] and “write 1″. In that context, there is no more efficient way to produce these strings than to write them out one digit at a time, and the measure of complexity that we end up with is simply string length, for which the former is the more “complex” of the two. On the other hand, if I have the comparative might of a universal Turing machine U at my disposal[4], I could consider the permissible string-generating procedures to be the inputs for that Turing machine (that is, the programs), and take the complexity to be the length of the program instead of the length of the string. The result is Kolmogorov complexity, KU(s). The value computed depends on the specific choice of U, but for many choices it’s a lot easier to program “write 01 twenty three times” than to find a short program that produces the second string above. I could also choose any programming language as a substitute for a Turing machine[5], which might make that last claim more credible.
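As a rough, computable illustration of the contrast (with zlib standing in for the choice of universal machine, which is an assumption of this sketch, and compressed length only ever giving an upper-bound proxy for Kolmogorov complexity):

```python
# Two crude stand-ins for the measures discussed above: string length
# ("write one digit at a time") versus the length of a compressed
# description (a computable upper-bound proxy for Kolmogorov complexity).

import zlib

s1 = "01" * 23                                  # the repetitive string above
s2 = "00101011110011010001010000111010001"      # the "patternless" string above

for s in (s1, s2):
    print(len(s), len(zlib.compress(s.encode())))
# Under the string-length measure the repetitive string is the more complex
# of the two; under the compressed-length proxy the ordering tends to
# reverse, mirroring the Kolmogorov-style intuition.
```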
So what is complexity?
I can extract from these examples a general idea of what complexity should be.
I begin by selecting a class of structures whose complexity I want to measure. In the examples above, these were “functions defined on pairs of numbers of a given size” and “strings (of 0s and 1s)”, respectively.
I select a generation procedure. In the first example, this was the class of algorithms which I was allowed to use (those involving just calculations in terms of individual digits). In the second, the first generating procedure was just “writing down digits”, while the second was “input a program into a Turing machine and consider the output”.
I select a handful of basic operations or building blocks, considering these to require a constant small cost to perform. In the first example, these were additions and multiplications of single digits. In the second, we had individual digits (or ‘the operation of writing a single digit’) in the first instance and inputting single digits into a Turing machine in the second instance. Note that for the time being, the basic operations are assumed to be sufficiently varied to be capable of exactly generating the desired output.
Given these ingredients, the corresponding complexity of a given structure is the minimum total cost of all choices of basic operations/components which generate it.
This turns out to cover all of the usual notions of complexity, of which we’ll see many in a later section. As we’ve already seen, it also includes some simpler measures, such as size. As a general rule of thumb, I’ve found that the more powerful the basic building blocks are, the coarser the notion of complexity one gets. For example, if all algorithms are assigned the same constant cost, then multiplication of two numbers always has a complexity of 1, and this notion of complexity doesn’t tell us much.
A point I’ll come back to in a little while is the fact that any specific construction of a structure produces an upper bound on the complexity of that structure. The challenge, for any definition of complexity, is always finding a construction which minimizes the complexity measure and proving that it really is minimal.
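Here is a deliberately tiny sketch of the whole recipe: choose building blocks with costs, fix concatenation as the generation procedure, and brute-force the cheapest construction of a target string. All of the blocks, costs and targets are illustrative choices of mine, and the search only works for very small examples:

```python
# The complexity of a target string is the minimum total cost over all ways
# of concatenating basic blocks to produce it exactly. Brute force only.

from itertools import product

def complexity(target, blocks, max_pieces=8):
    best = None
    for k in range(1, max_pieces + 1):
        for combo in product(blocks, repeat=k):
            if "".join(b for b, _ in combo) == target:
                cost = sum(c for _, c in combo)
                best = cost if best is None else min(best, cost)
    return best

blocks = [("0", 1), ("1", 1), ("01", 1)]             # "01" treated as a basic block
print(complexity("010101", blocks))                  # 3: three copies of the "01" block
print(complexity("0101", [("0", 1), ("1", 1)]))      # 4: digits must be written one at a time
```

Note how making the building blocks more powerful (adding the “01” block) immediately lowers the complexity, in line with the rule of thumb above.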
Contexts for complexity
Implicit in the above informal definition is that all notions of complexity depend on choices, the most fundamental of which is a choice of context: we consider the complexity of a structure as a member of a class of similar or related structures. Crucially, that context is itself structured! To understand what structure a context might have, I’m going to take you on a detour through some more abstract maths. To keep the spotlight on complexity, I’ll examine the notions of complexity which arise in these contexts.
Complexity in topology
Topology studies various types of space. Often, a class of spaces can be generated by gluing together a class of “simple” spaces in a predetermined way. I’ll give some technical examples, but the details don’t matter: I just need you to pattern-match, since the complexity measures I’ll describe for these classes don’t rely on the details.
A simplicial complex is a space obtained by gluing together simplices, or the analogues of triangles in each dimension, along their faces/edges. A 0-simplex is a point, a 1-simplex is an edge or interval, a 2-simplex is a triangle (including its interior), a 3-simplex is a tetrahedron, “and so on...”
A CW-complex is a space obtained by gluing disks together along maps on their boundaries. Here, “disk” means the ball of points at a distance of at most 1 from the origin in n-dimensional space, and the boundary of an n-disk is an (n−1)-dimensional sphere. For example, the 1-disk is an interval, and its boundary is the pair of end-points. Gluing along a map on the boundary means that I identify the sphere on the boundary of a disk with its image along a function into the space constructed so far.
A (smooth) manifold is a space which can be constructed by gluing together patches that look like open subsets of Rn along subsets of a similar form. These spaces are the subject of differential topology (and differential geometry).
A scheme is a space constructed by gluing together algebraic varieties. These are the subject of algebraic geometry.
In all of these cases, we are given a class of “simple” spaces, which usually means a space whose structure we can easily study directly and explicitly. These are used to construct more interesting spaces, often with emergent global structure that the simple spaces lack.
There are numerous notions of equivalence for spaces. For manifolds, say, we could consider invertible maps which preserve all of the local Rn-like structure (diffeomorphisms), invertible continuous maps with continuous inverses (homeomorphisms) or, most coarsely, the maps with “quasi-inverses” where the respective composites can be “continuously deformed” back to the identity (homotopy equivalences).
A choice of recipe for constructing spaces, plus a choice of equivalence for the resulting spaces, results in a notion of complexity: the complexity of a given space is the smallest number of simple spaces required to build a space equivalent to the given one.
This isn’t the most nuanced form of complexity at our disposal in this setting. In the case of CW-complexes, for example, this combinatorial notion of complexity doesn’t take into account the potential complexity of the gluing maps (I’ll get onto defining complexity of maps later). One might also want the dimension of the constituent spaces to contribute to the calculation: it’s not outrageous to decide that an 8208-dimensional sphere is a little more complex than the familiar 1-sphere (circle).
Complexity in algebra
Abstract algebra studies various types of mathematical object. Each class of objects is determined by the operations that it comes equipped with (its signature) and the equations which these operations are required to satisfy. Here is a list of examples of what that can mean, with some formal details deliberately omitted for conciseness.
A semigroup is a collection of elements equipped with a multiplication operation, a way of combining any pair of elements to produce some other element.
A group is a semigroup in which we can also divide by/cancel elements (and there is a special element which is neutral in the sense that multiplying by it does nothing).
A ring is a collection of elements which we can add, subtract and multiply. A ring also has special elements called 0 and 1 which behave in the way you would expect with respect to addition and multiplication.
A vector space over a field[6] K is a collection of elements (called vectors), equipped with both addition and scalar multiplication, where the latter is a bunch of operations indexed by the elements of the field.
If you aren’t familiar with these, it’s not important; all I want you to observe is the pattern of their descriptions: elements and operations (plus some rules about how the operations interact with one another, which I have omitted).
In any of these examples, if I take a pair of elements (possibly the same element twice), I can apply a binary operation such as multiplication or addition to them to produce a further element. I can further perform operations involving the result to produce more elements, and so on. In other words, it may be the case that I can construct all of the elements of the algebraic structure by applying the operations successively to a handful of generators. For example, the positive natural numbers form a semigroup under addition, and every number can be obtained by adding 1 to itself some number of times, so 1 generates the semigroup of positive natural numbers.[7]
If I take an unstructured collection of elements, a set, I can generate a structure of any of the above types by iteratively applying the formal operations to the members of this set. This produces the free structure generated by this set. For example, the free semigroup on the set {a,b} has elements {a,b,aa,ab,ba,bb,aaa,…}, where I have written the letters next to one another to represent multiplication.[8]
If my general presentation of complexity from earlier made any sense, I’m hoping that you can already see that the number of generators provides a notion of complexity for algebraic structures.
Beyond free structures, a presentation of an algebraic structure consists of a collection of generators combined with a collection of pairs of elements of the free structure they generate which are to be identified (typically presented as equations). For example, the following is a presentation of a semigroup: ⟨a,b∣aa=a,bb=b,ab=ba⟩
I claim that the semigroup it produces has just three elements, {a,b,ab}; work out what the multiplication operation does if you like. Again, if it’s not clear to you how one produces an algebraic object from a presentation, it doesn’t matter too much: the key observation is that the generators and relations in a presentation, supplemented with the generation procedure[9] for the given class of algebraic objects, provide all of the ingredients we need to view the size of a presentation as a measure of complexity of these objects. Just as in the topological case above, we must choose a notion of equivalence of structures in order to complete the picture, but the choice is much narrower for algebraic structures: we typically consider these to be equivalent if they are isomorphic, which is to say that there exists a structure-preserving bijection between their sets of elements.
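For this particular presentation the claim can be checked by brute force, treating the relations as rewrite rules; this only works because the rewriting happens to terminate and behave well here, so the sketch below is emphatically not a general solver (the general problem is uncomputable, as noted next):

```python
# Check that <a, b | aa=a, bb=b, ab=ba> presents a three-element semigroup.
# The relations are oriented as rewrite rules (plus ba -> ab to fix a normal
# form) and the generators are closed under multiplication.

def normalise(word):
    rules = [("aa", "a"), ("bb", "b"), ("ba", "ab")]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs)
                changed = True
    return word

elements = {"a", "b"}
while True:
    new = {normalise(x + y) for x in elements for y in elements} - elements
    if not new:
        break
    elements |= new

print(sorted(elements))  # ['a', 'ab', 'b'] -- three elements, as claimed
```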
Determining the structure generated from an arbitrary presentation is famously difficult, in a formal sense: it’s uncomputable in general, as are many of the measures of complexity which I’ll construct here. On the other hand, any finite structure has a “trivial” finite presentation, where the set of generators is the set of all of its elements, and the equations express the result of applying the operations to the elements. If the class of structures being considered has k binary operations then the trivial presentation of a structure with N elements has N generators and N²k relations. As a consequence, there must exist a most efficient presentation of any finite structure, and this situation is intuitively comparable to the notions of complexity we saw in previous sections. More generally, as long as we know that there exists a finite presentation of an algebraic structure (even if that structure is infinite), then there must be a smallest one, so this concept of complexity[10] makes sense for all finitely presentable algebraic structures.[11]
Between algebra and spaces
Combining the above two sections, we can consider all sorts of constructions extending our categories of algebraic and/or topological objects. This section is deep into abstraction, so I’ll keep it to a single example, but it foreshadows an approach to dealing with infinite structures which we’ll see again later on.
A profinite group is a topological group constructed as a projective limit of finite groups. Even though a given profinite group may not admit a finite construction (and may not even have a “minimal” construction in a meaningful sense) we can still extract a nuanced measure of complexity for these structures. Indeed, having determined a notion of complexity for finite groups such as the presentation length, we may define the complexity of a profinite group to be the maximum complexity of finite groups appearing in its construction, minimized across possible constructions. A particularly simple class of groups are the cyclic groups, with presentations of the form ⟨a∣a^k=1⟩, where k is a positive integer and 1 represents the identity element of the group; with the above definition of complexity, it follows that the simplest (lowest complexity) class of nontrivial profinite groups are the procyclic groups, constructed as projective limits of cyclic groups.
Complexity within a structure
Why did I bother with the abstract detour above? The answer is that the structures encountered in complexity theory can often be thought of as functions defined on spaces or as elements of algebraic structures.
Piecewise maps
What does it mean for a space to be constructed by gluing together simpler spaces? One consequence is that when we define a map on the space, we can restrict that map to the simpler pieces, and conversely that if we are given mappings on the simpler pieces which are compatible with the gluing construction in a precise sense, then these mappings can themselves be glued together to define a mapping on the full space.
As such, as soon as we define a notion of complexity for functions on the simple spaces, we have numerous ways to assign complexities to the glued space: we could take a maximum or a weighted sum, say. I’ll explain some ways of assigning complexities to functions later on.
Elements of algebras
A choice of presentation of an algebraic structure A determines a natural notion of complexity for the elements of A, namely the shortest presentation of that element in terms of the generators in the presentation. Here the generators are playing the role of the basic building blocks, and the algebraic operations are the ones we allow when constructing the elements.
Observe that the trivial presentation corresponds to the trivial complexity measure I mentioned earlier, where everything has the same complexity. On the other hand, for a small generating set, the complexity becomes a lot more interesting.
It’s also worth noting that we can vary the complexity measure significantly by varying the weights assigned to the different generators and operations in the structure. In a real vector space with a given basis, for example, we could assign the basis vectors a weight of 1, scalar multiplication by r a weight of |r|² and addition a weight of 0, in which case the “complexity” of vectors is the sum of the squares of the coefficients determining the vector in that basis, the “squared length” of the vector when the basis vectors are considered to have length 1. Contrast this with what happens if we assign scalar multiplication a weight of 0 and addition a weight of 1, in which case the complexity just counts the number of basis vectors required to present a given vector (equivalently, the minimum dimension of a subspace generated by basis vectors containing the given vector). Both of these are clearly highly dependent on the choice of basis.
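A small sketch of these two measures for a vector in R², using two orthonormal bases of my own choosing, shows the basis dependence directly:

```python
# Two basis-dependent "complexities" of a vector: the sum of squared
# coefficients, and the number of basis vectors with non-zero coefficient.

import numpy as np

def coefficients(v, basis):
    B = np.column_stack(basis)          # basis vectors as columns
    return np.linalg.solve(B, v)

def squared_length_cost(v, basis):
    return float(np.sum(coefficients(v, basis) ** 2))

def support_cost(v, basis, tol=1e-12):
    return int(np.sum(np.abs(coefficients(v, basis)) > tol))

v = np.array([1.0, 1.0])
standard = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
rotated = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]

for name, basis in [("standard", standard), ("rotated orthonormal", rotated)]:
    print(name, squared_length_cost(v, basis), support_cost(v, basis))
# The squared-length cost is 2.0 in both orthonormal bases, while the number
# of basis vectors needed drops from 2 to 1 in the rotated basis.
```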
Fragility of complexity measures
In our examples, we have seen that the complexity measure we end up with depends on all of the choices we made along the way, including the context, the basic building blocks, the allowed operations for combining them and the weights assigned to each of those ingredients. Changing any of these choices not only affects the numerical value we end up with, it can completely change which elements or objects are considered more or less complex than others. This is not a profoundly surprising result, although the degree of sensitivity may be surprising; see the end of the section on computational complexity below for an example.
Until recently, the fragility of complexity measures seemed to me to be an insurmountable problem. I don’t want to be using an evaluation that depends heavily on apparently arbitrary choices; those choices shouldn’t be influencing my results!
Fortunately, we can more often than not excise some arbitrary choices while salvaging a meaningful measure of complexity, through a variety of mechanisms. This is inherently a trade-off: the more we try to make our complexity measures independent of choices, the coarser-grained the notion of complexity becomes. For my own purposes, I’m hoping that I will nonetheless reach a complexity measure which captures the intuitions I’m seeking to formalize.
Minimal presentations
First of all, we can shift the choices up a level, where they can seem less arbitrary. If our context is a (finitely presentable) algebraic structure, and we have chosen a sensible notion of complexity for presentations, we can consider the generators of a minimal presentation as a natural choice of basic building blocks (and we already know that this will at least give a more refined notion of complexity than a non-minimal presentation).
If there is a unique minimal presentation for the context, the above is an entirely natural choice. However, there may instead be several (even many) minimal presentations for the algebraic structure: think of choosing different bases for a vector space. In that case, we can construct a more robust notion of complexity as one which is invariant under change of minimal presentation. This can be achieved, for example, by taking the minimal complexity value across possible choices of minimal presentation. The vector space example turns out to be a bad one on its own, since any vector is a member of some basis (so the complexity measures we saw earlier can never be basis-invariant), but if we instead consider linear endomorphisms of a vector space, choosing a basis for the space enables us to present the endomorphism as a matrix, and there are various non-trivial invariants including the rank which are basis-invariant, and can be considered as notions of complexity.
If we take a vector space equipped with an inner product, then while there are still just as many choices of basis available, some of those choices have nicer properties: the best-behaved ones are orthonormal bases. (As ever, if those terms are unfamiliar, read around them. The key take-away is that in some contexts the minimal presentation in an algebraic sense is not automatically the best possible.) We could modify the complexity measure on presentations to take the inner product structure into account, or just impose the constraint on choices anyway. Either way, having selected some extra criteria which the minimal presentations of the context must satisfy, we reduce the number of constraints on a complexity measure when we demand it be invariant under change of presentation. The “length squared” measure that I described in the last section is invariant under change of orthonormal basis, for example.
A compelling example of constrained bases in the definition of a complexity measure is that of tensor rank. One way of constructing a vector space is as a tensor product of a pair of vector spaces, say V⊗W; an element of such a space is typically called a tensor. The natural bases for V⊗W are those constructed from bases of V and W. Unlike for vector spaces more generally, not every tensor is a member of such a basis, and we therefore get a meaningful notion of tensor rank as the minimal number of basis vectors (across all possible constrained bases) required to present a given tensor.
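In the order-two case a tensor in V⊗W can be written as a matrix, and its tensor rank coincides with the familiar matrix rank, so a quick numerical sketch is possible (numpy's matrix_rank is standing in for "minimal number of product terms needed", which is my gloss on the framing above):

```python
# Tensor rank in the order-two case, computed as matrix rank.

import numpy as np

rank_one = np.outer([1, 2], [3, 4])      # a single product v (x) w
generic = np.array([[1, 0], [0, 1]])     # the identity needs two such products

print(np.linalg.matrix_rank(rank_one))   # 1
print(np.linalg.matrix_rank(generic))    # 2
```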
Minimal decompositions
Much of the discussion above can similarly be applied to spaces: while there may be many decompositions of a space into simpler spaces, minimal decompositions provide natural basic components for complexity measures. Both here and above, if the minimum can be concretely identified, it will also necessarily be the simplest presentation of the space or algebra to work with from a computational point of view!
Consider the circle, or 1-sphere. When presenting this as a manifold, the minimal decomposition involves two open line segments (open subsets of R), glued along a pair of sub-segments.
Clearly there are several choices to be made here about the four segment lengths (the segments being glued together and the lengths of the overlaps); although the end result as a space is always a circle, the choice of decomposition will affect our definition of complexity of functions on the pieces, and we can only obtain an invariant complexity measure by taking all of the minimal decompositions into account, which amounts to examining symmetries of the resulting space. The study of symmetries moves us beyond topology into geometry. Finding complexity measures for functions which are invariant under these re-parametrizing symmetries is an interesting challenge, which is beyond the scope of this post.
Invariance at the cost of individuality
In some cases, there will be no useful choice-invariant notion of complexity for individual structures. We saw earlier that Kolmogorov complexity requires a choice of universal Turing machine. However, given any string, we can design a universal Turing machine which makes that particular string have arbitrarily small or large complexity, essentially by manipulating how the programs are encoded.
On the other hand, Kolmogorov proved that his measure of complexity is choice-invariant up to a constant, as follows.[12] If I take any pair of universal Turing machines U and U′, each can be compiled into the other, in the sense that there are strings u and u′ such that for all input strings x and y (independent of u,u′) we have U(u′;x)≡U′(x) and U′(u;y)≡U(y), where the semicolon denotes concatenation and the triple-line means “produces the same behaviour (in terms of halting and outputs)”. As such, for any string s, we have KU′(s)≤KU(s)+|u| and KU(s)≤KU′(s)+|u′|. An immediate consequence of this fact is that, while we cannot produce a measure of the Kolmogorov complexity of an individual string which is independent of the choice of Turing machine, if we take a sequence of strings, we can extract coarser-grained complexity values which are independent of the choice.
For instance, the asymptotic growth rate of the Kolmogorov complexity of a sequence of strings is a choice-invariant notion. Asymptotic here roughly means “on average in the limit of large indices”, the idea being that while we have little control over the exact lengths of programs required to generate individual strings on different Turing machines, the difference in complexity between any two Turing machines is bounded by a constant, and that constant is an “error” whose relative contribution can be made as small as we like by looking sufficiently far along the sequence. An infinite stream S of digits also has a well-defined notion of complexity, by identifying S with the sequence (sn)n∈N whose nth term sn is the string consisting of the first n digits of the stream. An essentially random stream will produce a complexity that grows linearly, KU(sn)=O(n), whereas an infinite stream of 0s has complexity which grows at most logarithmically, KU(sn)=O(log(n)); the invariance means that we can drop the subscript U in these statements.[13]
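A computable caricature of this contrast (compressed length is only an upper-bound proxy for Kolmogorov complexity, and zlib is an arbitrary stand-in of mine for the universal machine, so this illustrates the trend rather than measuring K):

```python
# Track the compressed length of the first n digits of a "random" stream
# versus a constant stream, as a proxy for the growth rates discussed above.

import random
import zlib

random.seed(0)
random_stream = "".join(random.choice("01") for _ in range(20_000))
constant_stream = "0" * 20_000

for n in (1_000, 5_000, 10_000, 20_000):
    r = len(zlib.compress(random_stream[:n].encode()))
    c = len(zlib.compress(constant_stream[:n].encode()))
    print(n, r, c)
# The first column of sizes grows roughly linearly with n, while the second
# stays nearly flat, mirroring the linear versus (at most) logarithmic rates.
```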
One must be careful with the kind of shift of context we performed above. We haven’t actually resolved the problem of having a well-defined complexity for individual strings: their natural substitutes as sequences, the constant sequences, all have the same complexity of “no growth”, KU(s)=O(1). Rather, we have merely observed that sequences of structures (such as strings) form a context which is rich enough to leave interesting complexity measures after imposing invariance with respect to symmetries which made complexity measures on the individual structures trivial.
Note that the asymptotic growth rate would not work as a measure of complexity of the profinite groups we saw earlier, even if we restrict to those which are expressible as the (projective) limit of a sequence of finite groups. This is because any subsequence or re-indexed sequence will produce the same group in the limit, and so we can make the complexity of the sequence grow as fast or as slow as we like. On the other hand, if we put a constraint on the form of the sequence, like imposing a relation between the index n and a bound on the size of the nth group in the sequence, we might be able to salvage an alternative to the “maximum complexity” measure we proposed earlier.
I should also stress that the asymptotic growth rate classes I have mentioned above are characterized by upper bounds, so that the class of O(f(n)) sequences also includes all of the classes of O(g(n)) sequences where g(n) grows more slowly than f(n). A more precise statement would be one of the form KU(sn)=Θ(f(n)), meaning that the growth rate of the Kolmogorov complexity of (sn)n∈N is bounded above and below by constant multiples of f(n), which excludes slower growth rates.
Probabilistic contexts
Rather than sequences, we might consider subsets of the family of finite strings. For sufficiently uniform subsets, one can make statements about the Kolmogorov complexity which are independent of the choice of Turing machine. For example, an argument based on the pigeon-hole principle shows that if we take the collection of strings of length n, even if we construct a universal Turing machine especially to compute these strings (so that all of the shortest possible programs produce strings of length up to n), since there are at most 2^n−1 programs of length less than n, at least one string in this set must have Kolmogorov complexity n (or higher) with respect to every universal Turing machine. By putting a uniform distribution on the strings of a certain length, we can express this as a probabilistic statement: for any universal Turing machine U, the probability that a randomly chosen string of length n has KU-complexity at least c is at least 1−(2^c−1)/2^n = 1−2^(c−n)+2^(−n).
Relatedly, in place of individual (deterministic) streams of digits, I could consider streams of digits which are generated according to a random process, and then extract statistical statements about the asymptotic behaviour of such streams, such as the expected asymptotic growth rate. More generally, probabilistic variations on the context of structures under consideration can represent an alternative way to eliminate the dependence on choices, while retaining the same fundamental objects. The cost is that the resulting evaluations of complexity will be probabilistic or statistical in nature.
Values for complexity measures
Up until the last subsection, the only types of complexity value which I had presented to you were numerical, either whole numbers (counting a number of operations or pieces) or real numbers (measuring lengths or weighted sums). However, we just saw that we can attach values to complexity which are more subtle, such as growth rates attached to sequences of structures. What other kinds of values can complexity measures take?
Number vs Order
In other areas of maths, we see number systems extended in various ways. The name “complex numbers” might seem suggestive at this point as a potential target for valuing complexity. Unfortunately, complex numbers lack a key requirement for complexity measures, namely order. When defining the complexity of a structure, we want it to identify the minimum amongst possible values associated to that structure, and with complex numbers there is no natural ordering to minimize along. There are artificial ways to attach an ordering, like looking for a complex value with minimal modulus or considering only complex values with positive real and imaginary part and associating some ordering on these, but both strategies amount to lifting the ordering on the (positive) real numbers (resp. possible orderings on R+×R+), at which point we might as well use those, since the extra structure on the complex numbers is essentially being discarded.[14]
On the other hand, the ordinals, a generalization of numbers indexing (possibly infinite) discrete lists, have sufficient structure to act as complexity values, since by definition any set of ordinals has a least element. While this may seem rather abstract, ordinals are used to construct hierarchies of functions such as the ones we encountered for classifying growth rates in the above. I suspect that more intricate number systems, like a countable fragment of the surreal numbers, could equally be used to construct hierarchies of this kind. For example, we can produce functions which tend to infinity arbitrarily slowly as well as arbitrarily fast, and these can be included as “infinitesimal factors” to growth rate functions defining complexity classes.
Total orders are convenient for defining complexity classes, but they are not necessary. As long as our collection of values has arbitrary meets (greatest lower bounds), that’s sufficient to define a minimum value, “the” complexity of the structure[15]. Incidentally, we have already seen some situations where our collection of complexity values is insufficient to express the complexity of some structures: the presentation complexity of an algebraic structure with no finite presentation is not defined. An alternative to delineating different sizes or qualities of infinity is to just give up and say that any structure which doesn’t admit a value in our chosen collection has infinite complexity (the greatest lower bound of the empty collection, or top element of the order). If our complexity measure is supposed to express the difficulty of a problem or construction, this kind of collapse makes a lot of sense: there’s little immediate benefit to delineating between different degrees of “impossible”!
For the remainder of this post, I’ll talk about a valuation structure for a choice of values for complexity measures. It can be useful for a valuation structure to have further algebraic structure, as the positive reals and the natural numbers do. A formal reason for this is the subject of the next subsection.
Symbolic complexity values
Warning: this is the most abstract subsection of the post, proceed with caution.
Once we have determined the generating operations and the building blocks for our measure of complexity, there remain choices to make about what values to assign to each of them, especially which features should have non-zero cost, and how these costs are combined in computing a complexity bound from a construction. Putting aside the last subsection for the time being and considering just the simplest case where the complexity values are taken to be positive real numbers, the precise values needed in practice may depend on implementation details. In the most concrete examples, these values might be determined by the relative speeds of the operations for a given (fixed) implementation on my hard drive.
Rather than making the judgement about what these values should be straight away, we could interpret each basic building block as a (formal) variable and interpret each construction operation as a (formal) function, so as to view the construction itself as an algebraic expression for the complexity bound it determines for the structure it defines. A complexity measure then amounts to an assignment of elements of our valuation structure to each variable and an interpretation of the formal functions as operations on the elements of the valuation structure. The complexity of a structure is the minimum (in the ordering on the valuation structure) of the resulting value over its possible constructions.
A reader with experience in logic might prefer to think of it like this: a construction of a structure, such as a presentation, is a purely syntactic expression in a particular language which comes equipped with predetermined semantics in the domain of interest, such as the class of algebraic structures. A complexity measure is a second choice of semantics for the same language, this time in the valuation structure; I might as well call the second semantics an evaluation. The complexity of a structure is the minimum evaluation of constructions of equivalent structures. (Beware that this syntax-semantics presentation might cause some confusion when we get onto implicit complexity theory later, since the constructions there are proofs but the structure generated by a proof is just its conclusion, which is also a syntactic object!)
This abstraction reveals the higher-order choices involved in defining complexity measures, especially the ones determining how we combine the costs associated to the building blocks when they are put together with an operation.[16]
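Here is a sketch of that idea using sympy; the operation counts attached to the two constructions are illustrative round numbers of mine rather than exact figures, and the break-even computation at the end anticipates the projective-parameter discussion below:

```python
# Leave the costs of the basic operations as formal variables, attach a cost
# expression to each construction, and only evaluate (or compare) them later.

import sympy as sp

mul_cost, add_cost = sp.symbols("mul_cost add_cost", positive=True)

# illustrative cost expressions for multiplying two 1000-digit numbers
long_multiplication = 1_000_000 * mul_cost + 1_000_000 * add_cost
karatsuba_like      =    57_000 * mul_cost + 3_000_000 * add_cost

# one concrete evaluation: additions ten times cheaper than multiplications
assignment = {mul_cost: 1, add_cost: sp.Rational(1, 10)}
print(long_multiplication.subs(assignment), karatsuba_like.subs(assignment))

# the cost ratio at which the two bounds coincide (the break-even point)
ratio = sp.solve(sp.Eq(long_multiplication, karatsuba_like), mul_cost)[0] / add_cost
print(sp.simplify(ratio))
```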
It is not clear to me whether this formulation of complexity is exhaustive. A priori there may be other contributing factors to complexity that a given evaluation fails to capture, but any examples in this vein that I can come up with seem to be accounted for by including extra operations in the constructions. In my original example of multiplying large numbers by composing multiplications and additions of individual digits, for example, we could also include the cost of shifting digits around in these algorithms, to account for how the time taken to multiply larger numbers in practice will grow faster than the number of single-digit operations involved (for practical implementations).
Combinatorial complexity measures and projective parameter spaces
I’ll call a complexity measure combinatorial if the basic building blocks are assigned constant (positive real) values and all of the combination operations are additive on the complexity (or possibly add constants; I’ll leave that possibility implicit here). Our measures which counted the number of basic digit addition and multiplication operations, the string length and the presentation complexity are all combinatorial. This is arguably the simplest kind of complexity measure, if such a notion makes sense,[16] and is amenable to some specialised analysis techniques, which I’ll sketch here.
Observe that if we scale all of the values assigned to the basic building blocks in a combinatorial complexity measure by some positive real number, the complexity value for all constructions are scaled by the same number. Since complexity theory is mostly concerned with comparative evaluations of which presentations are more complex, it may therefore be useful to consider the bounds on complexity determined by given constructions in terms of projective geometry.
Consider the basic operations of addition and multiplication of digits we considered earlier. We can define a projective space[17] of parameter values for these operations, whose elements are the possible ratios of costs of performing them; the dimension of this space will be 1 (one less than the number of basic building blocks for the combinatorial complexity measure). Given a list of algorithms for multiplication of numbers of a given size in terms of these basic operations, we get a mapping from this projective space to another projective space, whose dimension is one less than the number of algorithms. The mapping sends each ratio of parameter values to the corresponding ratios of complexity bounds determined by the algorithms. This mapping characterizes, amongst other things, the break-even points for which of the different algorithms under consideration give the best upper bound on the complexity, as a function of the cost ratio of the basic operations.
The same basic argument can be extended to any combinatorial complexity measure. I don’t have the experience with projective geometry to take this analysis further than the above curious observation at the moment (I have not come across any literature on this direction so far). I might as well mention that I expect combinatorial complexity measures to be entirely adequate for my purposes in goal-directedness, so this section could get developed further by me in future.
The broad takeaway of this section is that one more way of avoiding arbitrary choices, this time in the defining values of a complexity measure, is to consider all possible choices together at once, or at least to consider classes of choices. A specific choice can either be made later (if a specific implementation will determine the values assigned) or coarser-grained conclusions can be drawn which are a function of the possible choices. In the latter case, the conclusions might be judgements about upper bounds on the complexity, as a function of the complexity parameters.
Computational Complexity
I have discussed the example of Kolmogorov complexity at some length. Let’s see some domains of computational complexity theory, for balance (although there are surely plenty I haven’t read about).
Classical computational complexity
Computational complexity theory asks, “how much of a given resource (time, space/memory, etc) is required to compute a value/solution?” Just like Kolmogorov complexity, the answer to such questions depends on the details of the model of computation being considered. Even though “computation” is in the name, the mathematically interesting details of computational complexity are considered to be those which are independent of specific implementation. For this reason, it is typical to consider an implementation-independent version, which just as above is most easily achieved by instead considering sequences of problems with a sensible indexing parameter. A large number of different models of computation can simulate one another with an encoding of inputs of at worst polynomial (even quadratic!) size. The complexity classes we end up with are measures of asymptotic growth rates, in the sense I described earlier, but these classes are coarser than those above, being defined up to a polynomial change of indexing of the sequences.
This coarse-graining also resolves another issue, related to the implementation of structures as inputs to models of computation. For example, if the input is a simple, undirected graph, should the input size measure the number of vertices or the number of edges? The latter grows at most quadratically in the former (a simple graph on n vertices has at most (n²−n)/2 edges). There are other possible choices too, such as the sum of the number of vertices and edges; all in all, selecting the indexing amounts to choosing a complexity measure for graphs. Since the complexity classes are indifferent to the exact complexity measure employed for the structures, on the condition that these measures are related to one another by polynomial bounds, these choices can be safely ignored, as is the case for my graph example.
The question then becomes, “how quickly does the quantity of a given resource required to compute a value/solution grow with the size of the input?”
Typical examples of complexity classes are:
Deterministic time complexity classes, such as P (polynomial time), DLOGTIME (deterministic logarithmic time with random access[18]) and EXPTIME (exponential time). These measure how quickly decision problems in a sequence increase in running time, or number of operations performed, with the size of their inputs.
Non-deterministic time complexity classes, such as NP (non-deterministic polynomial time) and coNP, which have several equivalent definitions. One of these is that they measure the growth rate of time taken to check a proposed solution to a problem is valid (resp. invalid).
Deterministic space complexity classes, such as L (logarithmic space), PSPACE and EXPSPACE. These measure the growth rate of the amount of memory/working space required for a decision problem in terms of the size of the inputs.
Nondeterministic space complexity classes like NL, NPSPACE and NEXPSPACE. Savitch’s theorem shows that the latter two of these are equal to their deterministic counterparts.
Interpolating classes such as SC (Steve’s Class) which place simultaneous demands on time and space complexity of an algorithm.
All of these are defined for (variations on) Turing machines, but many of them have closure properties which allow them to be characterized in other ways.
Instead of decision problems, we could consider computable functions, and determine the complexity classes for such functions by the same criteria as above. The analogue of P for functions is sometimes called FP. The class FP is closed under composition, and was characterized by Cobham as the smallest class of functions with certain closure properties containing the functions “append 0”, s₀(w) = w0; “append 1”, s₁(w) = w1; and “log-exponential”, f(x,y) := x^(l(y)) (where l(y) is the length of y in bits). In other words, the resulting class of functions has an implementation-independent characterization! Obtaining and manipulating intrinsic characterizations such as this is the subject of implicit complexity theory.
Many containments are known between the various classes. A great number of the open problems in complexity theory ask whether such containments are strict. Most famously, it’s easy to show that any problem which can be decided in polynomial time can be checked in polynomial time, so P⊆NP, but it’s not known whether some problems checkable in polynomial time are not decidable in polynomial time; this is the famous P vs NP problem.
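To make the decide-versus-check distinction concrete, here is a standard toy sketch using the SUBSET-SUM problem; the particular numbers and certificate are arbitrary choices of mine.

```python
# A toy illustration of deciding versus checking, using SUBSET-SUM.
# Brute-force decision inspects exponentially many subsets; verifying a
# proposed certificate (a set of indices) takes polynomial time.

from itertools import combinations

def decide_subset_sum(nums, target):
    # Exponential-time decision: try every subset.
    return any(sum(c) == target
               for k in range(len(nums) + 1)
               for c in combinations(nums, k))

def check_certificate(nums, target, indices):
    # Polynomial-time verification of a proposed solution.
    return len(set(indices)) == len(indices) and \
           sum(nums[i] for i in indices) == target

nums = [3, 9, 8, 4, 5, 7]
print(decide_subset_sum(nums, 15))             # True
print(check_certificate(nums, 15, [0, 2, 3]))  # True: 3 + 8 + 4 == 15
```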
Why should the answer be difficult to determine? An oracle is a hypothetical component, added to a Turing machine, which responds to certain queries (even uncomputable ones) in a single step; in the language of the present post, this amounts to including those queries or functions as basic operations. Baker, Gill and Solovay proved in the 1970s that there exist oracles relative to which P = NP, and others relative to which P ≠ NP. Computational complexity is very sensitive indeed to which operations are allowed and their respective costs. Several proof methods in computational complexity theory up to that point relativised, meaning that adding an oracle made no difference to their arguments. As such, the Baker–Gill–Solovay result demonstrates that such methods cannot settle the P vs NP question; this is the relativization barrier.
Circuit complexity
A more specialized branch of computational complexity deals with circuits, a more rigid notion of algorithm closer to the algebraic contexts I presented above, where the operations are logic gates. Complexity is measured in terms of various measures of the size and shape of the circuits, including the number of gates, width and depth, and the classes are defined in terms of these and a number of other factors, such as the number of inputs that each gate is allowed to accept and the order in which logical operations are allowed to appear. I don’t know much about these yet, so I’ll just mention the AC (alternating class) hierarchy and NC (Nick’s class) as interesting examples.
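As a concrete illustration of size and depth, here is a minimal sketch with a made-up two-gate circuit computing the parity of three inputs; the dictionary representation of gates is just a convenient ad hoc encoding, not any standard format.

```python
# A minimal sketch of the size and depth measures: a circuit is a DAG of gates;
# "size" counts the gates and "depth" is the longest path from an input to the
# output. The two-gate parity circuit below is made up purely for illustration.

GATES = {                             # gate name: (operation, operand names)
    "g1": ("xor", ["x1", "x2"]),
    "g2": ("xor", ["g1", "x3"]),      # output gate
}
OPS = {"xor": lambda a, b: a ^ b, "and": lambda a, b: a & b, "or": lambda a, b: a | b}

def evaluate(inputs, gate="g2"):
    def value(name):
        if name in inputs:
            return inputs[name]
        op, args = GATES[name]
        return OPS[op](*(value(a) for a in args))
    return value(gate)

def depth(name):
    if name not in GATES:             # inputs sit at depth 0
        return 0
    return 1 + max(depth(a) for a in GATES[name][1])

print(evaluate({"x1": 1, "x2": 0, "x3": 1}))        # parity of the inputs: 0
print("size:", len(GATES), "depth:", depth("g2"))   # size 2, depth 2
```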
Descriptive complexity
By definition, a decision problem is the problem of determining whether a certain proposition (logical formula) is true for a structure. For a given class of finite structures, encoded in such a way as to allow their being input into a Turing machine, say, we can identify a proposition interpretable on these structures with its extension, which is to say the collection of structures which validate the proposition.
Since propositions are formal statements with lots of structure, we have various notions of complexity for them, including length. The type of complexity that is considered in descriptive complexity theory is that of the smallest fragment of logic (the smallest family of logical operators or constructors) within which the proposition can be expressed. This is interesting in its own right in the context of this post, since it’s a collection of complexity classes which is not a coarse-graining of a numerical complexity measure in any obvious sense. Some examples of fragments of logic include:
First order logic (FOL) includes the usual negation (NOT), (finitary) conjunction (AND) and disjunction (OR) operations applied to the basic relations on the structures, as well as existential (∃) and universal (∀) quantification applied to variables ranging over the elements of the structures.
FOL with transitive closure augments the above with the ability to take the transitive closure of a relation.
Second-order logic (SOL) allows for quantification over the family of relations defined on the structures.
Conversely, each fragment of logic defines a class of decision problems. Surprisingly, many of these classes coincide with complexity classes listed above, which is another example of classes having characterizations independent of their relation to Turing machines or circuits.
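To make the notion of a fragment a little more tangible, here is a hedged sketch on a tiny hand-picked directed graph: a first-order sentence is evaluated by brute-force quantification over the vertices, while reachability additionally calls for the transitive closure of the edge relation, placing it in the second fragment listed above.

```python
# A toy illustration of fragments of logic on a small directed graph (V, E).
# The first-order sentence "some vertex has no outgoing edge" only quantifies
# over vertices; reachability additionally requires the transitive closure of E,
# placing it in FOL with transitive closure rather than plain FOL.

V = {0, 1, 2, 3}
E = {(0, 1), (1, 2), (2, 0)}

# FOL: exists x . forall y . not E(x, y)
has_sink = any(all((x, y) not in E for y in V) for x in V)

# FOL + TC: iterate composition of the relation with itself until it stabilises.
def transitive_closure(edges):
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

reachable = (0, 2) in transitive_closure(E)
print(has_sink, reachable)   # True (vertex 3), True (0 -> 1 -> 2)
```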
I could observe that the complexity values in descriptive complexity theory aren’t closed under intersections in any obvious way. Two propositions in different fragments of logic are equivalent if they have the same extension, which means that a language can lie in multiple classes; but a priori the existence of equivalent propositions in two incomparable fragments doesn’t imply the existence of an equivalent proposition in a smaller mutual fragment. The complexity of a family of structures is thus only formally well-defined if we can prove that this is the case, or if we allow “intersection classes” expressing the fact that a formula lies in multiple incomparable fragments of logic.
Descriptive complexity theorists are not concerned with this problem, to my knowledge, because the class of fragments of logic is entirely open: we can always extend our logics with new connectives and constructions, so there is little hope of being exhaustive about which precise complexity class a class of structures lies in unless we impose some (potentially arbitrary) extra constraints.
Complexity measures for functions
I can now go about constructing some of the complexity measures I’ve been promising. The starting point is analysis, yet another branch of maths…!
Approximation complexity
Suppose I want to assign a measure of complexity to a function. If the function is computable, we fall into the realm of computational complexity. But a function may be formally uncomputable in several different ways.
One obstacle is that the function may take values which are not accessible in our chosen model of computation. A machine which can only output natural numbers will struggle to express negative numbers; a machine which can express integers may not explicitly be able to output rational numbers, and a machine which can represent rational numbers may be unable to represent irrational algebraic numbers or arbitrary real numbers. Another obstruction that could arise is that the calculation must check an infinite number of cases to arrive at a conclusion, and so on some inputs never produces an answer.
Although calculating an exact value is hopeless in such situations, in the first case we can at least gauge the complexity of approximations to these functions.
For the cases where the machine outputs integers, we can compute the complexity of the floor or ceiling of a given function, as a function of the input. If the inputs of the functions are also natural numbers, this reduces to ordinary computational complexity, but more generally we may have to choose a sequence of inputs from which to obtain a complexity class as the asymptotic worst-case upper bound on the complexity of the approximation.
The rational-to-real case is more interesting. Here, we can examine the resources required to achieve an approximation of a certain quality, which is to say produce an output within ϵ of the true value. In this situation we can fix the input value and consider the resources required to achieve ever better approximations of the function value; a natural choice of sequence here to measure the complexity along is to take ϵ=1/n with n=1,2,3,… The asymptotic time complexity in this situation is sometimes called convergence rate or iteration complexity. As we saw earlier with Kolmogorov complexity, we must fix a machine against which to measure the asymptotic complexity, since otherwise we could choose a machine at each stage which just outputs a pre-computed sufficiently good approximation; for reasonable complexity measures, the result should depend only on the function, rather than the choice of machine.
There is another implicit parameter choice being made here. I am using the absolute difference to judge the quality of the approximations, but other metrics are possible: for functions to the rationals, we could measure the p-adic distance to the true value for some p. This choice of metric will be problem-dependent.
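To illustrate the iteration-complexity idea, here is a minimal sketch in which the fixed procedure is bisection and the target value is √2 (both arbitrary choices of mine); the number of basic steps needed to get within ϵ = 1/n of the true value grows like log₂(n).

```python
# A minimal sketch of iteration complexity: counting the basic steps a fixed
# procedure (bisection for sqrt(2), starting from the interval [1, 2]) needs
# to land within eps = 1/n of the true value. The step count grows like log2(n).

import math

def bisection_steps(eps):
    lo, hi, steps = 1.0, 2.0, 0
    while hi - lo > eps:           # the interval width bounds the error
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
        steps += 1
    return steps

for n in [10, 100, 1000, 10_000]:
    print(n, bisection_steps(1 / n), round(math.log2(n), 1))
```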
Simple functions
Rather than approximating individual values of the function (and possibly having to make a choice about which machine to use in the calculation and which input values to assess the complexity at), we can ask how difficult a given function is to approximate by a class of simple functions. This has a specific meaning in the theory of Lebesgue integration, namely a function obtained as a finite sum of (real multiples of) indicator functions for measurable subsets. That coincidence is entirely intentional, since the first example I want you to picture is that of approximating a function from (a bounded interval in) the reals to the reals by piecewise constant functions. In this situation, I can ask how many constant pieces I need to use to obtain an approximation to the function within ϵ=1/n of its true value, and hence measure the complexity of the function as the growth rate of that number as n increases. You could call this integration complexity, since the complexity determines how quickly the function’s Lebesgue integral converges.
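Here is a rough sketch of this measure, under the assumptions that we approximate in the sup norm on a finite sampling grid and count pieces greedily (which only gives an upper bound on the true minimum); for the Lipschitz function x², the count grows roughly linearly in n.

```python
# A rough sketch of "integration complexity": greedily count how many constant
# pieces are needed to approximate f on [0, 1] to within eps = 1/n in the sup
# norm, using a finite sampling grid. The greedy count only upper-bounds the
# true minimum, but for a Lipschitz function it already shows the linear growth.

def pieces_needed(f, eps, samples=100_000):
    xs = [i / samples for i in range(samples + 1)]
    pieces, lo, hi = 1, f(xs[0]), f(xs[0])
    for x in xs[1:]:
        lo, hi = min(lo, f(x)), max(hi, f(x))
        if hi - lo > 2 * eps:      # one constant value can no longer cover this
            pieces += 1            # stretch to within eps; start a new piece
            lo = hi = f(x)
    return pieces

for n in [10, 20, 40, 80]:
    print(n, pieces_needed(lambda x: x * x, 1 / n))   # roughly n/2 pieces
```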
Even for this notion of simple function, there are choices to make. Since measurable subsets are generated from open (or closed) intervals, we could insist that the simple functions be constructed from indicator functions of intervals, or we could allow arbitrary measurable subsets. Another consideration is whether we require that the function is approximated at all input values or merely “almost everywhere”, the latter being a notion more compatible with the tools of measure theory; for continuous functions there is no difference.
As another consideration, we might wonder if the integration complexity of a function is invariant under affine transformations (that is, under translations and rescaling), following the geometric intuition that congruent shapes in the plane should behave in similar ways. The complexity is already invariant at the level of individual approximations under horizontal translation and scaling, since we can translate and rescale an optimal sum of simple functions to obtain an optimal sum with the same number of terms for a function so transformed. For vertical translation, we at worst need to add a single simple function (a multiple of the indicator function for the domain of definition, assuming it is sufficiently nice), and since growth rates are insensitive to constant changes, this makes no difference. Vertical scaling is more subtle: a simple function s with |s(x)−f(x)|<1/n rescales to a worse approximation when the scale factor is larger than 1: |ks(x)−kf(x)|=|k||s(x)−f(x)|<|k|/n. However, growth rates are coarse enough to absorb this kind of discrepancy as a contribution to the constant bounding factor, with the result that integration complexity is invariant under affine transformations! The cost of this invariance is that all simple functions have a complexity of O(1), although this too justifies the choice of name, since simple functions are those whose Lebesgue integral can be computed exactly with a finite sum.
Power series, Fourier series
Depending on the nature of the problem, it may be more interesting to use other classes of functions to approximate the function and thereby measure its complexity. Power series, including Taylor series, are a very well-studied example of this. Given a function expressed as a power series, I can measure its complexity in terms of the growth rate of the coefficients, where now the types of functions which I encounter as growth rates will typically be sub-constant. Alternatively, I could transform these coefficients into a measure of the rate of convergence, defining the complexity of a series ∑_{j=0}^∞ a_j x^j, defined over some domain D around 0, to be the asymptotic growth rate of the function n ↦ min{N ∣ ∑_{j=N}^∞ a_j x^j < 1/n on D}.
As ever, there are alternative choices available here. I could measure the growth rate of the absolute convergence (replacing the summands with their absolute values in the last definition), for example. A natural feature of these complexity measures is that they are invariant under changing a finite number of coefficients, or equivalently under adding a polynomial to the power series. We also have invariance under affine transformations, comparably to the last subsection, as long as we also translate and scale the domain appropriately.
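As a small illustration, using the absolute variant just mentioned: for the exponential series on D = [−1, 1] (an arbitrary choice of function and domain), the sketch below finds the truncation point N(n) beyond which the tail of absolute coefficient values drops below 1/n.

```python
# A small sketch of the convergence-rate measure above, for the exponential
# series exp(x) = sum_j x^j / j! on D = [-1, 1]: find the smallest truncation
# point N(n) whose tail is below 1/n, bounding the tail on D by the sum of the
# absolute values of the remaining coefficients (valid since |x| <= 1).

import math

def truncation_point(n, terms=60):
    coeffs = [1 / math.factorial(j) for j in range(terms)]
    tail = sum(coeffs)                 # sum of |a_j| for j >= 0 (approx. e)
    for N, c in enumerate(coeffs):
        if tail < 1 / n:               # tail currently holds sum of |a_j| for j >= N
            return N
        tail -= c
    return terms

for n in [10, 1_000, 10**6, 10**9]:
    print(n, truncation_point(n))      # N(n) grows very slowly with n
```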
Of course, power series are not the only way to decompose a function. We can also look at Fourier series, decomposing a function into sine and/or cosine functions, and construct similar definitions of complexity from the coefficients of such an expansion. In this case, the complexity being measured will be geometric in flavour: a function which is low in “Fourier complexity” is well-described by its low-frequency components, whereas higher complexity means that the high-frequency terms play an increasingly important role in the function’s behaviour. Any finite sum of (discrete frequency) sine waves will have constant complexity. There is also a relationship between the convergence rate of the Fourier decomposition of a function over an interval and the differentiability of the function.
It is interesting to contrast the notions of complexity arising from the two decompositions above. They are not directly comparable: a straight line has minimal power-series complexity but non-trivial Fourier complexity, and dually a simple sine wave has interesting power-series complexity but trivial Fourier complexity. This illustrates once again how complexity measures depend on our choices of basic building blocks or operations. There are, of course, yet other decompositions of functions that one might consider!
Functions on spaces
I have talked extensively about functions on the reals. Much of the above can be extended directly to functions on the complex numbers. Only a little extra work is needed to define complexity measures for (real- or complex-valued) functions on higher dimensional spaces, more specifically on domains in R^n, since we can define higher-dimensional simple functions, polynomials and harmonic waves easily, typically as products of the lower-dimensional ones. Some decisions need to be made, of course: for simple functions, are only rectangles allowed, or are indicator functions for more general measurable subsets acceptable? I won’t examine the consequences of such choices here.
With these choices established, we can now construct complexity measures for functions defined on more exotic spaces. For example, if we are given a manifold M which has a finite decomposition as a gluing of subspaces of R^n, a function M→R decomposes into functions defined on those subspaces, whose complexity we know how to measure! In keeping with the methods discussed earlier in the post, we can therefore define the complexity of such a function as the minimum, over decompositions of M, of the sum of the complexities of the restricted functions. We can do something similar for CW-complexes, where the rigid simplicity of the components involved (disks) may afford a finer-grained notion of complexity.
It’s surely possible to go further and define the complexity of functions between manifolds in terms of decompositions, but the higher you go in the definitions, the more work you have to do to ensure that your lower-level measures of complexity are invariant enough to combine together nicely into a well-defined (or interesting) higher-level measure of complexity.
Conclusions
My hope is that this post opens the door to complexity for others. Here on LessWrong especially, people often make reference to some established notions of complexity including Kolmogorov complexity, but it sometimes seems to me that a more tailored measure of complexity would better reflect the intuition they are trying to formalize.
A more ambitious hope of mine is that people will take some of these ideas and run with them. The most exciting possibility as I see it is a formal exploration of the relationships between complexity measures in different settings and their relationships with other invariants. Such work might form the foundation for new approaches to attack the hard open problems in computational complexity theory.
Coming soon...
For the purposes of my goal-directedness project, the next step is to use the ideas I’ve explored here to build a measure of complexity for goal-directedness. This will require me to finally make some choices which I’ve been putting off for the whole project, but it will also get me pretty close to some concrete answers.
I would also like to examine relative complexity, a topic which will be directly relevant to measuring the complexity of explanations, and which I think will be clarified by the view I’ve taken on complexity in this post.
Further References
The primary inspiration for this post was J. M. Landsberg’s book Geometry and Complexity Theory, which was gifted to me by my postdoc advisor Thomas Seiller (thanks Thomas!). The book is about algebraic complexity theory, a domain which applies algebraic geometry to computing (or at least bounding) the complexity of algebraic problems. While it doesn’t discuss complexity theory at the level of generality that I have attempted to here, it was enough of a departure from the computational complexity notions that I had encountered previously to give me a new perspective on what complexity can (or should) mean.
For Kolmogorov complexity, I found Marie Ferbus-Zanda and Serge Grigorieff’s paper Kolmogorov complexity in perspective instructive.
For computational complexity, I used some fragments of Sanjeev Arora and Boaz Barak’s book Computational Complexity: A Modern Approach, although I could benefit from a deeper reading of that.
Chatain’s LW post on Occam’s Razor and the Universal Prior was useful in understanding how K-complexity arises in Bayesian formalisms.
The word “structure” here is intentionally quite generic. The structure can be a mathematical object (a space, an algebra, a function...), or a computational object (an algorithm, a solution to a problem...).
See the right-most column in the Multiplication entry of the table; long multiplication takes O(n²) operations, meaning that the number of operations is approximately proportional to the square of the number of digits in the numbers being multiplied (assuming here that the numbers are of the same order of magnitude), but we can do much better: the first improvement was discovered by Karatsuba and reduces the number of basic multiplications needed to multiply 1000-digit numbers by a factor of 17.
A similar story is the surprising existence of more efficient algorithms for matrix multiplication, which is one focus of the domain of algebraic complexity theory.
For consistency with the Turing machine example that follows, this should probably be “write 0 and shift right”.
A Turing machine is a pretty famous abstract model of computation consisting of some tapes, which are unbounded collections of discrete cells which are either empty or contain a symbol (usually 0 or 1), a device called the head that can read from and write to the tapes, and a program instructing the head how to interpret the symbols it encounters. If this isn’t familiar to you, almost any programming language will do for the purposes of understanding Kolmogorov complexity.[19] I can be informal about this, and about the obviously underspecified parameters in the description of the Turing machine, such as the number of tapes and their shape (does it go off to infinity in both directions or just one?), the language/symbols which the machine can read and write, and so on, thanks to the result I quote later on.
Amusingly, finding the most efficient algorithm to produce a given output in various programming languages is a pastime with a significant community around it.
A field is a ring where we can also divide by elements other than 0. I didn’t include it in the list of algebraic structures, because the “other than zero” part turns out to prevent the existence of free fields!
The more famous version of this example would be the inductive definition of the natural numbers in Peano arithmetic: any natural number can be expressed by applying the successor (“add 1”) function to 0 the corresponding number of times. However, the variety of algebras with just a single unary operation isn’t, to my knowledge, as widely studied.
Formally speaking, (aa)a and a(aa) could be considered distinct elements were it not for one of the axioms of semigroups which I didn’t mention, namely associativity. I mention this here to highlight how many of the formal details are missing from this account. I’m trying to paint you an intuitive picture, but if you’re curious about what’s under the hood, I would be more than happy to discuss it in the comments!
Note that the generation procedure here is very sensitive to the type of structure we are considering: a ring generated from the presentation ⟨a,b∣aa=a,bb=b,ab=ba⟩ would be infinite, since we have imposed no restrictions on sums of elements.
I don’t know whether this notion is typically studied under the banner of complexity theory. As I mentioned in the introduction, I’m deliberately seeking an inclusive definition of complexity here.
Of course, it can also be extended to algebras which only admit infinite presentations, but infinite cardinals are a lot coarser than finite ones, and one has to start worrying about set-theoretic foundations, whereas the finite notions are a lot easier to treat constructively.
I implicitly assume in this argument that I’m working with Turing machines taking inputs in binary, which is what’s needed to make the “up-to-constant” claim hold; if the Turing machines accept different alphabets, then I need to encode each into the other for the input to make sense, and this can modify the length of programs by a scale factor. For the conclusions regarding asymptotic growth rates that I mention, this subtlety is unimportant.
I’ve used Big Oh and Little Oh notation here. This is a compact way to express asymptotic bounds on functions; I also use Big Theta notation later on.
The point of this paragraph is to disillusion the reader about the existence of any strong existing connection between “complexity” and “complex numbers”. Just as elsewhere in maths and science, it’s entirely possible that a clever application of complex numbers[17] could solve some problems in complexity theory, but I maintain that complex numbers do not constitute good candidates for complexity values.
Up until this point, it has been the case that the complexity value directly measures a most efficient construction, presentation or decomposition of an object. In allowing more general structures, there arises the possibility of situations in which there is no minimizer: we could have a sequence of increasingly efficient constructions with no minimum, or a collection of minimal constructions which are incomparable in the ordering. This is not intrinsically problematic, but may require some care in general arguments about complexity.
Amusingly, the syntax-semantics presentation of complexity measures is sufficiently precise for us to begin asking questions about the complexity of complexity measures… I’ll leave to some other brave soul the pleasure of untangling that ouroboros.
Although I have dismissed complex-valued complexity measures, when studying maps between projective spaces, complex values can help achieve a more complete global picture. This train of thought leads down a rabbit hole which I shall not explore here, but there is precedent for taking this idea seriously: recall that complex values were useful to electrical engineers in uniting resistance and capacitance, for example.
For the logarithmic time class to contain any interesting problems, we need to impose the assumption that moving along the input (such as moving any distance along the input tape of a Turing machine before reading) takes constant time, since otherwise the time taken to move is already linear in the input length. For example, assuming the input n > 0 is provided in binary with a leading 1, the function n ↦ ⌊log₂(log₂(n)+1)⌋ is a member of DLOGTIME, since we can determine its value by initialising the output at 0, checking for the presence of a digit at position 2^k of the input for increasing k ≥ 1, adding one each time a digit is found and halting otherwise.
Arora and Barak list Turing machines, lambda calculus, cellular automata, pointer machines, bouncing billiard balls and Conway’s Game of Life as abstract models of computation which can simulate one another and hence are Turing complete.