My biggest objection to this definition is that it inherently requires time
Fascinating—but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?
One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization.
Yes this is a fascinating case! I’d like to write a whole post about it. Here are my thoughts:
First, just as a fun fact, not that it’s actually extremely rare to see any non-iterative optimization in practical usage. When we solve linear equations, we could use gaussian elimination but it’s so unstable that in practice we use, most likely, the SVD, which is iterative. When we solve a system of polynomial equation we could use something like a Grobner basis or the resultant, but it’s so unstable that in practice we something like a companion matrix method, which comes down to an eigenvalue decomposition, which is again iterative.
Consider finding the roots of a simple quadratic equation (ie solving a cubic optimization problem). We can use the quadratic equation to do this. But ultimately this comes down to computing a square root, which is typically (though not necessarily) solved with an iterative method.
That these methods (for solving linear systems, polynomial systems, and quadratic equations) have at their heart an iterative optimization algorithm is not accidental. The iterative methods involved are not some small or sideline part of what’s going on. In fact when you solve a system of polynomial equations using a companion matrix, you spend a lot of energy rearranging the system into a form where it can be solved via an eigenvalue decomposition, and then the eigenvalue decomposition itself is very much operating on the full problem. It’s not some unimportant side operation. I find this fascinating.
Nevertheless it is possible to solve linear systems, polynomial systems etc with non-iterative methods.
These methods are definitely considered “optimization” by any normal use of that term. So in this way my definition doesn’t quite line up with the common language use of the word “optimization”.
But these non-iterative methods actually do not have the core property that I described in the square-root-of-two example. If I reach in and flip a bit while a Guassian elimination is running, the algorithm does not in any sense recover. Since the algorithm is just performing a linear sequence of steps, the error just grows and grows as the computation unfolds. This is the opposite of what happens if I reach in and flip a bit while an SVD is being computed: in this case the error will be driven back to zero by the iterative optimization algorithm.
You might say that my focus on error-correction simply doesn’t capture the common language use of the term optimization, as demonstrated by the fact that non-iterative optimization algorithms do not have this error-correcting property. You would be correct!
But perhaps my real response is that fundamentally I’m interested in these processes that somewhat mysteriously drive the state of the world towards a target configuration, and keep doing so despite perturbations. I think these are central to what AI and agency are. The term “optimizing system” might not be quite right, but it seems close enough to be compelling.
Thanks for the question—I clarified my own thinking while writing up this response.
Another big thing to note in examples like e.g. iteratively computing a square root for the quadratic formula or iteratively computing eigenvalues to solve a matrix: the optimization problems we’re solving are subproblems, not the original full problem. These crucially differ from most of the examples in the OP in that the system’s objective function (in your sense) does not match the objective function (in the usual intuitive sense). They’re iteratively optimizing a subproblem’s objective, not the “full” problem’s objective.
That’s potentially an issue for thinking about e.g. AI as an optimizer: if it’s using iterative optimization on subproblems, but using those results to perform some higher-level optimization in a non-iterative manner, then aligning the sobproblem-optimizers may not be synonymous with aligning the full AI. Indeed, I think a lot of reasoning works very much like this: we decompose a high-dimensional problem into coupled low-dimensional subproblems (i.e. “gears”), then apply iterative optimizers to the subproblems. That’s exactly how eigenvalue algorithms work, for instance: we decompose the full problem into a series of optimization subproblems in narrower and narrower subspaces, while the “high-level” part of the algorithm (i.e. outside the subproblems) doesn’t look like iterative optimization.
Fascinating—but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?
No, the issue is that the usual definition of an optimization problem (e.g. maxxf(x)) has no built-in notion of time, and the intuitive notion of optimization (e.g. “the system makes Y big”) has no built-in notion of time (or at least linear time). It’s this really fundamental thing that isn’t present in the “original problem”, so to speak; it would be very surprising and interesting if time had to be involved when it’s not present from the start.
If I specifically try to brainstorm things-which-look-like-optimization-but-don’t-involve-objective-improvement-over-time, then it’s not hard to come up with examples:
Rather than a function-value “improving” along linear time, I could think about a function value improving along some tree or DAG—e.g. in a heap data structure, we have a tree where the “function value” always “improves” as we move from any leaf toward the root. There, any path from a leaf to the root could be considered “time” (but the whole set of nodes at the “same level” can’t be considered a time-slice, because we don’t have a meaningful way to compare whole sets of values; we could invent one, but it wouldn’t actually reflect the tree structure).
The example from the earlier comment: a one-shot non-iterative optimizer
A distributed optimizer: the system fans out, tests a whole bunch of possible choices in parallel, then selects the best of those.
Various flavors of constraint propagation, e.g. the simplex algorithm (and markets more generally)
I think this is a very important distinction. I prefer to use “maximizer” for “timelessly” finding the highest value of an objective function, and reserve “optimizer” for the kind of stepwise improvement discussed in this post. As I use the terms, to maximize something is to find the state with the highest value, but to optimize it is to take an initial state and find a new state with a higher value. I recognize that “optimize” and “optimizer” are sometimes used the way you’re saying, as basically synonymous with “maximize” / “maximizer”, and I could retreat to calling the inherently temporal thing I’m talking about an “improver” (or an “improvement process” if I don’t want to reify it), but this actually seems less likely to be quickly understood, and I don’t think it’s all that useful for “optimize” and “maximize” to mean exactly the same thing.
(There is a subset of optimizers as I (and this post, although I think the value should be graded rather than binary) use the term that in the limit reach the maximum, and a subset of those that even reach the maximum in a finite number of steps, but optimizers that e.g. get stuck in local maxima aren’t IMO thereby not actually optimizers, even though they aren’t maximizers in any useful sense.)
Fascinating—but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?
Yes this is a fascinating case! I’d like to write a whole post about it. Here are my thoughts:
First, just as a fun fact, not that it’s actually extremely rare to see any non-iterative optimization in practical usage. When we solve linear equations, we could use gaussian elimination but it’s so unstable that in practice we use, most likely, the SVD, which is iterative. When we solve a system of polynomial equation we could use something like a Grobner basis or the resultant, but it’s so unstable that in practice we something like a companion matrix method, which comes down to an eigenvalue decomposition, which is again iterative.
Consider finding the roots of a simple quadratic equation (ie solving a cubic optimization problem). We can use the quadratic equation to do this. But ultimately this comes down to computing a square root, which is typically (though not necessarily) solved with an iterative method.
That these methods (for solving linear systems, polynomial systems, and quadratic equations) have at their heart an iterative optimization algorithm is not accidental. The iterative methods involved are not some small or sideline part of what’s going on. In fact when you solve a system of polynomial equations using a companion matrix, you spend a lot of energy rearranging the system into a form where it can be solved via an eigenvalue decomposition, and then the eigenvalue decomposition itself is very much operating on the full problem. It’s not some unimportant side operation. I find this fascinating.
Nevertheless it is possible to solve linear systems, polynomial systems etc with non-iterative methods.
These methods are definitely considered “optimization” by any normal use of that term. So in this way my definition doesn’t quite line up with the common language use of the word “optimization”.
But these non-iterative methods actually do not have the core property that I described in the square-root-of-two example. If I reach in and flip a bit while a Guassian elimination is running, the algorithm does not in any sense recover. Since the algorithm is just performing a linear sequence of steps, the error just grows and grows as the computation unfolds. This is the opposite of what happens if I reach in and flip a bit while an SVD is being computed: in this case the error will be driven back to zero by the iterative optimization algorithm.
You might say that my focus on error-correction simply doesn’t capture the common language use of the term optimization, as demonstrated by the fact that non-iterative optimization algorithms do not have this error-correcting property. You would be correct!
But perhaps my real response is that fundamentally I’m interested in these processes that somewhat mysteriously drive the state of the world towards a target configuration, and keep doing so despite perturbations. I think these are central to what AI and agency are. The term “optimizing system” might not be quite right, but it seems close enough to be compelling.
Thanks for the question—I clarified my own thinking while writing up this response.
Another big thing to note in examples like e.g. iteratively computing a square root for the quadratic formula or iteratively computing eigenvalues to solve a matrix: the optimization problems we’re solving are subproblems, not the original full problem. These crucially differ from most of the examples in the OP in that the system’s objective function (in your sense) does not match the objective function (in the usual intuitive sense). They’re iteratively optimizing a subproblem’s objective, not the “full” problem’s objective.
That’s potentially an issue for thinking about e.g. AI as an optimizer: if it’s using iterative optimization on subproblems, but using those results to perform some higher-level optimization in a non-iterative manner, then aligning the sobproblem-optimizers may not be synonymous with aligning the full AI. Indeed, I think a lot of reasoning works very much like this: we decompose a high-dimensional problem into coupled low-dimensional subproblems (i.e. “gears”), then apply iterative optimizers to the subproblems. That’s exactly how eigenvalue algorithms work, for instance: we decompose the full problem into a series of optimization subproblems in narrower and narrower subspaces, while the “high-level” part of the algorithm (i.e. outside the subproblems) doesn’t look like iterative optimization.
No, the issue is that the usual definition of an optimization problem (e.g. maxx f(x)) has no built-in notion of time, and the intuitive notion of optimization (e.g. “the system makes Y big”) has no built-in notion of time (or at least linear time). It’s this really fundamental thing that isn’t present in the “original problem”, so to speak; it would be very surprising and interesting if time had to be involved when it’s not present from the start.
If I specifically try to brainstorm things-which-look-like-optimization-but-don’t-involve-objective-improvement-over-time, then it’s not hard to come up with examples:
Rather than a function-value “improving” along linear time, I could think about a function value improving along some tree or DAG—e.g. in a heap data structure, we have a tree where the “function value” always “improves” as we move from any leaf toward the root. There, any path from a leaf to the root could be considered “time” (but the whole set of nodes at the “same level” can’t be considered a time-slice, because we don’t have a meaningful way to compare whole sets of values; we could invent one, but it wouldn’t actually reflect the tree structure).
The example from the earlier comment: a one-shot non-iterative optimizer
A distributed optimizer: the system fans out, tests a whole bunch of possible choices in parallel, then selects the best of those.
Various flavors of constraint propagation, e.g. the simplex algorithm (and markets more generally)
I think this is a very important distinction. I prefer to use “maximizer” for “timelessly” finding the highest value of an objective function, and reserve “optimizer” for the kind of stepwise improvement discussed in this post. As I use the terms, to maximize something is to find the state with the highest value, but to optimize it is to take an initial state and find a new state with a higher value. I recognize that “optimize” and “optimizer” are sometimes used the way you’re saying, as basically synonymous with “maximize” / “maximizer”, and I could retreat to calling the inherently temporal thing I’m talking about an “improver” (or an “improvement process” if I don’t want to reify it), but this actually seems less likely to be quickly understood, and I don’t think it’s all that useful for “optimize” and “maximize” to mean exactly the same thing.
(There is a subset of optimizers as I (and this post, although I think the value should be graded rather than binary) use the term that in the limit reach the maximum, and a subset of those that even reach the maximum in a finite number of steps, but optimizers that e.g. get stuck in local maxima aren’t IMO thereby not actually optimizers, even though they aren’t maximizers in any useful sense.)