[SEQ RERUN] Optimization
Today’s post, Optimization, was originally published on 13 September 2008. A summary (taken from the LW wiki):
A discussion of the concept of optimization.
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we’ll be going through Eliezer Yudkowsky’s old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Psychic Powers, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day’s sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.
This concept of optimization seems to me difficult or impossible to formalize in a useful manner. I see it as a sometimes-useful thought exercise at best, to be replaced by something more quantitative as soon as possible.
How does this measure of optimization power compare the abilities of the engineers who designed the Corolla with those of the oil refinery that produced its fuel? I’m not asking about the engineers who designed the refinery, just the finished refinery itself.
In general, I think the measure is missing something about the complexity of what is being optimized. Eliezer speaks of powerful optimizers as scary things, but I think he is drawing unwarranted conclusions from an unvoiced assumption that powerful optimizers are intelligent problem solvers. I see nothing scary about a refrigeration system capable of meaningfully cooling Antarctica, despite it being a powerful entropy-reducer.
More generally: I think that a strong understanding of what constitutes optimization is necessary but not sufficient for a discussion of the global impacts of AI. Furthermore, I think that this article constitutes a non-technical explanation of what Eliezer is calling an optimization process. If the goal is simply to replace the term “intelligence” with something that non-experts will be less likely to quibble over, then I think the article has partially succeeded. If the goal is to make productive use of the concept among people who weren’t getting distracted by definitions of intelligence, then I think a technical explanation is in order.
An oil refinery is slightly intelligent as an optimization process. Presumably it has sensors that can detect the heaviness of the crude input, its sulfur content, and perhaps other impurities, and it processes the crude accordingly so that valuable petroleum fractions are maximized. But I don’t know that much about refineries; it may be the case that human operators madly turn valves and pull levers while watching spinning dials to make our gasoline, and the refinery is just a big dumb tool.
The question to ask about the refinery as an optimization process is to what degree its initial conditions can be changed and it still produces optimized output. The more powerful it is as an optimizer, the more degrees of freedom on the input it can handle. A perfect refinery could accept waste from a landfill and always produce highly valuable goods.
If you’re asking about future possible states, and literally counting them as Eliezer suggests, even a refinery that has very little input flexibility is a powerful optimizer, provided it puts out well-refined products. It’s a more powerful optimizer than most people, by that metric. How do I know? It has influence over a lot more total mass. The total entropy reduction it performs is vast.
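The state-counting idea can be made concrete with a toy sketch (my own rendering, assuming a uniform measure over reachable outcomes; the bit-counting framing is Eliezer’s, the function name is mine):

```python
import math

def optimization_power_bits(total_states, states_at_least_as_good):
    """Bits of optimization: how surprising it is, under a uniform draw
    over all reachable outcomes, to land in one this good or better."""
    return -math.log2(states_at_least_as_good / total_states)

# A process that steers the world into the single best outcome out of
# 1,024 equally weighted possibilities exerts 10 bits of optimization power.
print(optimization_power_bits(1024, 1))  # → 10.0
```

By this raw count, anything that touches a huge amount of mass racks up enormous bit totals, which is exactly how a refinery outscores a person.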
Why does it matter how humans are involved? That’s one of the positive aspects of Eliezer’s definition: it doesn’t care about whether there are humans pulling levers, humans and a lot of automation and sensors, or a fully automated lights-out computerized operation.
The difference in entropy reduction produced doesn’t care about what the refinery could handle, only what it does handle. Switching between different kinds of crude of similar quality won’t change the optimization power rating, even if lots of adjustments get made when that happens. Worse crude, with more weird stuff in it, probably has a marginally higher starting entropy, so using that as feedstock will produce a higher rating. Being able to use it won’t, though.
Using that definition, very few things that humans do count as optimization, and virtually nothing our brains do counts. For instance, my brain doesn’t till a field and plant seeds and harvest grain; my muscles twitch to make my bones and skin exert forces on a plow and seeds and grains, and the sun and plants do the vast majority of the work. My brain just happens to be the thing that sets the initial process in motion and tweaks it along the way.
Of course, since this was just a preliminary article, maybe that’s fine, and a study of “intelligence” would focus on the parts of the optimization process that, if removed, would stop it from being an optimization process. I still think, however, that you can’t just take the most favorable starting state and evaluate the results after some period of time, e.g. an unmanned oil refinery with a pipe connected to an infinite reserve of uniform crude oil on one side, a wire with an infinite reserve of electrical potential on another side, and an output connected to an infinitely empty tank for refined petroleum. You have to look at the likely starting states and calculate a probability distribution over the possible final states.

In that sense an oil refinery might optimize over a period of a day or so, given current conditions. But current conditions are far from likely; there are trucks (or rail cars, or pipeline contents) on their way to the refinery for tomorrow, and more transport scheduled to empty its product tanks. The refinery, as a single object, is a very limited optimizing process. If placed on any random flat stable area on Earth’s surface it would be nearly useless. Infrastructure matters.
There’s an anthropic or self-referential kind of argument for most things having some optimizing value, however. The likelihood of finding a refinery randomly placed on the Earth or in the Universe is incredibly low, probably lower than the probability of it having zero utility at a random place in the Universe. The more complex an optimizing process is, the more likely it is to be found where it can be most optimal. Something to think about is the conditional probability P(this-process-is-optimizing | this-process-exists). I suspect it’s very close to 1 for very optimal processes. It’s basically the probability of evolution happening, I guess.
Your brain is a causal component of the optimization process; therefore it seems fair to give it credit. If I take away your plow, it seems reasonable to conclude that your optimization would be less effective, but not ineffective. If I take away your brain, it seems reasonable to conclude that the plow would lose all optimization power. It seems reasonable to conclude from this that your brain has more optimization power, even within that limited context, than the plow does. Sorting out the optimization power of your brain versus the plants is difficult for the same reasons that sorting out causality is difficult.
Your point that the definition of “count states” is awkward is exactly the point I’ve been trying to make. Counting states is precisely entropy, which directly implies that refrigerators and oil refineries are powerful optimizers. This conclusion seems problematic, in that it does not align well with all the connotations of “optimization process” that Eliezer is talking about. That’s why I’m saying we need a technical explanation of optimization power, not a loose qualitative explanation.
It seems to me that optimization power should somehow be measured against the complexity of the problem domain. How, I don’t know. I’m just trying to point out that the original post is farther from a complete treatment on the topic than I thought the first time I read it, or than most of the comments seem to give it credit for.
I’m probably better at concrete examples. Consider a list of N comparable items. An optimizing process that orders them from least to greatest (sorts them) while preserving relative order has optimization measure 1/N!. A less optimal process by this measure doesn’t maintain relative order: at worst it has optimization measure N!/N! = 1 (completely reordering a list of N identical items), and at best (N-M)!/N!, where M is the number of unique values among the items.
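The measure above can be sketched in code (my own assumption: I count the arrangements of equal-valued items via their multiplicities, which reproduces the 1/N! and N!/N! endpoints described here):

```python
from collections import Counter
from math import factorial

def stable_sort_measure(n):
    # An order-preserving (stable) sort pins its output down to exactly
    # one of the n! possible arrangements: measure 1/n!.
    return 1 / factorial(n)

def unstable_sort_measure(values):
    # An order-ignoring sort leaves equal-valued items interchangeable;
    # any of the prod(k_i!) arrangements of equal items is an equally
    # acceptable output, where k_i are the multiplicities of each value.
    arrangements = 1
    for k in Counter(values).values():
        arrangements *= factorial(k)
    return arrangements / factorial(len(values))
```

On a list of N identical items the unstable measure is N!/N! = 1 (no optimization at all), and on all-distinct values it matches the stable sort’s 1/N!.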
Sorting while maintaining order is optimal under the measure. Sorting while ignoring order is variable under the measure, depending entirely on the input. I think this is where we were getting confused about how to talk about an oil refinery. Clearly an oil refinery is not optimal, so it must be variable. But then what is its measure? Is it the expected value over all possible initial states? Even that may not be unique; consider a list of N items that can take a value from the set {1} versus a list of N items that can take a value from the set {0,1}. Clearly the former is completely unoptimized by the order-ignoring sort, and the latter has expected optimization value (N/2)!/N!. Taking the values of the N items from the natural numbers, the process would achieve 1/N! expected optimization. So taking into consideration the complexity of the problem domain won’t help; {1} is simpler than {0,1}, which is simpler than the natural numbers, but the optimization is inversely related to this progression.
I think this means that optimization is very context dependent. It means that if I had to send a minimum-length message about the sorting algorithm, I would be able to send a shorter message predicting the un-optimal output of sorting a list of N items of value “1” than a message predicting the optimal output of sorting a list of N natural numbers. If I had to gauge the power of the sorting mind I would get different answers depending on the input, despite the fact that both sorting algorithms are O(n log n). If I had to gauge the efficiency of sorting any arbitrary output I could clearly say that the order-preserving sort was optimal, but if we simply change the problem domain so that we can no longer recognize the relative order of equal values in the list, then both algorithms are less optimal, since the maximum measure of optimization drops to 1/(N-M)!. So it appears that the same physical processes (or algorithms) can be more or less optimal depending on the preferences themselves. This means it may be impossible to infer the actual preferences from the evidence.
I liked this construction and now I want to see how well I can describe the power and wants of my college friends in order to make accurate predictions about future events and optimization. Because if I can’t, I should work a little harder at staying in touch.
Bonus problem: do you know your own preference ordering in the sense above?
“This process prefers to exactly follow the laws of physics, therefore future events and observations will turn out exactly as a natural physical system would evolve” seems to be a minimal-message-length description for predicting the behavior of any process, unless that process necessarily restricts its final output to a very tiny domain that’s shorter to describe than the initial state. For any description of the current measurable state of a process and a predicted future description of state likelihoods, it seems that it will always be simpler to describe the current state and then predict that it will exactly follow the natural laws.
Maybe it works if the likelihood of a given prediction is compared with its length? It’s easy to be trivially correct, but not so easy to make complex predictions that are also right. Take the probability of any message of that length being correct and compare that with P(E|M), the probability that the event E described by message M occurs. If P(E|M) is higher than the probability of a random M-length message being correct it means M is a good description, but I am not convinced it’s a good description solely because of the properties of E. It still seems like M=”laws of physics” will be better than other descriptions of optimizing processes. I am probably missing something.
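The comparison suggested here can be sketched as a toy test (my own formalization: I treat a random L-bit message as singling out one of 2^L equally likely possibilities, so its base rate of being correct is 2^-L):

```python
def is_good_description(p_event_given_message, length_bits):
    # M is a good description of E if E occurs more often than a random
    # message of the same length would turn out to be correct.
    base_rate = 2.0 ** -length_bits
    return p_event_given_message > base_rate
```

Under this sketch the worry above survives: “laws of physics” is a short M with P(E|M) near 1, so it clears the bar easily.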
It is frequently difficult to describe the current state, including the entire optimizer, in complete detail. Often, the behavior of systems including optimizers will be chaotic, in that arbitrarily small changes (especially to the optimizer itself) will result in large changes in the output. In such cases, a less-precise description of the process is more useful, and an optimization-based description may be able to constrain the output state much more accurately than “laws of physics” applied over a broad range of input states and optimizer states.
It takes a very, very long message to describe the state of every quark in the human brain.
There are still short predictions, like “matter and energy and momentum will be conserved, and entropy will not decrease,” that are (vacuously) true and accurate. It may take a much longer message to describe the salient properties of the final state of an optimizing process, and it will be less precise than the former prediction. For simple optimizers, like a pump, the message could just describe the overall changes in gas volume, pressure, and temperature. For an optimizer that tiles the universe with highly complex information it may be incredibly difficult to compose a message describing that, especially if the information is encoded very efficiently. That’s why I was thinking about comparing the correctness of the message to its length; a minimal-length terabit message that is only 90% accurate (or even 10%) is probably way more informative than a kilobit message that’s 100% correct.
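That tradeoff can be sketched with a toy score (an assumption of mine, not a formula from the thread: a message that turns out right delivers its full length in bits of constraint on the final state, and a wrong one delivers nothing):

```python
def expected_constraint_bits(length_bits, p_correct):
    # Expected bits of constraint on the final state: an all-or-nothing
    # payoff weighted by the chance the message is right.
    return p_correct * length_bits

terabit_90_percent = expected_constraint_bits(1e12, 0.9)  # 9e11 expected bits
kilobit_certain = expected_constraint_bits(1e3, 1.0)      # 1e3 expected bits
assert terabit_90_percent > kilobit_certain
```

Even at 10% accuracy, the terabit message wins by this score, which matches the intuition above.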