Exorcizing the Speed Prior?
What follows is an argument against finding optimization daemons with too much probability mass in the speed prior. The argument is very fake, but perhaps it could inspire real math down the line. The intuition isn’t so different from the one in are minimal circuits daemon free? -- mostly, the speed prior seems like a harder thing to argue is daemon-free.
The speed prior is very similar to a K-complexity prior, except where a K-complexity prior assigns probability proportional to to a program, the speed prior adds an extra penalty for execution time: . Note that the penalty for time is logarithmic compared to the penalty for length; thinking twice as long is penalized the same as being one bit longer.
Optimization daemons arise when we are searching a rich space to try and solve a problem which we don’t otherwise know how to solve. This makes it seem like the worst-case scenario for daemons would be if intelligence is really fundamentally about search—then “try to be intelligent and daemon-free” is a non-starter.
(There are more nuanced possibilities, like “intelligence is about search, but not necessarily search of “rich spaces” which contain daemons. We’ll set those aside for this argument.)
So, let’s assume that we’re in this worst-case scenario. I’m going to make a contrarian argument that this is actually a good thing.
To be more precise: the concern with optimization daemons is that by optimizing hard over a rich search space to achieve a difficult task, you might accidentally pick a intelligent agent out of the search space, who has goals which are somewhat different from yours. I’m going to assume that the only way an agent can be an intelligent is either by performing an expensive brute-force search over possibilities, or having some baked-in knowledge which allows a smarter search, saving it time exponential in how much baked-in knowledge it has.
The assumption makes a little bit of sense, because linearly many bits can locate something in an exponentially large search space. If I knew more complexity theory, maybe I could say why the assumption is really bad. But, let’s proceed with the assumption.
The argument is: an optimization daemon which solves the problem via intelligent search can’t be higher probability mass than the daemon-free solution. Brute-force search gets you a time penalty which is exactly the same as the description length of the thing you would find out by searching, so there’s no reason to search rather than just have the answer. On the other hand, if the daemon is doing some kind of smart search, my (possibly crazy) complexity-theoretic assumption says that the description complexity of the smart search trades off exactly with what it saves you in search time.
The idea is that we ourselves are doing intelligent search at the “top level” to solve the problem, but, a solution exists which doesn’t itself use intelligent search—namely, the thing an intelligent agent would be finding. An intelligent agent has probability mass at most equal to the non-intelligent solution.
(“Optimization daemons are at most equally likely” may be a rather sad conclusion, but what are you going to do?)
Now, obviously, many interesting poly-time algorithms exist which can optimize in interesting cases and which may look somewhat agentic. The (very fake!) assumption I’m using here is that none of these can be intelligent in a concerning sense. In this view, humans can behave very intelligently in large part because we don’t think via brute-force search—we use “smart search” which has been given a large number of bits of compressed advice in the genetic code. Essentially, we get the advantage of the processing power of evolutionary history.
Another reason to doubt my assumptions is that, intuitively, it seems like there are a lot of problems which actually require intelligence to solve. If you are trying to train a massive neural network in a very complex setting, it seems plausible that the setting can be complex enough to make you learn a general consequentialist reasoner (rather than a big compressed policy which does well).
Perhaps you doubt the premises of my argument but buy the connection between premises and conclusion. If something similar to my argument does make any sense, this means that “intelligence only comes from search” isn’t the worst scenario for daemons. The worst-case scenario is the case where there are types of intelligence which don’t just look like search, but we don’t know what all of them are; this means that the optimization daemons can be smarter than the search algorithm in a significant sense, creating pressure for our search to find those intelligent agents in order to solve problems more efficiently.
Consider a system of linear equations over F2. Brute force search takes time exponential in the dimension. Gaussian elimination takes time polynomial in the dimension, and its description length is O(1). So your hypothesis clearly doesn’t work here. Now, it sounds more plausible if you assume your search problem is NP-hard. The question is whether this is a good model of intelligence. If this is a good model then it means that any intelligent agent will have enormous description complexity, and there is no better way of constructing one than doing brute force search. This probably implies a very long AGI timeline. However, I think it’s more likely that this is a bad model.
This is exactly what seems most likely to me (see this post).
I think the assumption that all intelligence is search is implausible. For one, current AI systems use methods other than brute force search (e.g. Bayesian inference, gradient descent, MCTS, logic, Q-learning). For another, it seems that there are useful generic cognitive algorithms (e.g. multi-level models, mathematical reasoning, various Bayesian models, various frequentist methods such as confidence intervals) that generalize pretty well across different domains, and are not entirely made of search.
Current AI does stochastic search, but it is still search. Essentially PP complexity class, instead of NP/P. (with a fair amount of domain specific heuristics)
Not all AI is search, and when it is it’s usually not brute force search.
I understand your search based intelligence as a simple brute force search over all the possibilities, except that the agent already knows a bitstring from the start of the solution, so only has to consider possibilities starting with those bits. This algorithm, once given its utility function, can’t make gains by trading off time and space.
The hypothesis that humans have a large enough amount of advice built into their genetic code is almost certainly wrong. The human genome contains about 4 billion bits. Most of these bits are used to describe low level biology, not brain function. Its also unclear how evolution got access to advice bits fro spacecraft design. But even ignoring those points, imagine a 20GB pile of data produced by humans. While not perfectly optimized, the data will almost certainly be more than 50% optimized. (If you set every other letter at random, there is literally no way to fill in the remaining letters to make a good book.) If you doubt this, use a bigger pile of data, and a lower threshold. Take a 1Tb pile of text, if you set 99% of a books characters at random, there is no way to fill in the remaining 1% to make sense. This shows that humans can output more than 10Gb of optimization pressure. Being generous and counting each neuron firing, the runtime of humanity is around 10^36. This would put a hard limit of 4 billion +log_2(10^36) bits of optimization pressure that humanity could exert.
The hypothesis that humans smart search is based on our genetic code containing advice in this sense is clearly false.
You could think of the ‘advice’ given by evolution being in the form of a short program, e.g. for a neural-net-like learning algorithm. In this case, a relatively short string of advice could result in a lot of apparent optimization.
(For the book example: imagine a species that outputs books of 20Gb containing only the letter ‘a’. This is very unlikely to be produced by random choice, yet it can be specified with only a few bits of ‘advice’)