Instrumental convergence is what makes general intelligence possible

TL;DR: General intelligence is possible because solving real-world problems requires solving common subtasks. Common subtasks are what give us instrumental convergence. Common subtasks are also what make AI useful; you want AIs to pursue instrumentally convergent goals. Capabilities research proceeds by figuring out algorithms for instrumentally convergent cognition. Consequentialism and search are fairly general ways of solving common subtasks.

General intelligence is possible because solving real-world problems requires solving common subtasks

No-free-lunch theorems assert that any cognitive algorithm is equally successful when averaged over all possible tasks. This might sound strange, so here’s an intuition pump. Suppose you get a test like

  • 2+2 = _

  • 3*2 = _

and so on. One cognitive algorithm would be to evaluate the arithmetic expression and fill in the result as the answer. This algorithm seems so natural that it’s hard to imagine how the no-free-lunch theorems could apply here; what possible task could ever make arithmetic evaluation score poorly on questions like the above?

Easy: While an arithmetic evaluator would score well if you e.g. get 1 point for each expression you evaluate arithmetically, it would score very poorly if you e.g. lose 1 point for each expression you evaluate arithmetically.

This doesn’t matter much in the real world because you are much more likely to encounter situations where it’s useful to do arithmetic right than situations where it’s useful to do arithmetic wrong. No-free-lunch theorems point out that when you average over all tasks, useful tasks like “do arithmetic correctly” are perfectly cancelled out by perverse tasks like “do arithmetic wrong”; but in reality you don’t average over all conceivable tasks.
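To make the cancellation concrete, here is a toy sketch of the arithmetic test (the task names and scoring rules are mine, purely for illustration, not a formal statement of the theorem):

```python
# Toy illustration of the no-free-lunch cancellation on the arithmetic test.
questions = ["2+2", "3*2", "7-5"]

def arithmetic_evaluator(expr):
    """Cognitive algorithm: just evaluate the expression correctly."""
    return eval(expr)

def score(task, algorithm):
    """+1 per correct answer on one task, -1 per correct answer on its mirror."""
    points = 0
    for q in questions:
        correct = (algorithm(q) == eval(q))
        if task == "reward_correct_arithmetic":
            points += 1 if correct else -1
        elif task == "punish_correct_arithmetic":
            points += -1 if correct else 1
    return points

tasks = ["reward_correct_arithmetic", "punish_correct_arithmetic"]
scores = [score(t, arithmetic_evaluator) for t in tasks]
print(scores)            # [3, -3]: great on one task, terrible on its mirror image
print(sum(scores) / 2)   # 0.0: averaged over both tasks, the advantage cancels out
```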

If there were no correlations between subtasks, there would be no generally useful algorithms. And if every goal required a unique algorithm, general intelligence would not exist in any meaningful sense; the generally useful cognitions are what constitute general intelligence.

Common subtasks are what give us instrumental convergence

Instrumental convergence basically reduces to acquiring and maintaining power (when including resources under the definition of power). And this is an instance of common subtasks: lots of strategies require power, so a step in lots of strategies is to accumulate or preserve power. Therefore, just about any highly capable cognitive system is going to be good at getting power.

“Common subtasks” views instrumental convergence somewhat more generally than is usually emphasized. For instance, instrumental convergence is not just about goals, but also about cognitive algorithms. Convolutions and big matrix multiplications seem like common subtasks, so they can be considered instrumentally convergent in a more general sense. I don’t think this is a major shift from how it’s usually thought of; computation and intelligence are usually considered instrumentally convergent goals, so why not algorithms too?

Common subtasks are also what make AI useful; you want AIs to pursue instrumentally convergent goals

The logic is simple enough: if you have an algorithm that solves a one-off task, then it is at most going to be useful once. Meanwhile, if you have an algorithm that solves a common task, then that algorithm is commonly useful. An algorithm that can classify images is useful; an algorithm that can classify a single image is not.

This applies even to power-seeking. One instance of power-seeking would be earning money; indeed an AI that can autonomously earn money sounds a lot more useful than one that cannot. It even applies to “dark” power-seeking, like social manipulation. For instance, I bet the Chinese police state would really like an AI that can dissolve rebellious social networks.

The problem is not that we don’t know how to prevent power-seeking or instrumental convergence; we want power-seeking and instrumental convergence. The problem is that we don’t know how to align this power-seeking: how to direct the power towards what we want, rather than having side-effects that we don’t want.

Capabilities research proceeds by figuring out algorithms for instrumentally convergent cognition

Instrumentally convergent subgoals are actually fairly nontrivial. “Acquire resources” isn’t a primitive action, it needs a lot of supporting cognition. The core of intelligence isn’t “simple” per se; rather it is complex algorithms distilled from experience (or evolution) against common tasks. A form of innate wisdom, if you will.

In principle it might seem simple; we have basic theorems showing that ideal agency looks somewhat like $\arg\max_a \mathbb{E}[U \mid a]$, or something roughly like that. The trouble is that this includes an intractable maximum and an intractable expected value. Thus we need to break it down into tractable subproblems; these subproblems exploit lots of detail about the structure of reality, and so they are themselves highly detailed.
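As a minimal sketch of why the idealized formula is intractable (the world model, horizon, and numbers below are made up, and the code is not meant to be run to completion): even a brute-force approximation has to enumerate every action sequence and sample the world model many times per sequence.

```python
import itertools, random

# Made-up world model: the agent picks a sequence of 20 binary choices,
# and utility is some noisy function of that sequence.
def sample_utility(action_sequence):
    return sum(action_sequence) + random.gauss(0, 1)

def brute_force_ideal_agent(horizon=20, samples_per_action=1000):
    """Naive argmax_a E[U | a]: enumerate all action sequences, estimate each
    expectation by Monte Carlo. 2**20 sequences * 1000 samples each is already
    about a billion model calls for a trivially small problem."""
    best_action, best_value = None, float("-inf")
    for action_sequence in itertools.product([0, 1], repeat=horizon):  # intractable maximum
        estimate = sum(sample_utility(action_sequence)
                       for _ in range(samples_per_action)) / samples_per_action  # intractable expectation
        if estimate > best_value:
            best_action, best_value = action_sequence, estimate
    return best_action
```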

The goal of capabilities research is basically to come up with algorithms that do well on commonly recurring subproblems. 2D CNNs are commonly useful due to the way light interacts with the world. Self-supervised learning from giant scrapes of the internet is useful because the internet scrapes are highly correlated with the rest of reality. Imitation learning is useful due to Aumann’s Agreement Theorem and because instrumental convergence also applies to human intelligence. And so on.
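For instance, the reason 2D convolutions are so broadly useful is that visual data has roughly the same local structure at every position, so one small filter can be reused everywhere. A minimal NumPy sketch of that weight sharing (the image and filter below are arbitrary placeholders):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one small filter over every position of the image.
    The same few weights get reused everywhere; that weight sharing is the
    'common subtask' structure that makes CNNs broadly useful for vision."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(8, 8)
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])  # arbitrary example filter
print(conv2d(image, edge_filter).shape)  # (7, 7)
```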

Maybe we find a way to skip past all the heuristics and unleash a fully general learner that can independently figure out all the tricks, without needing human capabilities researchers to help further. This does not contradict the claim that common subtasks are what make general intelligence possible, since “figure out generally useful tricks” is itself a generally useful subtask to be able to solve. The key point, though, is that even if there is no efficient “simple core of intelligence”, the “common subtasks” perspective still gives a reason why capabilities research would arrive at instrumentally convergent general intelligence: by accumulating tons of little tricks.

Consequentialism and search are fairly general ways of solving common subtasks

Reality seems to have lots of little subproblems that you can observe, model, analyze, and search for solutions to. This is basically what consequentialism is about. It gives you a very general way of solving problems, as long as you have sufficiently accurate models. There are good reasons to expect consequentialism to be pervasive.

AI researchers are working on implementing general consequentialist algorithms, e.g. in the reinforcement learning framework. So far, the search methods their algorithms use are often of the naive “try lots of things and do what seems to work” form, but this is not the only form of search that exists. Efficient general-purpose search instead tends to involve reasoning about abstract constraints rather than enumerating particular policies. Because search and consequentialism are so commonly useful, we have lots of reason to expect them to show up in general intelligences.
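As a toy illustration of the naive form (the one-parameter “environment” here is made up purely for demonstration, not any particular RL algorithm):

```python
import random

def reward(policy_param):
    # Made-up one-dimensional "environment": reward peaks when the parameter is 3.
    return -(policy_param - 3.0) ** 2

def naive_search(num_tries=10_000):
    """Blind trial and error: sample parameters at random, keep whatever scored best.
    No reasoning about the structure of the problem, just lots of tries."""
    best_param, best_reward = None, float("-inf")
    for _ in range(num_tries):
        param = random.uniform(-10, 10)
        r = reward(param)
        if r > best_reward:
            best_param, best_reward = param, r
    return best_param

print(naive_search())  # lands near 3.0, purely by brute sampling
```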

Thanks to Justis Mills for proofreading and feedback.