I should stress that this criterion doesn’t actually matter for AI x-risk, because you can always reframe the risks in terms of Y, and not mention X at all. However, that might cost you more ink.
ME, a visionary: GPT-4 is misaligned because it’s simulating deceptive agents.
YOU, a fool: GPT-4 isn’t simulating any agents, it’s just predicting which tokens continue a prompt.
ME, a correct-opinion-haver: Fine, whatever… GPT-4 is misaligned because it predicts the tokens continuing a prompt by applying a function parameterised in a high-dimensional space to minimise cross-entropy loss across the internet corpus and the internet corpus contains a lot of conversations where one character deceives another and therefore GPT-4 will respond in the same way that a deceptive character would.
The X-Y Criterion
Informal statement
Okay, here’s the X-Y Criterion:
If two tasks reduce to one another, then it is meaningless to ask if a machine is ‘really doing’ one task versus the other.
Don’t worry, later in the article we’ll formalise what “task”, “reduce”, and “doing” mean.
First draft — computational reduction
Our first draft will be “computational reduction”.
A task X is about processing classical information, i.e. X: {0,1}* → {0,1}*.
An algorithm A achieves a particular task X if it processes classical information in that way.
In order to achieve a task X, the algorithm A expends certain quantities of computational resources, e.g. time, memory, samples, bandwidth, etc. These resources are abstract and non-physical.
A task X reduces to task Y if and only if...
For every algorithm A that solves task Y, there exists another algorithm B such that...
(1) B solves task X by interacting with A.
(2) The combined algorithm (A⊗B) doesn’t expend much more computational resources to solve X than A expends to solve Y.
X-Y Criterion: If two tasks X and Y reduce to one another, then it is meaningless to ask if an algorithm A is ‘really doing’ one task versus the other.
This is what computer scientists mean when they say that one problem “reduces” to another, e.g. when they say that every problem in NP reduces to 3SAT.
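As a toy illustration of mutual reduction (my own example, not part of the definition above), take X to be "multiply two integers" and Y to be "square one integer". The names `square`, `multiply`, `CountingOracle`, `multiply_via_square` and `square_via_multiply` are hypothetical, and counting oracle calls is only a crude stand-in for "computational resources".

```python
from typing import Callable


class CountingOracle:
    """Wraps an algorithm A so we can count how often B consults it."""

    def __init__(self, fn: Callable[..., int]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, *args: int) -> int:
        self.calls += 1
        return self.fn(*args)


def square(n: int) -> int:
    """An algorithm that solves task Y (squaring)."""
    return n * n


def multiply(a: int, b: int) -> int:
    """An algorithm that solves task X (multiplication)."""
    return a * b


def multiply_via_square(sq: Callable[[int], int], a: int, b: int) -> int:
    """B: solves X using only an oracle for Y, via ab = ((a+b)^2 - a^2 - b^2) / 2."""
    return (sq(a + b) - sq(a) - sq(b)) // 2


def square_via_multiply(mul: Callable[[int, int], int], n: int) -> int:
    """B': solves Y using only an oracle for X."""
    return mul(n, n)


sq_oracle = CountingOracle(square)
product = multiply_via_square(sq_oracle, 6, 7)
print(product, sq_oracle.calls)  # 42 3
```

Each direction costs only a constant number of oracle calls plus a little arithmetic, so under this toy accounting the two tasks reduce to one another, and asking whether the machine is "really" squaring or "really" multiplying stops being a useful question.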
Second draft — physical reduction
The second-draft formalisation will be “physical reduction”.
A task X is about changing the state of the world, i.e. X: Ω → Ω.
A machine A achieves a particular task X if it changes the state of the world in that way.
In order to achieve a task X, the machine A expends certain quantities of physical resources, e.g. time, energy, hardware, money, negentropy, bandwidth, etc. These resources are physical and non-abstract.
A task X reduces to task Y if and only if...
For every machine A that solves task Y, there exists another machine B such that...
(1) B solves task X by interacting with A.
(2) The combined machine (A⊗B) doesn’t expend much more physical resources to solve X than A expends to solve Y.
X-Y Criterion: If two tasks X and Y reduce to one another, then it is meaningless to ask if a machine A is ‘really doing’ one task versus the other.
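Condition (2) can be sketched as a simple budget check. Everything here is an assumption made for illustration: the `Resources` fields, the `within_overhead` helper, the multiplicative `slack` tolerance (the definition only says "not much more"), and the numbers, which are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical physical budget: wall-clock seconds and joules of energy."""

    seconds: float
    joules: float


def within_overhead(combined: Resources, base: Resources, slack: float) -> bool:
    """True if A⊗B solving X costs at most `slack` times what A costs solving Y,
    in every resource dimension. `slack` is an assumed tolerance."""
    return (
        combined.seconds <= slack * base.seconds
        and combined.joules <= slack * base.joules
    )


# Made-up numbers, purely for illustration.
cost_a_solving_y = Resources(seconds=2.0, joules=10.0)
cost_ab_solving_x = Resources(seconds=2.2, joules=10.5)

print(within_overhead(cost_ab_solving_x, cost_a_solving_y, slack=1.5))
```

How much slack counts as "not much more" is a judgment call; the sketch only shows that the comparison is made per resource, between the combined machine and the original one.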
I prefer this second formalisation, in terms of physical processes and physical resources.
Firstly, it’s more relevant — I don’t care about expending fewer computational resources, I care about expending fewer physical resources.
Secondly, it’s more general — we can talk about tasks which aren’t a manipulation of classical information.
Defending the X-Y Criterion
Intuitions from computer science
People develop this intuition after studying theoretical computer science for a long time.
In computer science, when we say that a particular algorithm “does task X”, what we mean is that someone could achieve X by using that algorithm.
For example, when we say “this calculator adds numbers”, what we mean is that someone who wanted to add numbers could use the calculator to do that. They could also use the calculator to do a bunch of other things, like knocking someone over the head.
Without the X-Y Criterion, “computation” doesn’t even mean anything.
Like, seriously? What do you mean when you say Google Maps “finds the shortest route from your house to the pub”? Your phone is just displaying certain pixels, it doesn’t output an actual physical road! So what do you mean? What you mean is that, by using Google Maps as an oracle with very little overhead, you can find the shortest route from your house to the pub.
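Here is a minimal sketch of that "oracle with very little overhead" picture. It is not how Google Maps works; the toy graph, the place names, and the functions `shortest_path_ids` and `route` are all hypothetical. The "machine" only ever emits a list of integers, and a thin wrapper turns that list into something a person would call a route.

```python
import heapq

# Hypothetical toy map: node ids are arbitrary integers, weights are minutes.
GRAPH = {
    0: {1: 4, 2: 1},
    1: {3: 1},
    2: {1: 2, 3: 5},
    3: {},
}
NAMES = {0: "house", 1: "park", 2: "shop", 3: "pub"}


def shortest_path_ids(graph, source, target):
    """The 'machine': Dijkstra's algorithm, returning a list of node ids."""
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in dist:
        return []  # no route exists
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def route(start_name, end_name):
    """The 'user': a thin wrapper that reads the machine's output as a route."""
    ids = {name: node for node, name in NAMES.items()}
    return [NAMES[n] for n in shortest_path_ids(GRAPH, ids[start_name], ids[end_name])]


print(route("house", "pub"))  # ['house', 'shop', 'park', 'pub']
```

The wrapper is a pair of dictionary lookups, which is the sense in which the integer-shuffling machine "finds the shortest route".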
All computation is emergent
Let’s look more closely at what’s going on.
Much more closely.
Your phone starts in a particular configuration of bosons and fermions, and after a few seconds, your phone is in a different configuration of bosons and fermions. Meanwhile, it has converted some electrical potential energy in the lithium battery into the kinetic energy of nearby air molecules.
If you looked at the fermions and bosons really closely, you might be able to see that there’s a magnetic strip which alternates between north-pointing (1) and south-pointing (0). Using a (north=1, south=0)-encoding, you can say that the phone is performing certain bit-arithmetic operations.
But that’s about as far as you can go by only appealing to the internal physical state of the machine. This is because the same bitstring can represent many different objects. The same bitstring (e.g. 0101000101010) might represent an integer, an ASCII character, a word, an event, a datetime, a neural network, or whatever. In fact, by changing the datatype, this bitstring can represent almost anything. The internal physical state of the phone doesn’t break the symmetry between these datatypes.
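To make the datatype point concrete, here is a small sketch that reads the example bitstring under a few arbitrarily chosen encodings. The function name `interpretations` and the particular encodings (big-endian padding to whole bytes, Unix time in seconds) are my assumptions; nothing in the bits themselves picks one of them out.

```python
import struct
from datetime import datetime, timezone

BITS = "0101000101010"  # the example bitstring from the text


def interpretations(bits: str) -> dict:
    """Read the same bits under several (arbitrarily chosen) datatypes."""
    value = int(bits, 2)
    raw = value.to_bytes((len(bits) + 7) // 8, "big")  # left-pad to whole bytes
    return {
        "unsigned_int": value,
        "ascii_text": raw.decode("ascii", errors="replace"),
        "little_endian_uint16": struct.unpack("<H", raw[-2:].rjust(2, b"\x00"))[0],
        "unix_time_utc": datetime.fromtimestamp(value, tz=timezone.utc).isoformat(),
    }


for name, reading in interpretations(BITS).items():
    print(f"{name}: {reading!r}")
```

The same thirteen bits come out as the integer 2602, a newline followed by an asterisk, the integer 10762, or a moment 43 minutes and 22 seconds after the Unix epoch, depending only on which decoder the observer brings along.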
Okay, so at the level of fundamental reality, it looks like there’s no computation whatsoever. Just fermions and bosons wiggling around. Maybe we can talk about bitstring manipulations as well, but that’s about it.
That means that once we admit the machine is doing matrix multiplication, we’re already in the realm of higher-level emergent behaviour. We’re already appealing to the X-Y Criterion. And it’s a slippery slope from matrix multiplication to everything else.
Look at the following examples of the X-Y pattern —
(1) The machine isn’t really multiplying matrices, it’s just changing voltages in transistors.
(2) The machine isn’t really predicting pixels, it’s just multiplying matrices.
(3) The machine isn’t really learning that dogs have fur, it’s just predicting pixels.
(4) The machine isn’t really inferring causation, it’s just learning that dogs have fur.
To me, the only example which is somewhat defensible is (1), because there is a meaningful sense in which the machine is changing transistor voltages but not multiplying matrices. Namely, in the sense of physical manipulation of the internal fermions and bosons.
But the rest of the examples are indefensible. There’s no meaningful sense in which the machine is doing Y but not doing X. Once we go beyond fermionic-bosonic manipulation, it’s all emergent capabilities.
Yep, that includes you.
Appendix
Secret third draft — quantum computational-physical reduction
These two formalisations (computational and physical) seem orthogonal, but once you’re sufficiently it-from-qubit-pilled, you’ll recognise that they are entirely identical.
I won’t go into the details here, but...
Amend the computational formalisation, by changing “classical information” into “quantum information”.
Amend the physical formalisation, by changing “state” to “quantum state”.
Then (by the Extended Quantum Church-Turing Thesis) these two formalisations are entirely identical. This trick works because “quantum physics = quantum computation” in a much stronger sense than “classical physics = classical computation”. But anyway, that’s outside the scope today.
Hansonian sacredness
In “We See The Sacred From Afar, To See It The Same”, Robin Hanson gives a pretty comprehensive list of 51 beliefs, attitudes, and behaviours that seem to correlate with things called “sacred”.
I notice another correlation — if something is sacred, then it’s more likely to be the X in the general pattern “the algorithm isn’t doing X, it’s just doing Y”.
For example, people have been far more incredulous that a machine can write music than that a machine can write JavaScript. This seems to be mostly motivated by the “sacredness” of music relative to JavaScript.
The algorithm isn’t doing X, it’s just doing Y.
Introduction
Mutual reduction implies equivalence
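Here’s my most load-bearing intuition: if two tasks reduce to one another, then it is meaningless to ask if a machine is ‘really doing’ one task versus the other.
Moreover: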
This intuition grounds my perspective on intelligence, AI, alignment, philosophy, etc.
This intuition is load-bearing for other people who share my views.
This intuition is a crux for much of the disagreement we have with other people.
In this article, I’ll formalise this intuition in two ways, computational and physical.
Motivation
People often say “the algorithm isn’t doing X, it’s just doing Y”.
X is normally some impressive high-level human-y thing, such as
writing poetry
causal reasoning
recognising emotions
interpreting art
writing music
making ethical decisions
planning actions
telling jokes
understanding concepts
simulating agents, etc.
Y is normally some unimpressive low-level computery thing, such as
predicting tokens
sampling from a distribution
querying a lookup table
multiplying matrices
sorting numbers
clustering data points
compressing text
searching a tree
manipulating bitstrings
polarising magnetic strips, etc.
Rather than address each example individually, I think it’ll be more efficient to construct a general criterion by which we can assess each example.
This criterion doesn’t actually matter
I should stress that this criterion doesn’t actually matter for AI x-risk, because you can always reframe the risks in terms of Y, and not mention X at all. However, that might cost you more ink.