Some technical comments on this post:

There’s a theorem that a large enough neural network can approximate any algorithm, but when a neural network learns, we have no control over which algorithms it will end up implementing, and don’t know how to read the algorithms off the numbers.
This theorem (the universal approximation theorem) buys us a lot less than most people think it does, because it requires the function’s domain to be bounded. More here:
https://lifeiscomputation.com/the-truth-about-the-not-so-universal-approximation-theorem/
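To make the bounded-domain caveat concrete, here is a minimal sketch (my own illustration using scikit-learn, not something from the post): a small MLP fits a target function well on the interval it was trained on, but the theorem gives no guarantee at all outside that interval.

```python
# Minimal illustration (not from the post): the approximation guarantee only covers the
# bounded training domain; outside it the network can do whatever its architecture likes.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(2000, 1))  # bounded domain [-3, 3]
y_train = np.sin(X_train).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(X_train, y_train)

for x in [0.5, 2.0, 10.0, 50.0]:  # the last two lie outside the training domain
    pred = net.predict([[x]])[0]
    print(f"x={x:5.1f}  true={np.sin(x):+.3f}  net={pred:+.3f}")
# Inside [-3, 3] the error should be small; at x=10 or x=50 the ReLU network typically
# extrapolates linearly, and the approximation guarantee no longer applies.
```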
On LLM Turing-completeness:

LLMs are Turing-complete and can implement any algorithms (researchers even came up with compilers of code into LLM parameters; though we don’t really know how to “decompile” an existing LLM to understand what algorithms the numbers represent).
I think LLMs can’t be Turing-complete, assuming they are based on the transformer architecture (which basically all LLMs are), at least if @Lucius Bushnaq is correct about the basic inability to simulate an unbounded tape with an unbounded context window. RNNs, by contrast, are Turing-complete, but there are a lot of contextual issues here.
Quote from Lucius Bushnaq:
Well, transformers are not actually Turing complete in real life where parameters aren’t real numbers, because if you want an unbounded context window to simulate unbounded tape, you eventually run out of space for positional encodings. But the amount of bits they can hold in memory does grow exponentially with the residual stream width, which seems good enough to me. Real computers don’t have infinite memory either.
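A minimal numerical sketch of the finite-precision point (my own illustration, not part of the quote): once positions exceed what the encoding’s number format can resolve, distinct tape cells get identical positional encodings.

```python
# Illustration only: position indices stored as float32 stop being distinguishable past 2**24,
# so any positional encoding computed from such positions assigns identical vectors to
# distinct tape cells from that point on.
import numpy as np

p = np.float32(2**24)           # 16,777,216
print(p == p + np.float32(1))   # True: positions 16777216 and 16777217 collapse to the same value
```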
And on the issues with claiming LLMs are Turing-complete in general:
https://lifeiscomputation.com/transformers-are-not-turing-complete/
This seems pretty irrelevant to the points in question. To the extent there’s a way to look at the world, think about it, and take actions that generally achieve your goals (e.g., via the algorithms that humans are running), a large enough neural network can, technically, do this in a single forward pass.
We won’t make a neural network that, in a single forward pass, can iterate through and check all possible proofs of length <10^100 of some conjecture, but that doesn’t mean we can’t make a generally capable AI system; nor do we make CPUs large enough for that task, yet this doesn’t affect whether computers are Turing-complete in any of the relevant ways.
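For a sense of scale (my own back-of-the-envelope illustration, not a figure from the thread): even over a two-symbol alphabet, the count of candidate proofs of length 10^100 has on the order of 3×10^99 decimal digits, while the count of atoms in the observable universe has only about 80.

```python
# Back-of-the-envelope scale check (illustration only): decimal digits in the number of
# candidate proofs, versus the ~10^80 atoms in the observable universe.
import math

proof_length = 10**100
alphabet_size = 2
digits_in_proof_count = proof_length * math.log10(alphabet_size)  # digits of 2**(10**100)

print(f"digits in number of candidate proofs: {digits_in_proof_count:.2e}")  # ~3.01e99
print("digits in number of atoms in the observable universe: ~80")
```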
“Any algorithm” being restricted to algorithms that, e.g., operate on some limited set of variables is a technicality; expanding on it wouldn’t affect the validity of the points made or implied in the post. So the simplification of saying “any algorithm” is not misleading to a reader who is not familiar with any of this stuff; and it’s in the section marked as worth skipping for people who are not new to LW, as it is not intended to communicate anything new to them.
Empirically, we see that fairly small neural networks get pretty good at the important stuff.
Like, “oh, but physical devices can’t run an arbitrarily long tape” is completely irrelevant to whether LLMs would be able to do anything that we can do, and to the question of whether AI will end up killing everyone. Humans are not Turing-complete in some narrow sense; this doesn’t prevent us from being generally intelligent.
To clarify a point here:

“Oh but physical devices can’t run an arbitrarily long tape”
This is not the actual issue.
The actual issue is that, even with unbounded resources, a transformer still couldn’t simulate an unbounded tape, because you can’t get enough space for the positional encodings.
Humans are not Turing-complete in some narrow sense;
Note that for the purpose of Turing-completeness, we only need to show that, given unbounded resources, the system could solve any computable problem without having to change the code. We haven’t actually proven that humans aren’t Turing-complete (indeed, my strong guess is that humans are).
If you give a physical computer a large enough tape, or make a human brain large enough without changing its density, it collapses into a black hole. This is really not relevant to any of the points made in the post.
For any set of inputs and outputs of an algorithm, we can make a neural network that approximates that input-output mapping with arbitrary precision in a single forward pass, without even having to simulate a tape.
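A minimal sketch of that claim (my own illustration using scikit-learn; the specific task and hyperparameters are arbitrary): memorize the full input-output table of a tiny algorithm, 3-bit parity, so that every answer comes from one forward pass rather than from simulating a tape.

```python
# Illustration only: memorize the complete input-output table of a small algorithm
# (3-bit parity) with one network, answering each query in a single forward pass.
import itertools
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array(list(itertools.product([0, 1], repeat=3)))  # all 8 possible inputs
y = X.sum(axis=1) % 2                                     # the algorithm's outputs: parity

net = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs", max_iter=2000, random_state=0)
net.fit(X, y)

print(net.predict(X))  # typically matches y exactly: the table has been memorized
print(y)
```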
I sincerely ask people to engage with the actual contents of the post related to the sharp left turn, goal crystallization, etc., and not with a technicality that doesn’t affect any of the points raised and that appears in a section they are not the intended audience of.