A Turing machine is a finite automaton that has access to sufficient space for notes. A Turing machine with a very small finite automaton can simulate an arbitrary program if the program is already written down in the notes. A Turing machine with a large finite automaton can simulate a large program out of the box. ML models can obviously act like finite automata. So they are all Turing complete, if given access to enough space for making notes, possibly with initialization notes containing a large program.
This is not at all helpful, because normal training won’t produce interesting finite automata unless it learns from appropriate data, and such data is only straightforward to generate if the target finite automaton is already known. Also, even human short-term memory works more like an ML model than like deliberate examination of written notes, so an LLM-based agent would need to reason in an unusual and roundabout way unless it has a better architecture that continually learns from observations (and thus makes external notes unnecessary). Internal monologue is still necessary to reach complicated conclusions, but that could just be normal output wrapped in silencing tags.
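To make the “finite automaton plus notes” picture concrete, here is a minimal Python sketch. It assumes a transition table stored as a dict and a tape stored as a `defaultdict` of cells. The finite control is the fixed table `INCREMENT`, a hypothetical example that adds one to a binary number. It is not a universal machine, but it shows the split: the control is a small lookup table, and all unbounded memory lives on the tape.

```python
from collections import defaultdict

# Hypothetical finite control: (state, symbol) -> (new_state, write_symbol, move).
# Adds 1 to a binary number, with the head starting on its rightmost digit.
INCREMENT = {
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry = 0, carry moves left
    ("carry", "0"): ("done", "1", 0),    # absorb the carry and halt
    ("carry", "_"): ("done", "1", 0),    # ran off the left end: new leading 1
}


def run(transitions, tape_str, head, state, halt="done", blank="_", max_steps=10_000):
    """Run a Turing machine and return the non-blank contents of the tape."""
    # The tape is the unbounded "notes": unvisited cells read as blank.
    tape = defaultdict(lambda: blank, enumerate(tape_str))
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        new_state, write, move = transitions[(state, tape[head])]
        tape[head] = write
        head += move
        state = new_state
        steps += 1
    lo, hi = min(tape), max(tape)
    return "".join(tape[i] for i in range(lo, hi + 1)).strip(blank)


print(run(INCREMENT, "1011", head=3, state="carry"))
```

Swapping in a bigger table gives a bigger program “out of the box”. Alternatively, a fixed table can interpret a program written on the tape, which is what a universal machine does.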
I’m not sure how obvious it is that “ML models can act like finite automata”. I mean, there are theorems that say things like “a large enough multi-layer perceptron can approximate any function arbitrarily well”, and unless I’m being dim those do indeed indicate that for such a model there exist weights that make it implement a universal Turing machine, but I don’t think that means that, e.g., there exist weights that make a transformer of “reasonable” size do that. (Though, on reflection, I think I agree that we should expect that they do.) Your comment about normal training not doing that was rather the point of my final question.
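As a toy illustration of “there exist weights” (as opposed to “training finds them”), here is a sketch of a two-layer perceptron with hand-set weights that serves as the transition function of a two-state parity automaton. The step activation and the specific weights are assumptions chosen for clarity. Real models use smooth activations and learned weights, so this shows only existence in a tiny case.

```python
# Hypothetical hand-set weights: the hidden units compute OR and AND of the
# inputs, and the output computes OR minus AND, which equals XOR.
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [-0.5, -1.5]
W2 = [1.0, -1.0]
b2 = -0.5


def step(z):
    """Heaviside threshold activation."""
    return 1 if z > 0 else 0


def mlp_transition(state, bit):
    """Next parity state (0 = even, 1 = odd) computed by the 2-layer network."""
    x = (state, bit)
    h = [step(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return step(sum(w * hi for w, hi in zip(W2, h)) + b2)


def run_parity(bits):
    """Use the network as a finite automaton: feed its output back as the state."""
    state = 0
    for bit in bits:
        state = mlp_transition(state, bit)
    return state


print(run_parity([1, 0, 1, 1]))
```

The network alone is only the finite control. Feeding its output back as state is what turns it into an automaton, and adding an external tape would turn it into something like the machine above.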
Right, I don’t know how much data a model stores, or how much of it can be reached through retraining if not all parameters can be specified outright. If the translation is bad enough, a model of comparable size couldn’t quote an LLM by memorizing its parameters as explicitly accessible raw data. Still, an LLM trained on actual language could probably be made a lot smaller with some lossy compression (which I have no idea how to specify), and decoding that data from the model (by running experiments on it to elicit its behavior) would take eons. So size bounds are not the most practical concern here. But maybe the memorized data could be written down much faster with a reasonable increase in model size?