“no matter how fancy a training method we use, some plausible architectures will not be able to do this”, and that seemed worth making explicit.
Fair enough. I’ll try to add a fragment to the post making this argument (at a high level of generality — I’m too ignorant about LLM architecture details to describe such limitations in concrete terms).
(I worry a little that a definition of “language model” much less restrictive than that may end up including literally everything capable of using language, including us and hypothetical AGIs specifically designed to be AGIs.)
I’m using “language model” here to refer to systems optimised solely for the task of predicting text.