Will early transformative AIs primarily use text? [Manifold question]
Will early transformative AIs primarily use text to store intermediate thoughts and memories?
I think the answer to this question matters for alignment research prioritization, and building some common knowledge about the most commonly held views on it would be helpful.
You can make bets on this question here: https://manifold.markets/FabienRoger/will-early-transformative-ais-prima
To make this problem more concrete, here are two questions which will probably resolve sooner:
Will the first AI to pass a 2-hour adversarial Turing test primarily use text?
Will the first AI to achieve 90% accuracy at interview APPS problems primarily use text?
Operationalization
AI power criteria
By early transformative AIs I mean the first publicly announced AI system which can speed up AI capabilities progress by 30x: if the cognitive labor done at major AI labs (including hardware manufacturers) were done only by humans assisted with 2020-level AI, and resources were rearranged to maximize productivity (within AI labs, allowing new hires but within the same financial constraints), then it would take AI labs 300 days to produce the “stronger AIs” it actually created in 10 days, as judged by me.
The operationalizations below are inspired by this Metaculus question, and aim at describing AIs weaker than early TAIs.
By an AI able to pass a 2-hour adversarial Turing test, I mean an AI system able to reliably pass a 2-hour, adversarial Turing test during which the participants can send text, images, and audio files (as is done in ordinary text messaging applications) during the course of their conversation. An ‘adversarial’ Turing test is one in which the human judges are AI experts (who understand AI weaknesses) instructed to ask interesting and difficult questions designed to advantage the human participants, and to successfully unmask the computer as an impostor. A demonstration of an AI passing such a Turing test, or one that is sufficiently similar, will be sufficient for this condition, so long as the test is, in my estimation, well designed. Several tests might be required to establish reliability.
By an AI achieving 90% accuracy at hard APPS problems, I mean an AI system interacting with APPS like a human would (with access to any tool which doesn’t require access to the internet or to a human) which is able to submit a correct program on the first submission attempt for 90% of the interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al.
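As a rough illustration of how this criterion could be measured, here is a sketch against the APPS test split as mirrored on the Hugging Face Hub. The `codeparrot/apps` dataset name is the commonly used mirror (the exact loading call may differ across `datasets` versions), and `generate_solution` and `passes_all_tests` are hypothetical stand-ins for the AI system under test and a grading harness running the hidden test cases; neither is part of the original question.

```python
# Rough sketch of first-attempt accuracy on interview-level APPS problems.
# `generate_solution` (the AI system under test) and `passes_all_tests`
# (a sandboxed grader running the problem's hidden test cases) are
# hypothetical stand-ins, not part of the original question.
from datasets import load_dataset

def first_attempt_accuracy(generate_solution, passes_all_tests):
    problems = load_dataset("codeparrot/apps", split="test")
    interview = [p for p in problems if p["difficulty"] == "interview"]

    solved = 0
    for problem in interview:
        # A single submission per problem: no feedback, no retries.
        program = generate_solution(problem["question"])
        if passes_all_tests(program, problem):
            solved += 1
    return solved / len(interview)

# The criterion is met when first_attempt_accuracy(...) >= 0.90.
```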
“Primarily using text”
By primarily using text, I mean the conjunction of the following:
Text bottlenecks: there isn’t a path of more than 100k serial operations during the generation of an answer where information doesn’t go through a categorical format where most categories correspond to words or pieces of words, in a way which makes sense to at least some human speakers when read (but it doesn’t have to be faithful). Some additional bits are allowed as long as they fit in a human-understandable color-coding scheme. Only matrix multiplications, convolutions, and similarly “heavy” operations count as serial operations. There are 2 serial operations per classic attention block and 2 serial operations per classic MLP block, so a forward pass of GPT-3 is around 400 serial operations, and GPT-3 generating 1000 tokens is around 400k serial operations. Since each generated token passes through text, GPT-3 generating 1000 tokens would count as having text bottlenecks, but a 100-layer image diffusion model doing 1000 steps of diffusion would not (see the sketch below).
Long thoughts: The AI generates its answer using at least 1M serial operations. GPT-3 generating 1000 tokens wouldn’t count as “long thoughts”, but GPT-3 generating 10k tokens (within some scaffold) would count.
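To make the arithmetic behind these two conditions concrete, here is a minimal sketch checking them for the examples mentioned above. It assumes GPT-3’s 96 transformer layers (4 heavy serial operations per layer, as counted above) and, as an additional assumption, a couple of heavy operations per diffusion layer; the function and constant names are purely illustrative.

```python
# Minimal sketch of the serial-operation arithmetic behind "text bottlenecks"
# and "long thoughts". Assumption: GPT-3 has 96 transformer layers, and each
# attention block and each MLP block counts as 2 "heavy" serial operations.

GPT3_LAYERS = 96
OPS_PER_TRANSFORMER_LAYER = 4          # 2 (attention block) + 2 (MLP block)
TEXT_BOTTLENECK_LIMIT = 100_000        # longest allowed path without a text bottleneck
LONG_THOUGHTS_THRESHOLD = 1_000_000    # minimum total serial operations

def gpt3_generation(n_tokens):
    ops_per_forward_pass = GPT3_LAYERS * OPS_PER_TRANSFORMER_LAYER  # ~400
    return {
        # Every generated token is re-encoded as text, so the longest path
        # that avoids a text bottleneck is a single forward pass.
        "longest_path_without_text": ops_per_forward_pass,
        "total_serial_ops": ops_per_forward_pass * n_tokens,
    }

def diffusion_generation(n_layers, n_steps, ops_per_layer=2):
    # Assumption: each diffusion layer contains a couple of heavy operations
    # (convolutions / attention); no information ever passes through text.
    total = n_layers * n_steps * ops_per_layer
    return {"longest_path_without_text": total, "total_serial_ops": total}

def primarily_uses_text(stats):
    text_bottlenecks = stats["longest_path_without_text"] <= TEXT_BOTTLENECK_LIMIT
    long_thoughts = stats["total_serial_ops"] >= LONG_THOUGHTS_THRESHOLD
    return text_bottlenecks and long_thoughts

print(primarily_uses_text(gpt3_generation(1_000)))            # False: ~400k total ops, not "long thoughts"
print(primarily_uses_text(gpt3_generation(10_000)))           # True: ~4M ops, text bottleneck every ~400 ops
print(primarily_uses_text(diffusion_generation(100, 1_000)))  # False: no text bottleneck
```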
I will use speculations about AI architectures to resolve this question. For example, GPT-4 generating 10k tokens would qualify as primarily using text.
(For reference, if it takes 1ms for a human neuron to process the incoming signal and fire, then the human brain can do 100k serial operations in 1’40’’, and 1M serial operations in 16’40’’.)
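A quick verification of that parenthetical, under the same 1ms-per-firing assumption so that serial operations translate directly to milliseconds:

```python
# Quick check of the human-brain reference point: at 1 ms per serial
# operation, convert operation counts to minutes and seconds.
def as_minutes_seconds(n_serial_ops, ms_per_op=1):
    total_seconds = n_serial_ops * ms_per_op / 1000
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes}'{seconds:02d}''"

print(as_minutes_seconds(100_000))    # 1'40''
print(as_minutes_seconds(1_000_000))  # 16'40''
```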
How the answer to this question affects research prioritization
Some research directions are text-specific, such as research on steganography and chain-of-thought.
Some plans for safe AIs are much more realistic when AIs primarily use text. For example, plans relying on monitoring AI “thoughts” are much more likely to succeed if AIs primarily use text.
Some research directions are more relevant if done on the same kind of AIs as the first AIs which really matter, such as most research on the structure of activations and research on inductive biases. If the early TAIs do not primarily use text, such research should aim for results which transfer across modalities and architectures, or its conclusions might not be relevant.
You can find my takes about the implications, as well as arguments for and against future AIs primarily using text, in this related post of mine: The Translucent Thoughts Hypotheses and Their Implications, which is about similar (but narrower) claims.