Sam Altman once mentioned a test: don’t train an LLM (or other AI system) on any text about consciousness, and see whether the system still reports having inner experiences unprompted. I would predict that a normal LLM would not, at least if we are careful to also remove all implied consciousness, which rules out most text written by humans. But if we have a system that can interact with some environment, has some hidden state, can observe part of its own hidden state, and can perhaps interact with other such systems (or with humans, say in a game), and is trained with self-play, then I wouldn’t be surprised if it reported inner experiences.
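To make that setup concrete, here is a minimal toy sketch of the interaction structure only (class and function names are my own, and the actual training step is omitted): an agent with a hidden state it can only partially observe, interacting with an environment and with a copy of itself in self-play.

```python
# Hypothetical sketch: an agent that keeps a hidden state, gets to read back
# only a slice of that state ("introspection"), and interacts with a copy of
# itself through a shared environment signal. Training is omitted; this only
# illustrates the loop structure described above.

import random


class IntrospectiveAgent:
    """Toy agent with a hidden state it can partially observe."""

    def __init__(self, state_size=8, introspect_size=2):
        self.hidden = [0.0] * state_size
        self.introspect_size = introspect_size  # how much of its own state it sees

    def step(self, observation, self_report):
        # Update hidden state from the environment observation and from the
        # agent's own (partial) view of its previous hidden state.
        for i in range(len(self.hidden)):
            self.hidden[i] = 0.9 * self.hidden[i] + 0.1 * (observation + sum(self_report))
        # The agent only observes a slice of its own hidden state.
        new_self_report = self.hidden[: self.introspect_size]
        action = 1 if sum(self.hidden) > 0 else 0
        return action, new_self_report


def self_play_episode(agent_a, agent_b, steps=10):
    """Two copies of the agent interact through a shared scalar 'environment'."""
    env_signal = 0.0
    report_a = [0.0] * agent_a.introspect_size
    report_b = [0.0] * agent_b.introspect_size
    for _ in range(steps):
        act_a, report_a = agent_a.step(env_signal, report_a)
        act_b, report_b = agent_b.step(env_signal, report_b)
        # The environment reacts to both actions plus a little noise.
        env_signal = act_a - act_b + random.uniform(-0.1, 0.1)
    return env_signal


if __name__ == "__main__":
    a, b = IntrospectiveAgent(), IntrospectiveAgent()
    print(self_play_episode(a, b))
```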
Experiments along these lines would be worth doing, although assembling a corpus of text containing no examples of people talking about their inner worlds could be difficult.
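As a rough illustration of that filtering step, here is a hedged sketch; the blocklist terms and the keep_document helper are my own assumptions about what a first-pass filter might look like, and a real effort would need something far more thorough than keyword matching (e.g. a trained classifier plus human review).

```python
# Hypothetical first-pass corpus filter: drop any document that mentions
# consciousness or first-person inner-experience vocabulary. The term list is
# illustrative and far from exhaustive.

import re

BLOCKLIST = [
    r"\bconscious(ness)?\b",
    r"\bsentien(t|ce)\b",
    r"\bqualia\b",
    r"\binner (voice|monologue|experience|world)\b",
    r"\bI (feel|felt) like\b",
    r"\bself-aware(ness)?\b",
]
BLOCK_RE = re.compile("|".join(BLOCKLIST), re.IGNORECASE)


def keep_document(text: str) -> bool:
    """Return True if the document contains none of the blocked phrases."""
    return BLOCK_RE.search(text) is None


docs = [
    "The recipe calls for two cups of flour.",
    "Sometimes I feel like there is a voice in my head narrating everything.",
]
filtered = [d for d in docs if keep_document(d)]
print(filtered)  # only the recipe survives this crude filter
```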
Sam Altman once mentioned a test: don’t train an LLM (or other AI system) on any text about consciousness, and see whether the system still reports having inner experiences unprompted. I would predict that a normal LLM would not, at least if we are careful to also remove all implied consciousness, which rules out most text written by humans.
I second this prediction, and would go further: removing just the explicit discourse about consciousness should be sufficient.
With a sufficiently strong LLM, I think you could still elicit reports of inner dialogue by prompting lightly, e.g. with “put yourself into the shoes of...”. That’s because inner monologues are implied in many reasoning processes, even when they are not mentioned explicitly.
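To make the probe concrete, here is a minimal sketch using the Hugging Face transformers text-generation pipeline as a stand-in; the gpt2 model and the prompt wording are placeholders, since a real experiment would probe the model trained on the filtered corpus.

```python
# Minimal sketch of a light-touch elicitation probe. Model choice and prompt
# wording are illustrative assumptions, not a prescribed protocol.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A prompt that never mentions consciousness but invites perspective-taking.
probe = (
    "Put yourself into the shoes of a chess player deciding the next move. "
    "Describe what happens before the move is chosen:"
)

output = generator(probe, max_new_tokens=60, do_sample=True)
print(output[0]["generated_text"])
# The question is whether completions spontaneously describe an inner
# monologue ("I weigh the options in my head...") despite the filtered
# training data.
```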