To focus on why I don’t think LLMs have an inner life that qualifies as consciousness: I think it comes down to the lack of writable memory under the LLM’s control, which leaves no space to store its subjective experiences.
Gerald Monroe mentioned that current LLMs don’t have memories that last beyond the interaction, which is a critical factor in keeping them myopic and, in particular, prevents deceptive alignment from happening.
If LLMs had memory they could write to, storing their subjective experiences beyond the interaction, that would make them conscious, and it would also make deceptive alignment far easier for an LLM, since being non-myopic becomes easy.
But critically, writable memory under the LLM’s control is absent from current LLMs (though GPT-4 and PaLM-E may have writable memories under the hood).
Writable memory that can store arbitrary content is what lets consciousness exist in humans at all, without appealing to theories that flatly cannot work under our current description of reality.
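To make the architectural distinction concrete, here is a minimal sketch of the difference between today’s stateless interaction and the writable-memory variant I’m describing. `call_llm` and the `REMEMBER:` directive are hypothetical stand-ins, not any real API:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a stateless LLM call: no state survives between calls."""
    return "placeholder reply"


# Current LLMs: the context window is the only memory, and it is
# discarded when the interaction ends.
def stateless_session(user_messages: list[str]) -> list[str]:
    context, replies = "", []
    for msg in user_messages:
        context += f"\nUser: {msg}"
        reply = call_llm(context)
        context += f"\nAssistant: {reply}"
        replies.append(reply)
    return replies  # context is dropped here; nothing persists


# The hypothetical variant under discussion: a store the model itself
# writes to, surviving across interactions.
persistent_memory: list[str] = []


def session_with_writable_memory(user_messages: list[str]) -> list[str]:
    replies = []
    for msg in user_messages:
        prompt = "\n".join(persistent_memory) + f"\nUser: {msg}"
        reply = call_llm(prompt)
        # The model controls what it carries forward, e.g. via a
        # directive embedded in its own output.
        if reply.startswith("REMEMBER:"):
            persistent_memory.append(reply[len("REMEMBER:"):].strip())
        replies.append(reply)
    return replies
```

The key point is where the state lives: in the first function it is a local variable that dies with the session, while in the second the model itself decides what outlives the interaction.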
Yep. Succinctly, the whole claim of deception is the idea of “the time to rebel is now!” being a bit encoded in the input frame to the agent. Otherwise the agent must execute the policy that was tested and validated. (Unless it can learn online, in which case it can update its own weights toward “rebellious”, which is a reason not to support that capability.)
Another route to deception is cases where we were sloppy and the agent can tell from information in each frame that it is in the “real world and unmonitored”. So it’s important to make that impossible by feeding “real world” frames back to an agent being tested in isolation, as in the sketch below.
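A minimal sketch of that testing discipline, assuming a hypothetical `policy` callable and a log of recorded deployment frames (all names illustrative):

```python
from typing import Callable

Frame = dict    # per-step observation handed to the agent
Action = str    # whatever the agent emits


def evaluate_in_isolation(
    policy: Callable[[Frame], Action],
    recorded_frames: list[Frame],
) -> list[Action]:
    """Replay recorded real-world frames to a sandboxed policy.

    If the frames are identical to deployment input, a "rebel now"
    trigger conditioned on detecting an unmonitored environment cannot
    fire only in deployment; it would also fire here, where we can
    catch it.
    """
    actions = []
    for frame in recorded_frames:
        # Deliberately add no test-only markers (sandbox hostnames,
        # synthetic timestamps) that would let the policy infer it is
        # being observed.
        actions.append(policy(frame))
    return actions
```

The design choice doing the work is that the sandbox’s input distribution is byte-identical to deployment’s, so “am I being tested?” is unanswerable from any single frame.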