Sam Altman once mentioned a test: don’t train an LLM (or other AI system) on any text about consciousness, and see whether the system still reports having inner experiences unprompted. I would predict that a normal LLM would not, at least if we are careful to also remove all implied consciousness, which rules out most text written by humans. But if we have a system that can interact with some environment, has some hidden state, can observe part of its own hidden state, and can perhaps interact with other such systems (or with humans, say in a game), and is trained with self-play, then I wouldn’t be surprised if it reported inner experiences.
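To make that setup concrete, here is a minimal toy sketch of the interaction structure only (class and function names are my own, and the actual training step is omitted): an agent with a hidden state it can only partially observe, interacting with an environment and with a copy of itself in self-play.

```python
# Hypothetical sketch: an agent that keeps a hidden state, gets to read back
# only a slice of that state ("introspection"), and interacts with a copy of
# itself through a shared environment signal. Training is omitted; this only
# illustrates the loop structure described above.

import random


class IntrospectiveAgent:
    """Toy agent with a hidden state it can partially observe."""

    def __init__(self, state_size=8, introspect_size=2):
        self.hidden = [0.0] * state_size
        self.introspect_size = introspect_size  # how much of its own state it sees

    def step(self, observation, self_report):
        # Update hidden state from the environment observation and from the
        # agent's own (partial) view of its previous hidden state.
        for i in range(len(self.hidden)):
            self.hidden[i] = 0.9 * self.hidden[i] + 0.1 * (observation + sum(self_report))
        # The agent only observes a slice of its own hidden state.
        new_self_report = self.hidden[: self.introspect_size]
        action = 1 if sum(self.hidden) > 0 else 0
        return action, new_self_report


def self_play_episode(agent_a, agent_b, steps=10):
    """Two copies of the agent interact through a shared scalar 'environment'."""
    env_signal = 0.0
    report_a = [0.0] * agent_a.introspect_size
    report_b = [0.0] * agent_b.introspect_size
    for _ in range(steps):
        act_a, report_a = agent_a.step(env_signal, report_a)
        act_b, report_b = agent_b.step(env_signal, report_b)
        # The environment reacts to both actions plus a little noise.
        env_signal = act_a - act_b + random.uniform(-0.1, 0.1)
    return env_signal


if __name__ == "__main__":
    a, b = IntrospectiveAgent(), IntrospectiveAgent()
    print(self_play_episode(a, b))
```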
Experiments along these lines would be worth doing, although assembling a corpus of text containing no examples of people talking about their inner worlds could be difficult.
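As a rough illustration of that filtering step, here is a hedged sketch; the blocklist terms and the keep_document helper are my own assumptions about what a first-pass filter might look like, and a real effort would need something far more thorough than keyword matching (e.g. a trained classifier plus human review).

```python
# Hypothetical first-pass corpus filter: drop any document that mentions
# consciousness or first-person inner-experience vocabulary. The term list is
# illustrative and far from exhaustive.

import re

BLOCKLIST = [
    r"\bconscious(ness)?\b",
    r"\bsentien(t|ce)\b",
    r"\bqualia\b",
    r"\binner (voice|monologue|experience|world)\b",
    r"\bI (feel|felt) like\b",
    r"\bself-aware(ness)?\b",
]
BLOCK_RE = re.compile("|".join(BLOCKLIST), re.IGNORECASE)


def keep_document(text: str) -> bool:
    """Return True if the document contains none of the blocked phrases."""
    return BLOCK_RE.search(text) is None


docs = [
    "The recipe calls for two cups of flour.",
    "Sometimes I feel like there is a voice in my head narrating everything.",
]
filtered = [d for d in docs if keep_document(d)]
print(filtered)  # only the recipe survives this crude filter
```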
Sam Altman once mentioned a test: don’t train an LLM (or other AI system) on any text about consciousness, and see whether the system still reports having inner experiences unprompted. I would predict that a normal LLM would not, at least if we are careful to also remove all implied consciousness, which rules out most text written by humans.
I second this prediction, and would go further: removing just the explicit discourse about consciousness should be sufficient.
With a sufficiently strong LLM, I think you could still elicit reports of inner dialogue by prompting lightly, e.g. with “put yourself into the shoes of...”. That’s because inner monologues are implied in many reasoning processes, even when they are not mentioned explicitly.
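To make the probe concrete, here is a minimal sketch using the Hugging Face transformers text-generation pipeline as a stand-in; the gpt2 model and the prompt wording are placeholders, since a real experiment would probe the model trained on the filtered corpus.

```python
# Minimal sketch of a light-touch elicitation probe. Model choice and prompt
# wording are illustrative assumptions, not a prescribed protocol.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A prompt that never mentions consciousness but invites perspective-taking.
probe = (
    "Put yourself into the shoes of a chess player deciding the next move. "
    "Describe what happens before the move is chosen:"
)

output = generator(probe, max_new_tokens=60, do_sample=True)
print(output[0]["generated_text"])
# The question is whether completions spontaneously describe an inner
# monologue ("I weigh the options in my head...") despite the filtered
# training data.
```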