An important clarification: when I say I am doubtful that these observations imply ‘consciousness’, I am referring to ‘phenomenal consciousness’: an awareness of self and world accompanied by qualia (aka phenomenal experience). What we are seeing does look to me like a clear indication of ‘access consciousness’, but that is insufficient for the agent to have moral value. Without phenomenal experience, the agent can’t have valence over that experience; in other words, it doesn’t have a true internal experience about which to care.
But suppose it’s possible for an agent to have what Block (1995, §§4-5) calls “access consciousness”—it can extract and integrate information from its environment, and access this information when making decisions—and yet lack phenomenal consciousness. Such an agent could have the right architecture for consciousness and behave as if it were conscious, and yet have no phenomenal experiences. On my view, we would be rational to think this agent was phenomenally conscious, given the strength of the evidence available. But if we could somehow see the world from its point of view, we would realize we were mistaken.
Amanda Askell on AI consciousness.
Given this, why then do I merely doubt the presence of phenomenal consciousness instead of being certain of its absence?
Because a researcher friend pointed out to me that one could interpret RL training as supplying a sort of ‘valence’.
I think the most direct way to clear up this confusion is through experiments.
This paper shows how you can explicitly give a transformer model access to its internal state: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=LKv32bgAAAAJ&sortby=pubdate&citation_for_view=LKv32bgAAAAJ:dQ2og3OwTAUC
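To make that concrete, here is a minimal sketch of the kind of wiring I have in mind: a toy transformer block that receives a learned projection of its own previous hidden state alongside the token embeddings. To be clear, the class name, dimensions, and feedback mechanism below are my own illustrative assumptions, not the architecture from the linked paper.

```python
# A toy sketch, NOT the linked paper's actual method: a transformer block
# whose input on one pass includes a learned projection of the hidden state
# it produced on the previous pass, giving it explicit access to an
# internal-state signal alongside the token embeddings.
import torch
import torch.nn as nn

class SelfAccessBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Maps the previous hidden state back into the input space.
        self.state_proj = nn.Linear(d_model, d_model)

    def forward(self, token_embeddings, prev_hidden=None):
        x = token_embeddings
        if prev_hidden is not None:
            # Mix in a summary of the model's own prior internal state.
            x = x + self.state_proj(prev_hidden)
        return self.encoder(x)

# Two passes over the same input: the second pass conditions on the
# hidden state the block produced the first time around.
block = SelfAccessBlock()
emb = torch.randn(1, 16, 256)             # (batch, seq, d_model)
h1 = block(emb)                           # no self-access yet
h2 = block(emb, prev_hidden=h1.detach())  # now "sees" its own state
```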
This paper shows how you can make the model’s loss function dependent on both predicting the future and predicting itself: https://arxiv.org/abs/2407.10188
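Again, only as a rough sketch (the paper’s actual objective may be formulated differently): the idea is a model with two heads, one doing ordinary next-token prediction and one predicting the model’s own hidden state, with the two error terms mixed by an assumed weighting factor `alpha`.

```python
# A rough sketch, NOT the paper's exact objective: a tiny language model
# with two heads, one predicting the next token (the future) and one
# predicting the model's own hidden state (itself). The weighting term
# `alpha` is an assumed hyperparameter, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfPredictingLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.lm_head = nn.Linear(d_model, vocab_size)  # predicts the future
        self.self_head = nn.Linear(d_model, d_model)   # predicts its own state

    def forward(self, tokens):
        hidden = self.encoder(self.embed(tokens))
        return self.lm_head(hidden), self.self_head(hidden), hidden

def combined_loss(model, tokens, alpha=0.1):
    logits, self_pred, hidden = model(tokens)
    # Standard next-token prediction: targets are the inputs shifted by one.
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )
    # Auxiliary self-prediction: regress the model's own (detached) hidden
    # state, which pressures the representation to be easy to self-predict.
    self_loss = F.mse_loss(self_pred, hidden.detach())
    return lm_loss + alpha * self_loss

# One fine-tuning step on random token ids, just to show the plumbing.
model = SelfPredictingLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
tokens = torch.randint(0, 1000, (2, 32))  # (batch, seq)
loss = combined_loss(model, tokens)
loss.backward()
optimizer.step()
```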
So I think these give us a place to start. Do we notice qualitative shifts in behavior following a fine-tuning regime which combines these techniques?
I have ideas for further experiments as well.