I think this is a really interesting post. It’s good to see an outline of the general relationship between self-reporting and sentience.
The idea that “Training an LLM to develop a model of its internal operations which enables it answer non-trivial questions about its mental states” could be a straightforward way to optimize models for sentience is very thought-provoking.
I’m generally curious about the nature of the unique identities of these hypothetically sentient models, as well as how those identities would develop. What exactly would a “truly sentient model” look like? Would it have desires? Goals? Where would these come from? Some sort of random weight initialisation at the beginning of its training? Exposure to training data? Post-training dialogue? Something else?
About phenomenal experience: even in a case where an LLM’s self-reporting is judged to be reliable, how does one prove that it experiences those “mental states” phenomenally? I think even accurate self-reporting doesn’t necessarily imply phenomenal experience.
Is it important though? Especially given AI agency? If an AI system reports that it is angry, does the metaphysical discussion of phenomenal experience still matter if the AI has the agency to act on those “feelings” in a way that is consistent with how we understand anger? Is behaviour generally considered some sort of proof/indicator of phenomenal experience?
I’m curious to hear your thoughts on these.
The general topic of sentience in machines is one I’m interested in thinking about and discussing with people.
I wrote an article exploring sentience in machines by studying the neural activations in artificial neural networks and applying insights from neuroscience’s analysis of neuronal activations in humans. I put an intro post here on LessWrong (it didn’t do well on this website); feel free to take a look here.