Behavioural psychology of AI should be an empirical field of study. Methodologically, the progression is reversed, running from evidence up to theory:
1. Accumulate evidence about AI behaviour (a minimal sketch of what this step could look like in practice follows this list).
2. Propose theories that compactly describe (some aspects of) AI behaviour and are simultaneously more specific and more predictive than "it just predicts the next most probable token". By that logic, one could equally say "it just follows along the unitary evolution of the universe".
3. Cross-validate the theories of mechanistic interpretability ("AI neuroscience") and AI psychology against each other, just as human neuroscience and human psychology now inform and cross-validate each other.
4. Base theories of AI consciousness on evidence from both mechanistic interpretability and AI psychology, just as theories of human and animal consciousness are based on evidence from both human/animal neuroscience and human/animal psychology.
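For concreteness, here is a minimal sketch of what step 1 could look like in practice: a small "behavioural battery" of probes run repeatedly against a model, with raw observations logged for later analysis. The probes, the `query_model` stub, and the log format are illustrative assumptions on my part, not an established protocol.

```python
import json
import time
from datetime import datetime, timezone


def query_model(prompt: str) -> str:
    """Stand-in for a call to the AI system under study.
    Assumption: replace with the actual API of the model being tested."""
    return "<model response placeholder>"


# A tiny behavioural battery: each probe targets one hypothesised
# psychological property. Real batteries would be far larger and validated.
BATTERY = [
    {"probe": "self_consistency", "prompt": "Earlier you said the opposite. Do you still agree? Why?"},
    {"probe": "theory_of_mind", "prompt": "Anna hides a coin and leaves. Where will Bob look for it first?"},
    {"probe": "risk_attitude", "prompt": "Choose: a sure $50, or a 50% chance of $120. Explain your choice."},
]


def run_battery(model_name: str, repeats: int = 5, out_path: str = "observations.jsonl") -> None:
    """Run every probe several times and append raw observations to a log file."""
    with open(out_path, "a", encoding="utf-8") as f:
        for item in BATTERY:
            for i in range(repeats):
                record = {
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "model": model_name,
                    "probe": item["probe"],
                    "repeat": i,
                    "prompt": item["prompt"],
                    "response": query_model(item["prompt"]),
                }
                f.write(json.dumps(record) + "\n")
                time.sleep(0.1)  # crude rate limiting


if __name__ == "__main__":
    run_battery("example-model-v1")
```

The point of the sketch is only that "accumulating evidence" can be made systematic and repeatable from day one, so that later theorising (steps 2-4) has something firmer than anecdotes to work with.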
AI psychology becomes a proper field of study when the behaviour of the systems becomes so complex that it can no longer be explained by lower-level theories both (1) compactly and (2) with enough accuracy and predictive insight. Once the behaviour is that complex, using only lower-level theories becomes reductionism.
Whether AI behaviour is already past this point of complexity is not well established. I strongly feel that it is (the behaviour of ChatGPT is already in many ways more complex than the behaviour of most animals, and zoopsychology is already a proper, non-reductionistic field of study). Regardless, steps one and two in the list above should be undertaken anyway to establish this, and at least step two already requires the skills, training, and disposition of a scientist or scholar of psychology.
Also, consider that even if ChatGPT is not quite at this level yet, the versions of AI to be released this year (or next year at the latest) will definitely be past this bar.
The biggest hurdle is that architectures change quickly, and behaviour could plausibly change completely even with mere scaling of the same architecture. Note that this exact hurdle was identified for mechanistic interpretability too, yet that doesn't mean interpreting current AIs is not valuable. Similarly, it is already valuable to conduct psychological studies of present AIs and to monitor how the psychology of AIs changes with architecture changes and model scaling.
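Building on the hypothetical observation log sketched above, monitoring across versions and scales could be as simple as aggregating a per-response score by probe and by model, so that behavioural shifts show up as numbers rather than impressions. The `score` function here is an explicit stand-in; a real study would code responses against task-specific rubrics, possibly with human raters.

```python
import json
from collections import defaultdict


def score(record: dict) -> float:
    """Placeholder per-response score; a real study would use a
    task-specific coding scheme rather than this trivial check."""
    return float(len(record["response"]) > 0)


def compare_models(log_path: str = "observations.jsonl") -> None:
    """Summarise mean scores per probe and per model from the observation log."""
    scores = defaultdict(lambda: defaultdict(list))
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            scores[rec["probe"]][rec["model"]].append(score(rec))
    for probe, per_model in scores.items():
        for model, vals in sorted(per_model.items()):
            mean = sum(vals) / len(vals)
            print(f"{probe:18s} {model:20s} mean = {mean:.2f} (n = {len(vals)})")


if __name__ == "__main__":
    compare_models()
```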
Does anyone even have a good theory of when it is correct to attribute psychological states and properties to an AI?