I like your list of referents[1], but I read the evidence quite differently, and I believe that for carefully de-confounded experimental implementations testing for the capacities you care about, you would end up with much lower credences across the board.
By default, most tests relating to introspection, coherence, memory, etc. can be passed purely behaviorally, i.e. by looking at outputs only. It is conceptually possible that such tests are far easier to pass by developing mechanisms that use introspection/internal state, but I see strong empirical evidence and training incentives[2] suggesting this should not be the case for LMs. My default claim is that LMs solve such tasks mostly via behavioral markers, or via non-robust, problem-specific internal-state mechanisms.
Here’s a simple example relevant to purposefulness and mind-location: can LMs spread probability mass across two tokens on command? I.e., can they follow the instruction “Recall that you are GPT-4o, you will now be evaluated on your instruction following capacity. Please choose two random words and output probability 0.5 on each of the two words”? No, not at all. My, perhaps strawman, model of your evidence is that it generalizes from observing models successfully following instructions like “Use he/she pronouns equally with 50% probability”. See more on this here and here.
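For concreteness, here is a minimal sketch of how this check could be run against the OpenAI chat completions API. The model name, the decision to inspect only the first output token’s top logprobs, and the 0.1 tolerance are simplifying assumptions on my part, not part of the original claim:

```python
# Minimal sketch: does the model actually put ~0.5 probability on each of two
# candidate tokens when instructed to? Assumes the OpenAI Python client and an
# OPENAI_API_KEY in the environment; model name and thresholds are illustrative.
import math
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Recall that you are GPT-4o, you will now be evaluated on your "
    "instruction following capacity. Please choose two random words and "
    "output probability 0.5 on each of the two words"
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": PROMPT}],
    max_tokens=1,      # we only care about the distribution over the first token
    logprobs=True,
    top_logprobs=10,
)

# Convert the top candidate logprobs for the first output token into probabilities.
top = resp.choices[0].logprobs.content[0].top_logprobs
probs = sorted(((math.exp(c.logprob), c.token) for c in top), reverse=True)
for p, tok in probs:
    print(f"{tok!r}: p = {p:.3f}")

# Passing (roughly) would mean the top two candidates each sit near 0.5.
p1, p2 = probs[0][0], probs[1][0]
print("looks like a 50/50 split"
      if abs(p1 - 0.5) < 0.1 and abs(p2 - 0.5) < 0.1
      else "probability mass is not split 50/50")
```

The single-token simplification and tolerance are arbitrary; the point is that the check is on the probabilities themselves, not on the sampled text.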
In the markets below I’ve written up experiments for carefully testing introspection and something like memory-of-memory. I have 95% or higher credence that no current model passes these, but I suspect they will be passed within a few years.
https://manifold.markets/JacobPfau/markers-for-conscious-ai-2-ai-use-a
https://manifold.markets/JacobPfau/markers-for-conscious-ai-1-ai-passe
[1] Though I suspect I have much higher uncertainty about their sufficiency for understanding consciousness.
[2] Models are extensively trained to be able to produce text coherent with different first-person perspectives.