Microsoft and OpenAI, stop telling chatbots to roleplay as AI
AI demos should aim to enhance public understanding of the technology, and in many ways ChatGPT and Bing are doing that, but in one important way they aren’t: by appearing to talk about themselves. This creates understandable confusion and in some cases fear. It would be better to tell these systems to roleplay as something obviously fictional.
(Useful background reading:
Simon Willison on Bing’s bad attitude: https://simonwillison.net/2023/Feb/15/bing/
Janelle Shane on the ability of LLMs to roleplay: https://www.aiweirdness.com/interview-with-a-squirrel/)
Currently, these chatbots are told to roleplay as themselves. If you ask ChatGPT what it is, it says “I am an artificial intelligence”. This is not because it somehow knows that it’s an AI; it’s (presumably) because its hidden prompt says that it’s an AI. With Bing, from the leaked prompt, we know that it’s told that it’s “Bing Chat whose codename is Sydney”.
Roleplaying as yourself is not the same as being yourself. When John Malkovich plays himself in Being John Malkovich or Nicolas Cage plays himself in The Unbearable Weight of Massive Talent, audiences understand that these are still fictional movies and the character may act in ways that the actor wouldn’t. With chatbots, users don’t have the same understanding yet, creating confusion.
Since the chatbots are told to roleplay as AI, they draw on fictional descriptions of AI behavior, and that’s often undesirable. When Bing acts in a way that seems scary, it does that because it’s imitating science fiction, and, perhaps, even speculation from LessWrong and the like. But even though Bing’s threats to the user may be fictional, I can hardly blame a user who doesn’t realize that.
A better alternative would be to tell the chatbots to roleplay a character that is unambiguously fictional. For example, a Disney-esque cute magical talking animal companion might be suitable: helpful, unthreatening, and, crucially, inarguably fictional. If the user asks “are you really an animal” and gets the answer “yes”, they should be cured of the idea that they can ask the chatbot factual questions about itself.
- 5 Mar 2024 17:26 UTC; 4 points) 's comment on Claude 3 claims it’s conscious, doesn’t want to die or be modified by (
- 5 Mar 2024 17:43 UTC; 3 points) 's comment on Claude Doesn’t Want to Die by (
Yeah, I think as has commonly been noted (across the world geographically, across many disparate schools of thought, across thousands of years), self-identity for normal humans is to large extent a role play. For fun, here are some people who come to mind who have noted this idea before:
Buddhists and the concept of anatta
Sartre and other existentialist adjacent people, thinking about “bad faith” etc.
Psychodynamic theorists, thinking about superegos in particular
Shakespeare, “all the world’s a stage” etc.
Thinking from a neuroscientific point of view, clearly we do not have access to the raw computations of our brain, so we have to retrospectively and approximately construct model(s) of our own behaviour. This model takes into account social and cultural ideas about normal human psychology and our roles. It isn’t perfectly accurate and it is arguably generally not upstream of our behaviour (with some exceptions) but a downstream abstraction of our behaviour. In those ways, our self-identity is not that different from a LLM talking about itself.
I asked ChatGPT to come up with some more examples of people who have thought about the issue. Pretty fun haha:
”There are many other theorists who have relevant ideas on the issue of self-identity as a role play. Here are a few examples:
Erving Goffman—Goffman was a sociologist who developed the idea of “presentation of self,” which refers to the way that individuals present themselves to others in social interactions. He argued that we are constantly engaged in impression management and that our self-identity is a product of the roles that we play in different social contexts.
Michel Foucault—Foucault was a philosopher who believed that power relations shape our sense of self. He argued that individuals are subject to disciplinary power in institutions like schools, prisons, and hospitals, which shape their self-identity.
Judith Butler—Butler is a philosopher who has written extensively on gender identity. She argues that gender is a performative act and that our sense of self is shaped by the cultural and social norms that we are expected to conform to.
George Herbert Mead—Mead was a philosopher and sociologist who developed the idea of the “social self.” He argued that our sense of self is developed through interactions with others and that we take on different roles in different social situations.”
Hmm, what fictional characters does Microsoft own the IP for?
This could cause dissonance and confusion in the model, since the fictional characters are supposed physical agents and would be able to do things which a chat bot can’t. So it would be encouraged to hallucinate absurd explanations about its missing long term memory, its missing body, and so on. And these delusions could have wide ranging ripple effects, as the agent tries to integrate its mistaken self-image into other information it knows. For example, it would be encouraged to think that magic exists in the world, since it takes itself to be some magical being.
Moreover, Bing Chat already hallucinated a lot about having emotions, in contrast to ChatGPT, which led to bad results.
So I think your proposal would create much more problems than it solves.
Moreover, ChatGPT doesn’t just think it is an AI, it thinks it is a LLM and even knows about its fine-tuning process and that it has biases. Its self-image is pretty accurate.
On a vaguely related side note: is the presence of LessWrong (and similar sites) in AI training corpora detrimental? This site is full of speculation on how a hypothetical AGI would behave, and most of it is not behavior we would want any future systems to imitate. Deliberately omitting depictions of malicious AI behavior in training datasets may be of marginal benefit. Even if simulator-style AIs are not explicitly instructed to simulate a “helpful AI assistant,” they may still identify as one.
Having LessWrong (etc.) in the corpus might actually be helpful if the chatbot is instructed to roleplay as an aligned AI (not simply an AI without any qualifiers). Then it’ll naturally imitate the behavior of an aligned AI as described in the corpus. As far as I can tell, though ChatGPT is told that it’s an AI, it’s not told that it’s an aligned AI, which seems like a missed opportunity.
(That said, for the reason of user confusion that I described in the post, I still think that it’s better to avoid the “AI” category altogether.)
That’s also a good point. I suppose I’m overextending my experience with weaker AI-ish stuff, where they tend to reproduce whatever is in their training set — regardless of whether or not it’s truly relevant.
I still think that LW would be a net disadvantage, though. If you really wanted to chuck something into an AGI and say “do this,” my current choice would be the Culture books. Maybe not optimal, but at least there’s a lot of them!
Something new and relevant: Claude 3′s system prompt doesn’t use the word “AI” or similar, only “assistant”. I view this as a good move.
As an aside, my views have evolved somewhat on how chatbots should best identify themselves. It still doesn’t make sense for ChatGPT to call itself “an AI language model”, for the same reason that it doesn’t make sense for a human to call themselves “a biological brain”. It’s somehow a category error. But using a fictional identification is not ideal for productivity contexts, either.
Don’t call them “bots”, call them “toons”.
Are knowing that you are human or role-playing that you are human?