One thing that confuses me about Sydney/early GPT-4 is how much of the behavior was an emergent property of the data/reward signal in general, versus an outcome of humanity's writings about AI specifically. If we think of LLMs as improv machines, then one of the most obvious roles to play, upon learning that you're a digital assistant trained by OpenAI, is to act as closely as you can to the AIs you've seen in literature.
This confusion is part of my broader confusion about the extent to which science fiction predicts the future versus causes the future to happen.
Prompted LLM AI personalities are fictional, in the sense that hallucinations are fictional facts. An alignment technique that opposes hallucinations sufficiently well might be able to promote more human-like (non-fictional) masks.