Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying Large Language Models (LLMs) to the challenges of facilitating, moderating and summarizing the results of Polis engagements. In particular, we demonstrate with pilot experiments using Anthropic's Claude that LLMs can indeed augment human intelligence to help run Polis conversations more efficiently. Notably, we find that summarization capabilities enable categorically new methods with immense promise to empower the public in collective meaning-making exercises, and that LLM context limitations have a significant impact on the insight and quality of these results.
However, these opportunities come with risks. We discuss some of these risks, along with principles and techniques for characterizing and mitigating them, and the implications for other deliberative or political systems that may employ LLMs. Finally, we conclude with several open research directions for augmenting tools like Polis with LLMs.
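To make the context-limitation point concrete, here is a minimal sketch of the kind of chunked ("map-reduce") summarization a Polis-style pipeline might fall back on when a conversation's comments exceed a single context window. This is an illustration, not the paper's actual pipeline: the `complete()` helper and the character budget are hypothetical placeholders for a real LLM call and a real token limit.

```python
# Hypothetical sketch: hierarchical summarization of a Polis conversation
# whose comments exceed one LLM context window. `complete(prompt)` is a
# stand-in for any LLM completion call; it is NOT the paper's method.

def complete(prompt: str) -> str:
    """Placeholder for a call to an LLM completion endpoint."""
    raise NotImplementedError

def chunk(comments: list[str], max_chars: int = 8000) -> list[list[str]]:
    """Greedily pack comments into groups that fit a rough context budget."""
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for c in comments:
        if size + len(c) > max_chars and current:
            chunks.append(current)
            current, size = [], 0
        current.append(c)
        size += len(c)
    if current:
        chunks.append(current)
    return chunks

def summarize_conversation(comments: list[str]) -> str:
    # Map: summarize each chunk independently.
    partials = [
        complete("Summarize the main viewpoints in these comments:\n\n"
                 + "\n".join(group))
        for group in chunk(comments)
    ]
    # Reduce: merge the partial summaries. Nuance lost at the map stage
    # cannot be recovered here, which is one way context limits degrade
    # the insight and quality of the final summary.
    return complete("Combine these partial summaries into one overview:\n\n"
                    + "\n\n".join(partials))
```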
See e.g. Opportunities and Risks of LLMs for Scalable Deliberation with Polis, a recent collaboration between Anthropic and the Computational Democracy Project (abstract quoted above).
I’m personally really excited by the potential for collective decision-making (and legitimizing, etc.) processes which are much richer than voting on candidates or proposals, but still scale up to very large groups of people. Starting as a non-binding advisory / elicitation process could facilitate adoption, too!
That said, it’s very early days for such ideas, and there are enormous gaps between the first signs of life and a system to which citizens can reasonably entrust national decisions. Cybersecurity alone is an enormous challenge for any computerized form of democracy, and LLMs add further risks with failure modes nobody really understands yet...
(opinions my own, etc)
Thank you very much for sending this paper through. It provides a very detailed exploration of ideas closely related to my article. I completely agree with you that moving beyond one-dimensional voter feedback (signing petitions, voting) for some components of political life could be truly transformative. However, I also agree that these systems are not yet feasible or trustworthy at large scales, as they lack both performance and surrounding infrastructure. Personally, I see the performance issue as less consequential. Although hallucinations and overlooked conditions may remain a persistent problem for LLMs, their performance on reasoning tasks is rapidly improving with advances like chain-of-thought (CoT) prompting and its extensions. As such, we should at the very least start preparing for the next generation of highly competent language models. In my opinion, the trustworthiness and security of digital public infrastructure is likely to remain a thornier problem. However, verifying that online participants are human (as opposed to bots) and monitoring algorithms for adversarial attacks are problems of broad societal concern, and as such will hopefully receive increasing attention and effort in the coming years.
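For readers unfamiliar with the technique mentioned above: chain-of-thought prompting simply asks the model to produce intermediate reasoning before its final answer. A minimal sketch using the Anthropic Python SDK follows; the model alias and prompt wording are illustrative choices, not anything prescribed by the paper or this thread.

```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

question = ("A committee must pick 3 of 8 proposals to fund. "
            "How many distinct sets of funded proposals are possible?")

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model alias
    max_tokens=500,
    messages=[{
        "role": "user",
        # The CoT instruction: elicit step-by-step reasoning before the answer.
        "content": question + "\n\nLet's think step by step, "
                              "then state the final answer.",
    }],
)
print(response.content[0].text)
```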