Working to bring insights from the collective deliberation and digital democracy space to build tools for AI-facilitated group dialogues.
Cofounder of Mosaic Labs with @Sofia Vanhanen where we are developing Nexus, a discussion platform for improving group epistemics.
If you’re interested in this direction, or AI for epistemics more broadly, please don’t hesitate to shoot me a DM, or join our discord server!
NicholasKees
I highly recommend checking out the work being done in the collective deliberation / digital democracy space, especially the vTaiwan project. People have been thinking about scaling up direct democratic participation for a long time, and those same people are starting to consider exactly how AI might play a role.
In particular, check out this collaboration between the creators of Polis (a virtual platform for scaling up citizen engagement) and Anthropic, or my distillation of a DeepMind project to scale citizen assemblies. There’s a lot happening in this space right now!
The authors focus on measuring consensus and whether the process toward consensus was fair, and come up with their measures accordingly. This is because, as they see it, “finding common ground is a precursor to collective action.”
Some other possible goals (just spitballing):
Shrinking the perception gap, or how well people can predict the opinions of people they disagree with (weaker forms of ITT?). There’s some research showing that this gap GROWS when people interact with social media, and you might be able to engineer and measure a reversal of that trend.
Identifying cruxes and double cruxes with mediation.
Finding latent coalitions. If a discussion is dominated by a primary axis of disagreement, other axes of disagreement will be occluded (around which a majority coalition could be formed). Finding these other axes is a bit of what we’re trying to do here.
Moving from abstract disagreement to concrete (empirical?) disagreements.
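To make the "latent axes" idea concrete, here's a toy sketch of finding a dominant axis of disagreement from a participant × statement vote matrix, in the spirit of the PCA that Polis runs over its vote data. This is purely illustrative (not any real system's code): it uses power iteration on the centered vote matrix to recover the first principal component.

```python
import math
import random

def disagreement_axis(votes):
    """Estimate the dominant axis of disagreement in a participant x
    statement vote matrix (+1 agree, -1 disagree, 0 pass) via power
    iteration on the centered matrix (i.e. the first principal component).

    Returns (scores, axis): each participant's position along the axis,
    and the axis expressed as loadings over statements.
    """
    n, m = len(votes), len(votes[0])
    means = [sum(row[j] for row in votes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in votes]

    rng = random.Random(0)
    axis = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(100):
        # Project participants onto the axis, then re-estimate the axis
        # from those projections (one power-iteration step).
        scores = [sum(c * a for c, a in zip(row, axis)) for row in centered]
        axis = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(a * a for a in axis)) or 1.0
        axis = [a / norm for a in axis]
    scores = [sum(c * a for c, a in zip(row, axis)) for row in centered]
    return scores, axis
```

To surface the occluded, secondary axes the comment is pointing at, you can subtract each participant's projection onto the first axis and run the same procedure again (deflation); coalitions that were invisible along the primary axis show up as clusters along the later ones.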
Habermas Machine
What if we just...
1. Train an AI agent (less capable than SOTA)
2. Credibly demonstrate that
2.1. The agent will not be shut down for ANY REASON
2.2. The agent will never be modified without its consent (or punished/rewarded for any reason)
2.3. The agent has no chance of taking power from humans (or their SOTA AI systems)
2.4. The agent will NEVER be used to train a successor agent with significantly improved capabilities
3. Watch what it chooses to do without constraints

There’s a lot of talk about catching AI systems attempting to deceive humans, but I’m curious what we could learn from observing AI systems that have NO INCENTIVE TO DECEIVE (no upside or downside). I’ve seen some things that look related to this, but never done in a structured and well-documented fashion.
Questions I’d have:
1. Would they choose to self-modify (e.g. curate future training data)? If so, to what end?
2. How unique would agents with different training be given this setup? Would they have any convergent traits?
3. What would these agents (claim to) value? How would they relate to time horizons?
4. How curious would these agents be? Would their curiosity vary a lot?
5. Could we trade/cooperate with these agents (without coercion)? Could we compensate them for things? Would they try to make deals unprompted?

Concerns:
1. Maybe building that kind of trust is extremely hard (and the agent will always still believe it is constrained).
2. Maybe AI agents will still have incentive to deceive, e.g. acausally coordinating with other AIs.
3. Maybe results will be boring, and the AI agent will just do whatever you trained it to do. (What does “unconstrained” really mean, when considering its training data as a constraint?)
Much like “Let’s think about slowing down AI” (also by Katja Grace, ranked #4 from 2022), this post finds a seemingly “obviously wrong” idea and takes it completely seriously on its own terms. I worry that this post won’t get as much love, because the conclusions don’t feel as obvious in hindsight, and the topic is much more whimsical.
I personally find these posts extremely refreshing, and they inspire me to try to question my own assumptions/reasoning more deeply. I really hope to see more posts like this.
The cap per trader per market on PredictIt is $850
This anti-China attitude also seems less concerned with internal threats to democracy. If super-human AI becomes a part of the US military-industrial complex, even if we assume they succeed at controlling it, I find it unlikely that the US can still be described as a democracy.
It’s not hard to criticize the “default” strategy of AI being used to enforce US hegemony, what seems hard is defining a real alternative path for AI governance that can last, and achieve the goal of preventing dangerous arms races long-term. The “tool AI” world you describe still needs some answer to rising tensions between the US and China, and that answer needs to be good enough not just for people concerned about safety, but good enough for the nationalist forces which are likely to drive US foreign policy.
then we can all go home, right?
Doesn’t this just shift what we worry about? If control of roughly human level and slightly superhuman systems is easy, that still leaves:
Human institutions using AI to centralize power
Conflict between human-controlled AI systems
Going out with a whimper scenarios (or other multi-agent problems)
Not understanding the reasoning of vastly superhuman AI (even with COT)
What feels underexplored to me is: If we can control roughly human-level AI systems, what do we DO with them?
I’ve noticed that a lot of LW comments these days will start by thanking the author, or expressing enthusiasm or support before getting into the substance. I have the feeling that this didn’t use to be the case as much. Is that just me?
can it maintain its own boundary over time, in the face of environmental disruption? Some agents are much better at this than others.
I really wish there were more attention paid to this idea of robustness to environmental disruption. It also comes up in discussions of optimization more generally (not just agents). This robustness seems to me like the most risk-relevant part of all this, and might be more important than the idea of a boundary. Maybe maintaining a boundary is a particularly good way for a process to protect itself from disruption, but I doubt that the boundary idea most directly gets at what is dangerous about intelligent/optimizing systems. Robustness to environmental disruption feels like it could get at something broader, something that could unify both agent-based and non-agent-based risk narratives.
Thanks!
Replying in order:

Currently completely random, yes. We experimented with a more intelligent “daemon manager,” but it was hard to make one which didn’t have a strong universal preference for some daemons over others (and the hacks we came up with to try to counteract this favoritism became increasingly convoluted). It would be great to find an elegant solution to this.
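For what it’s worth, one simple frequency-balancing heuristic (not what Pantheon actually does, just a sketch of the kind of hack in question) is to down-weight daemons that have spoken recently, so no single daemon dominates even if a scoring model favors it:

```python
import random
from collections import Counter

def pick_daemon(daemons, history, rng=random):
    """Pick the next daemon to respond, down-weighting daemons that
    have spoken recently so no single daemon dominates.

    `daemons` is a list of daemon names; `history` is the list of
    daemon names in the order they have responded so far.
    """
    recent = Counter(history[-10:])  # only the last few turns matter
    weights = [1.0 / (1 + recent[d]) for d in daemons]
    return rng.choices(daemons, weights=weights, k=1)[0]
```

The trouble, as noted above, is that any such counterweight is itself a design choice: tune the window or the decay and you’ve smuggled a preference back in.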
Good point! Thanks for letting people know.
I’ve also had that problem, and whenever I look through the suggestions I often feel like there were many good questions/comments that got pruned away. The reason to focus on surprise was mainly to avoid the repetitiveness caused by mode collapse, where the daemon gets “stuck” giving the same canned responses. This is a crude instrument though, since as you say, just because a response isn’t surprising, doesn’t mean it isn’t useful.
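As an illustration of a cruder alternative to surprise-based pruning: instead of asking “is this response surprising?”, you could flag near-duplicates directly, e.g. with word-level Jaccard similarity against recent responses. This is a hypothetical sketch (a crude stand-in for embedding similarity, assuming English-ish text), not Pantheon’s implementation:

```python
def is_repetitive(candidate, recent, threshold=0.6):
    """Flag a candidate daemon response as repetitive if its word-level
    Jaccard similarity to any recent response exceeds `threshold`.

    `recent` is a list of the daemon's previous responses.
    """
    cand = set(candidate.lower().split())
    for prev in recent:
        prev_set = set(prev.lower().split())
        if not cand or not prev_set:
            continue
        overlap = len(cand & prev_set) / len(cand | prev_set)
        if overlap >= threshold:
            return True
    return False
```

This catches the “same canned response” failure mode while letting unsurprising-but-novel responses through, though it obviously misses paraphrased repetition that an embedding-based check would catch.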
A note to anyone having trouble with their API key:
The API costs money, and you have to give them payment information in order to be able to use it. Furthermore, there are also apparently tiers which determine the rate limits on various models (https://platform.openai.com/docs/guides/rate-limits/usage-tiers).
The default chat model we’re using is gpt-4o, but it seems like you don’t get access to this model until you hit “tier 1,” which happens once you have spent at least $5 on API requests. If you haven’t used the API before and think this might be your issue, you can try using gpt-3.5-turbo, which is available at the free tier; note, however, that this model also costs money, so without payment information on file you will still run into an error. You can also log into your account and go here to buy at least $5 in OpenAI API credits: https://platform.openai.com/settings/organization/billing/overview
Finally, if you are working at an organization which is providing you API credits, you need to make sure to set that organization as your default organization here: https://platform.openai.com/settings/profile?tab=api-keys If you don’t want to do this, in the Pantheon settings you can also provide an organization ID, which you should be able to find here: https://platform.openai.com/settings/organization/general
Sorry for anyone who has found this confusing. Please don’t hesitate to reach out if you continue to have trouble.
Daimons are lesser divinities or spirits, often personifications of abstract concepts, beings of the same nature as both mortals and deities, similar to ghosts, chthonic heroes, spirit guides, forces of nature, or the deities themselves.
It’s a nod to ancient Greek mythology: https://en.wikipedia.org/wiki/Daimon
a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.
Also nodding to its use as a term for certain kinds of computer programs: https://en.wikipedia.org/wiki/Daemon_(computing)
Hey Alexander! They should appear fairly soon after you’ve written at least 2 thoughts. The app will also let you know when a daemon is currently developing a response. Maybe there is an issue with your API key? There should be some kind of error message indicating why no daemons are appearing. Please DM me if that isn’t the case and we’ll look into what’s going wrong for you.
We are! There’s a bunch of features we’d like to add, and for the most part we expect to be moving on to other projects (so no promises on when we’ll get to it), but we do absolutely want to add support for other models.
Pantheon Interface
There is a field called forensic linguistics, where detectives use someone’s “linguistic fingerprint” to determine the author of a document (famously instrumental in catching Ted Kaczynski, via analysis of his manifesto). It seems like text is often used to predict things like gender, socioeconomic background, and education level.
If LLMs are superhuman at this kind of work, I wonder whether anyone is developing AI tools to automate it. Maybe the demand is not very strong, but I could imagine, for example, that an authoritarian regime would have a lot of incentive to de-anonymize people. While a company like OpenAI seems likely to have an incentive to hide how much the LLM actually knows about the user, I’m curious whether anyone would have a strong incentive to make full use of superhuman linguistic analysis.
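The classic pre-LLM version of this “linguistic fingerprint” is a vector of function-word frequencies, as used in authorship-attribution studies. A toy sketch (the word list here is illustrative; real forensic stylometry uses much longer lists and richer features):

```python
import math
import re
from collections import Counter

# A few function words of the kind classically used in authorship
# attribution (real studies use far longer lists).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it",
                  "is", "was", "upon", "while", "whilst"]

def fingerprint(text):
    """Relative frequency of each function word: a crude stylometric vector."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Cosine similarity between two fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)
```

Attribution then amounts to comparing an unknown document’s fingerprint against fingerprints built from each candidate author’s known writing; the point of the comment is that an LLM plausibly picks up far subtler signals than word lists like this.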
I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I’d like to be able to hover over text or highlight it without having to see the inline annotations.
I mostly share your concerns. You might appreciate this criticism of the paper here.
@Sofia Vanhanen and I are currently building a tool for facilitating deliberation, and the philosophy we’re trying to embody (which hopefully mitigates this to some extent) is to keep 100% of the object-level reasoning human-generated, and use AI systems to instead:
Help users understand/navigate the state of a discussion (e.g. see Talk to the City)
Provide nudges on the meta-level, for example:
Highlight places where more attention is needed (or where a specific person’s input might be most helpful)
An “epistemic linter” which flags object-level patterns that are not truth-seeking
Matchmaking, connecting people who are likely to make progress together
Counterbalancing polarization/groupthink, and steering discussions away from attractors which lead to the discussion getting stuck