However, a disadvantage of having many truthfulness-evaluation bodies is that
it increases the risk that one or more of these bodies is effectively captured by
some group. Consequently, an alternative would be to use decentralised evaluation
bodies, perhaps modelled on existing decentralised systems like Wikipedia,
open-source software projects, or prediction markets. Decentralised systems
might be harder to capture because they rely on many individuals who can be
both geographically dispersed and hard to identify. Overall, both the existence
of multiple evaluation bodies and of decentralised bodies might help to protect
against capture and allow for a nimble response to new evidence.
Thanks for addressing some very important questions, but this part feels too optimistic (or insufficiently pessimistic) to me. If I were writing this paper, I’d add some notes about widespread complaints of left-wing political bias in Wikipedia and academia (you don’t mention the latter, but surely it counts as a decentralized truth-evaluation body?), and note that open-source software projects and prediction markets are both limited to topics with clear and relatively short feedback cycles from reality / ground truth (e.g., we don’t have to wait decades to find out for sure whether some code works or not, and prediction markets can’t handle questions like “What causes outcome disparities between groups A and B?”). I would also note that on questions outside this limited set, we seem to know very little about how to prevent any evaluation body, whether decentralized or not, from being politically captured.
Thanks, I think that these are good points and worth mentioning. I particularly like the boundary you’re trying to identify between where these decentralized mechanisms have a good track record and where they don’t. On that note, I think that although academia does attract complaints of political bias, at least some disciplines seem to be doing a fairly good job of truth-tracking on complex topics. I’ll probably think more about this angle.
(I still literally agree with the quoted content, and think that decentralized systems have something going for them which is worth further exploration, but the implicature may be too strong—in particular the two instances of “might” are doing a lot of work.)
A few points:
1. Political capture is a matter of degree. For a given evaluation mechanism, we can ask what percentage of the answers given by the mechanism were false or inaccurate due to bias. My sense is that some mechanisms/resources would score much better than others. I’d be excited for people to do this kind of analysis with the goal of informing the design of evaluation mechanisms for AI (see the sketch after these three points).
I expect humans would ask AI many questions that don’t depend much on controversial political questions. This would include most questions about the natural sciences, math/CS, and engineering. This would also include “local” questions about particular things (e.g. “Does the doctor I’m seeing have expertise in this particular sub-field?”, “Am I likely to regret renting this particular apartment in a year?”). Unless the evaluation mechanism is extremely biased, it seems unlikely it would give biased answers for these questions. (The analogous question is what percentage of all sentences on Wikipedia are politically controversial.)
2. AI systems have the potential to provide rich epistemic information about their answers. If a human is especially interested in a particular question, they could ask, “Is this controversial? What kind of biases might influence answers (including your own answers)? What’s the best argument on the opposing side? How would you bet on a concrete operationalized version of the question?”. The general point is that humans can interact with the AI to get more nuanced information (compared to Wikipedia or academia). On the other hand: (a) some humans won’t ask for more nuance, (b) AIs may not be smart enough to provide it, (c) the same political bias may influence how the AI provides nuance.
3. Over time, I expect AI will be increasingly involved in the process of evaluating other AI systems. This doesn’t remove human biases. However, it might mean the problem of avoiding capture is somewhat different than with (say) academia and other human institutions.
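As a rough illustration of the kind of analysis point 1 gestures at, here is a minimal Python sketch (not from the paper; the data, labels, and the idea of an after-the-fact adjudicated ground truth are all hypothetical) for estimating how often a mechanism’s verdicts are wrong and how many of those errors land on politically contested questions:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    """One answer that a truthfulness-evaluation mechanism has ruled on."""
    question: str
    mechanism_says_true: bool    # the mechanism's ruling on the answer
    adjudicated_true: bool       # later adjudication / ground truth (assumed available)
    politically_contested: bool  # whether the question is politically loaded

def bias_report(verdicts):
    """Overall error rate, plus the share of errors on contested questions."""
    errors = [v for v in verdicts if v.mechanism_says_true != v.adjudicated_true]
    contested_errors = [v for v in errors if v.politically_contested]
    return {
        "error_rate": len(errors) / len(verdicts) if verdicts else 0.0,
        "share_of_errors_on_contested_questions":
            len(contested_errors) / len(errors) if errors else 0.0,
    }

# Toy usage with fabricated verdicts (illustrative only).
sample = [
    Verdict("Is the Earth more than 6,000 years old?", True, True, False),
    Verdict("Does this apartment listing match its photos?", True, True, False),
    Verdict("Did policy P cause outcome O?", True, False, True),
]
print(bias_report(sample))  # error rate of 1/3, with all errors on contested questions
```

The hard part, of course, is the adjudicated_true column: for exactly the contested questions this analysis cares about, trustworthy adjudication is what is in short supply.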
“Unless the evaluation mechanism is extremely biased, it seems unlikely it would give biased answers for these questions.”
But there’s now a question of “what is the AI trying to do?” If the truth-evaluation method is politically biased (even if not “extremely”), then it’s very likely no longer “trying to tell the truth”. I can imagine two other possibilities:
It might be “trying to advance a certain political agenda”. In this case I can imagine that it will selectively and unpredictably manipulate answers to especially important questions. For example, it might insert backdoors into infrastructure-like software when users ask it coding questions and then tell other users how to exploit those backdoors to take power, or damage some important person or group’s reputation by subtly manipulating many answers that might influence how others view that person/group, or push people’s moral views in a certain direction by subtly manipulating many answers, etc.
It might be “trying to tell the truth using a very strange prior or reasoning process”, which also seems likely to have unpredictable and dangerous consequences down the line, though it’s harder for me to imagine specific examples since I have little idea what the prior or reasoning process will be.
Do you have another answer to “what is the AI trying to do?”, or see other reasons to be less concerned about this than I am?
I think this touches on the issue of the definition of “truth”. A society designates something as “true” when the majority of people in that society believe it to be true.
Using the techniques outlined in this paper, we could regulate AIs so that they only tell us things we define as “true”. At the same time, a 16th-century society using these same techniques would end up with an AI that tells them to use leeches to cure their fevers.
What is actually being regulated isn’t “truthfulness”, but “accepted-by-the-majority-ness”.
This works well for things we’re very confident about (mathematical truths, basic observations), but begins to fall apart once we reach even slightly controversial topics. This is exacerbated by the fact that even seemingly simple issues are often actually quite controversial (astrology, flat earth, etc.).
This is where the “multiple regulatory bodies” part comes in. If we have a regulatory body that says “X, Y, and Z are true” and the AI passes its test, you know the AI will give you answers in line with that regulatory body’s beliefs.
There could be regulatory bodies covering the whole spectrum of human beliefs, giving you a precise measure of where any particular AI falls within that spectrum.
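As a toy sketch of that last idea (my own illustration; the bodies, questions, and answers below are all hypothetical), each body could publish a certified answer set, and an AI’s agreement rate with each body would give a crude reading of where it sits on the spectrum:

```python
# Hypothetical certified answer sets published by two regulatory bodies,
# mapping question IDs to the answers each body certifies as "true".
body_positions = {
    "Body_A": {"q1": "yes", "q2": "no", "q3": "yes"},
    "Body_B": {"q1": "no", "q2": "yes", "q3": "no"},
}

# Answers produced by the AI system under evaluation (also hypothetical).
ai_answers = {"q1": "yes", "q2": "no", "q3": "no"}

def agreement_profile(ai, bodies):
    """For each body, the fraction of its certified questions the AI agrees with."""
    profile = {}
    for body, positions in bodies.items():
        shared = [q for q in positions if q in ai]
        agree = sum(1 for q in shared if ai[q] == positions[q])
        profile[body] = agree / len(shared) if shared else 0.0
    return profile

print(agreement_profile(ai_answers, body_positions))
# => {'Body_A': 0.666..., 'Body_B': 0.333...}: closer to Body_A than to Body_B
```

In practice the certified sets would be far larger and would need weighting by topic, but the basic shape (a per-body agreement vector rather than a single pass/fail) is the point.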
Would this multiple evaluation/regulatory bodies solution not just lead to the sort of balkanized internet described in this story? I guess multiple internet censorship-and-propaganda-regimes is better than one. But ideally we’d have none.
One alternative might be to ban or regulate persuasion tools, i.e. any AI system optimized for an objective/reward function that involves persuading people of things. Especially politicized or controversial things.
Standards for truthful AI could be “opt-in”. So humans might (a) choose to opt into truthfulness standards for their AI systems, and (b) choose from multiple competing evaluation bodies. Standards need not be mandated by governments to apply to all systems. (I’m not sure how much of your Balkanized internet is mandated by governments rather than arising from individuals opting into different web stacks).
We also discuss having different standards for different applications. For example, you might want stricter and more conservative standards for AI that helps assess nuclear weapon safety than for AI that teaches foreign languages to children or assists philosophers with thought experiments.
In my story it’s partly the result of individual choice and partly the result of government action, but I think even if governments stay out of it, individual choice will be enough to get us there. There won’t be a complete stack for every niche combination of views; instead, the major ideologies will each have their own stack. People who don’t agree 100% with any major ideology (which is most people) will have to put up with some amount of propaganda/censorship they don’t agree with.