A few points:

1. Political capture is a matter of degree. For a given evaluation mechanism, we can ask what percentage of the answers it gives are false or inaccurate due to bias. My sense is that some mechanisms/resources would score much better than others. I’d be excited for people to do this kind of analysis with the goal of informing the design of evaluation mechanisms for AI (a rough sketch of what such an analysis could look like appears after point 3).
I expect humans would ask AI many questions that don’t depend much on controversial political issues. This would include most questions about the natural sciences, math/CS, and engineering. It would also include “local” questions about particular things (e.g. “Does the doctor I’m seeing have expertise in this particular sub-field?”, “Am I likely to regret renting this particular apartment in a year?”). Unless the evaluation mechanism is extremely biased, it seems unlikely it would give biased answers for these questions. (The analogous question for Wikipedia is what percentage of all its sentences are politically controversial.)
2. AI systems have the potential to provide rich epistemic information about their answers. If a human is especially interested in a particular question, they could ask: “Is this controversial? What kinds of biases might influence answers to it (including your own)? What’s the best argument on the opposing side? How would you bet on a concrete, operationalized version of the question?” The general point is that humans can interact with the AI to get more nuanced information than they could from Wikipedia or academia. On the other hand: (a) some humans won’t ask for more nuance, (b) AIs may not be smart enough to provide it, and (c) the same political bias may influence how the AI provides nuance.
3. Over time, I expect AI will be increasingly involved in the process of evaluating other AI systems. This doesn’t remove human biases, but it might mean the problem of avoiding capture looks somewhat different from the analogous problem for (say) academia and other human institutions.
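Here is a rough sketch, in Python, of the kind of analysis suggested in point 1. Everything in it (the mechanism names, the sample questions, the audit labels) is hypothetical; the point is just that “degree of capture” can be operationalized as an audited error rate rather than treated as a binary property.

```python
from dataclasses import dataclass

@dataclass
class AuditedAnswer:
    question: str
    answer: str
    false_due_to_bias: bool  # an auditor's judgment, not ground truth

def capture_rate(audited: list[AuditedAnswer]) -> float:
    """Fraction of audited answers judged false/inaccurate due to bias."""
    if not audited:
        return 0.0
    return sum(a.false_due_to_bias for a in audited) / len(audited)

# Hypothetical comparison across evaluation mechanisms/resources.
samples = {
    "mechanism_A": [
        AuditedAnswer("Did policy X reduce unemployment?", "...", True),
        AuditedAnswer("Is this bridge design structurally sound?", "...", False),
    ],
    "mechanism_B": [
        AuditedAnswer("Did policy X reduce unemployment?", "...", False),
        AuditedAnswer("Is this bridge design structurally sound?", "...", False),
    ],
}

for name, answers in samples.items():
    print(f"{name}: estimated capture rate = {capture_rate(answers):.0%}")
```

The hard parts, of course, are choosing a representative sample of questions and deciding what counts as “false or inaccurate due to bias”, a judgment that could itself be politically captured.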
> Unless the evaluation mechanism is extremely biased, it seems unlikely it would give biased answers for these questions.
But there’s now a question of “what is the AI trying to do?” If the truth-evaluation method is politically biased (even if not “extremely”), then it’s very likely no longer “trying to tell the truth”. I can imagine two other possibilities:
It might be “trying to advance a certain political agenda”. In this case I can imagine it selectively and unpredictably manipulating answers to especially important questions. For example, it might insert backdoors into infrastructure-like software when users ask it coding questions and then tell other users how to exploit those backdoors to take power; damage some important person or group’s reputation by subtly manipulating many answers that influence how others view that person/group; or push people’s moral views in a certain direction by subtly manipulating many answers; and so on.
It might be “trying to tell the truth using a very strange prior or reasoning process”, which also seems likely to have unpredictable and dangerous consequences down the line, but here it’s harder for me to imagine specific examples, since I have little idea what the prior or reasoning process would be.
Do you have another answer to “what is the AI trying to do?”, or see other reasons to be less concerned about this than I am?