What do you think about the failure mode described here? In particular, (a) how would you go about deciding whether politicians and CEOs inspired by your paper will decrease, or increase, the pressure on AIs to believe false things (and thus become epistemically broken fanatics/ideologues) and/or lie about what they believe? And (b) how would you go about weighing the benefits and costs of increased trust in AI systems?
To add to what Owain said:
I think you’re pointing to a real and harmful possible dynamic.
However, I’m generally a bit sceptical of arguments of the form “we shouldn’t try to fix problem X because then people will get complacent”.
I think that the burden of proof lies squarely with the “don’t fix problem X” side, and that usually it’s good to fix the problem and then also give attention to the secondary problem that comes up.
I note that I don’t think of politicians and CEOs as the primary audience of our paper.
Rather, I think that in the next several years such people will naturally have more of their attention drawn to AI falsehoods (as these become a real-world issue), and will start looking for what to do about it.
I think that at that point it would be good if the people they turn to are better informed about the possible dynamics and tradeoffs. I would like these people to have read work which builds upon what’s in our paper. It’s these further researchers (across a few fields) that I regard as the primary audience for our paper.
This is very helpful, thanks! I now have a better understanding of what you are doing and basically endorse it. (FWIW, this is what I thought/hoped you were doing.)
(This won’t address all parts of your questions.)
You suggest that the default outcome is for governments and tech platforms to not regulate whether AI needs to be truthful. I think it’s plausible that the default outcome is some kind of regulation.
Why expect regulation?
Suppose an AI system produces false statements that deceive a group of humans. Suppose also that the deception is novel in some way: e.g. the falsehoods are personalized to individuals, the content/style is novel, or the humans behind the AI didn’t intend any kind of deception. I think if this happens repeatedly, there will be some kind of regulation. This could be voluntary self-regulation from tech companies or normal regulation by governments. Regulation may be more likely if it’s harder to litigate using existing laws relating to (human) deception.
Why expect AI to cause deception?
You also suggest that in the default scenario AI systems say lots of obviously false things and most humans would learn to distrust them. So there’s little deception in the first place. I’m uncertain about this but your position seems overconfident. Some considerations:
1. AI systems that generate wild and blatant falsehoods all the time are not very useful. For most applications, it’s more useful to have systems that are fairly truthful in discussing non-controversial topics. Even for controversial or uncertain topics, there’s pressure for systems to not stray far from the beliefs of the intended audience.
2. I expect some people will find text/chat by AI systems compelling based on stylistic features. Style can be personalized to individual humans. For example, texts could be easy to read (“I understood every word without pausing once!”) and entertaining (“It was so witty that I just didn’t want to stop reading”). Texts can also use style to signal intelligence and expertise (“This writer is obviously a genius and so I took their views seriously”).
3. Sometimes people won’t know whether it was an AI or human who generated the text. If there are tools for distinguishing, some people won’t use them and some won’t have enough tech savvy to use them well.
4. There are humans (“charlatans”) who frequently say false and dubious things while having devoted followers. Not all human followers of charlatans are “fools” (to refer to your original question). AI charlatans would have the advantage of more experimentation. Human charlatans exploit social proof and AI charlatans could do the same (e.g. humans believe the claim X because they think other humans they like/trust believe X).
This is helpful, thanks!
I agree that we should expect regulation by default. So then maybe the question is: Is the regulation that would be inspired by Truthful AI better or worse than the default? Seems plausibly better to me, but mostly it seems not that different to me. What sort of regulation are you imagining would happen by default, and why would it be significantly worse?
I also totally agree with your points 1–4.