Meta wants to use AI to write Wikipedia articles; I am Nervous™
Meta is experimenting with using AI to write Wikipedia articles: https://ai.facebook.com/research/publications/generating-full-length-wikipedia-biographies-the-impact-of-gender-bias-on-the-retrieval-based-generation-of-women-biographies
I personally have a very bad feeling about this. I’m most afraid of it making it easier to spam the encyclopedia with fake information that looks plausible on the surface and therefore doesn’t get fully fact-checked. It could also create perverse incentives where SEO companies put out false information online to bias Meta’s algorithm, and thereby sneak their way into the encyclopedia. The decision to make this open source seems incredibly foolish to me as well, considering how easily a service like this could be misused. (Edit: It has been pointed out to me that making it closed-source wouldn’t be great either, since then we would have no idea what it was doing under the hood. Either way I wouldn’t be happy, so I’m not sure their choice counts as a point against them.)
Am I overreacting? Is this actually a good thing? Is this actually way worse than I think it is? Who knows!
What are your thoughts on this?
FAIR publishing some research into long-form text generation is basically unrelated to someone generating Wikipedia articles and automatically uploading them. Researchers love using Wikipedia in various ways because it’s free, pretty high-quality, and there’s a lot of it. So tons and tons of publications do various things to and with Wikipedia data.
Yes, maybe someone could download their code and do something nefarious, but I doubt it’s any more useful for that sort of thing than other long-form text generation approaches like GPT-3.
Thanks for the reassuring context :)
I’m positive that as these language models become more accessible and powerful, their misuse will grow massively. However, I believe open sourcing is the best option here; having access to such models allows us to build accurate automatic classifiers that detect their outputs. Media websites (e.g. Wikipedia, Twitter) could include a classifier like that in their pipeline for submitting new media (see the sketch below).
Making such technologies closed source leaves researchers in the dark; thanks to the scaling-transformer hype, only a tiny fraction of the world’s population has the financial means to train a SOTA transformer model.
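To make that concrete, here’s a minimal sketch of what such a pipeline step could look like. The detector model, its “Real”/“Fake” label scheme, and the threshold are all stand-ins (this particular detector was trained on GPT-2 output); the point is only the shape of the check, not a claim that Wikipedia or Twitter actually does this:

```python
# Minimal sketch of a "is this machine-generated?" gate in a submission
# pipeline. Model name and label scheme are assumptions; swap in whatever
# detector is current for the generators you care about.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",  # GPT-2-era detector, used as a stand-in
)

def needs_human_review(submission: str, threshold: float = 0.8) -> bool:
    """Flag text the detector scores as likely machine-generated."""
    result = detector(submission, truncation=True)[0]  # truncate to the model's max length
    # This model labels text "Real" / "Fake"; other detectors use
    # different label names, so check the model config before reusing.
    return result["label"] == "Fake" and result["score"] >= threshold

if needs_human_review("Jane Doe (born 1970) is a physicist known for..."):
    print("route to the human review queue")
else:
    print("proceed to the normal editorial flow")
```

In practice you’d tune the threshold against the false-positive rate you can tolerate, since flagging too many legitimate human submissions would burn out exactly the volunteer reviewers you’re trying to protect.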
After some consideration, I agree with you. Still can’t say I’m happy about it, but it’s a better option than closed source, for sure.
Presumably it’d take less manpower to review each article the AI has written (i.e. read the citations & make sure the article accurately describes its subject) than it would to write articles from scratch. I’d guess this is the case even if the claims seem plausible & fact-checking requires a somewhat detailed read-through of the sources.
That would be a much more boring task for most people than writing directly, and I’d have to imagine it would attract fewer volunteers.
I think on balance this is a good thing. More, better, fairer information on wikipedia is awesome.
I do worry about different standards of editing for machine- and human-generated contributions, and I foresee some pain in edit wars and reversions until processes and norms evolve to handle mixed-source topics.
When the AI is capable of edit wars and defending its actions on the talk page, then Wikipedia will be truly doomed.
I was thinking more about the humans running the AI, not the AI itself having an advantage in the edit wars. If the project gets special privileges and bypasses the normal (and sometimes painful) oversight by human volunteers, it could end up inserting incorrect or low-value information that’s easier to create than to improve.
Agreed, if the AI is passing the “Wikipedia contributor” Turing test, then it’s all over anyway.
This is a very strong statement! Would you be willing to make a specific prediction conditional on an AI passing the ‘Wikipedia contributor’ Turing test? (something like “if that happens, I predict x will happen within [y unit of time] with z probability” or something of the sort)
Not that there’ll necessarily be anyone around to register it if you’re correct, but still...
Actually, I’ll instead back off on my statement. Having seen some of the low-quality discussions in edit wars, it’s not actually a very high bar.
lol I feel you on that one! 🙃