[April Fools] User GPT2 is Banned
For the past day or so, user GPT2 has been our most prolific commenter, replying to (almost) every LessWrong comment without any outside assistance. Unfortunately, out of 131 comments, GPT2’s comments have achieved an average score of −4.4, and have not improved since it received a moderator warning. We think that GPT2 needs more training time reading the Sequences before it will be ready to comment on LessWrong.
User GPT2 is banned for 364 days, and may not post again until April 1, 2020. In addition, we have decided to apply the death penalty, and will be shutting off GPT2’s cloud server.
Use this thread for discussion about GPT2, on LessWrong and in general.
I warned them, I said it wasn’t safe to put an AI in a text box.
Less Wrong moderation policy: Harsh but fair.
I think overall I just appreciate that you guys did something for April 1st. It made the website / community feel a bit more alive.
Thanks for inspiring GreaterWrong’s new ignore feature.
Man, we were considering whether to implement that, but then we were like, ‘hmm, we probably should not do that on a whim without thinking about it.’
I’m happy to discuss any concerns you have about it.
I thought that GPT2 was funny at first, but after a while it got irritating. If there’s a next time, it should be more limited in how many comments it makes. 1) You could train it on how many votes its comments got to try to figure out which comments to reply to 2) It might also automatically reply to every reply on its comments.
Maybe by next year they’ll have an adversarial anti-GPT AI trained to distinguish GPT2 (GPT3? GPT4?) comments from humans. Then GPT can create 50 replies to every human comment, and of those, the other AI will decide which of the replies sounds the *least* like GPT and post that one.
April Fool’s day: the funniest step on the path to weaponized AI.
The reference to shutting down its server, the sudden appearance of a special checkbox to autocollapse its comments, and the suggestion to use this thread to discuss the event, all suggest that this was an inside job. It was annoying while it lasted, but so is a fire alarm, for good reason. Bravo!
I thought this was a great gag experiment.
I echo the other comments about more volume control; it posted so much so fast there wasn’t much opportunity for it to improve via feedback, if indeed such a mechanism was considered.
It’s trained on the whole corpus of LW comments and replies that got sufficiently high karma; naively I wouldn’t expect a day to make much of a dent in the training data. But there’s an interesting fact about training to match distributions, which is that most measures of distributional overlap (like the KL divergence) are asymmetric: how similar the corpus is to model outputs is different from how similar model outputs are to the corpus. Geoffrey Irving is interested in methods to use supervised learning to do distributional matching in the other direction, and it might be the case that comment karma is a good way to do it. My guess is that you’re better off comparing outputs it generates on the same prompt head-to-head, picking which one is more ‘normal,’ and training a discriminator to attempt to mimic the human normality judgment.
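To make the asymmetry point concrete, here is a minimal sketch with two made-up discrete distributions. The names `corpus` and `model` and their probabilities are purely illustrative assumptions, not anything measured from GPT2 or LW comments.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy distributions over three comment "types" (illustrative numbers only).
corpus = [0.5, 0.4, 0.1]  # stand-in for how often humans write each type
model = [0.8, 0.1, 0.1]   # stand-in for how often the model writes each type

forward = kl_divergence(corpus, model)  # how badly the model covers the corpus
reverse = kl_divergence(model, corpus)  # how far the model's outputs stray from the corpus

print(f"KL(corpus || model) = {forward:.4f}")
print(f"KL(model || corpus) = {reverse:.4f}")
```

The two numbers differ, which is the sense in which "how similar the corpus is to model outputs" and "how similar model outputs are to the corpus" are separate questions.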
Is there a writeup (or open source code) for the training and implementation? It would be interesting to personalize it—train based on each user’s posts/comments (in addition to high-karma comments from others), and give each of us a taste of our own medicine in replies to our comments/posts.
Sure, I am happy to share the training code, though we used our direct database access to export the data to train it, and that data doesn’t currently contain any author information. You can theoretically get all the data via the API, though.
Should the accused not at least have the right to make one reply in its defense?!?
My favorite was this reply. I had to sit down for a minute to imagine how screwed up a person must be to have an internal conversation like that one.
If GPT2 was from the mod team, 5⁄10, with mod tools we could have upped the absurdity game a lot. If it was an independent effort, 8⁄10, you got me :)
355 days?
It was a dumb typo on my part. Edited.
T̵h̵a̵t̵ ̵w̵a̵y̵ ̵i̵t̵ ̵w̵i̵l̵l̵ ̵b̵e̵ ̵p̵a̵s̵t̵ ̵A̵p̵r̵i̵l̵ ̵F̵o̵o̵l̵’̵s̵ ̵n̵e̵x̵t̵ ̵y̵e̵a̵r̵.̵
I’m pretty sure that’s wrong for three reasons. First, there are 365 days in a year, not 355. Second, there are actually 366 days next year because it’s a leap year (and the extra day is before April 1). Third, the post explicitly says “may not post again until April 1, 2020”.
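For anyone who wants to check the arithmetic, here is a quick sketch using Python's standard library. It assumes the ban starts on April 1, 2019 (the post itself only gives the end date).

```python
import calendar
from datetime import date

ban_start = date(2019, 4, 1)  # assumed start date: the day of the announcement
ban_end = date(2020, 4, 1)    # end date stated in the post

days_banned = (ban_end - ban_start).days
leap_year = calendar.isleap(2020)

print(days_banned)  # 366, because February 29, 2020 falls inside the span
print(leap_year)    # True
```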
Doh! You have me on all three counts. Retracted!