Please do not use AI to write for you
I’ve recently seen several articles here that were clearly generated or heavily assisted by AI. They are all dreadful. They are verbose, they are full of “on the one hand” and “on the other”, they never make any assertion without also making room for the opposite, and they end with “conclusions” that say nothing. Please do not do this.
————
I wrote the above, unassisted, and it states my own views. To illustrate the point I then had GPT-4o rewrite it. Here was the prompt:
I have recently seen a substantial increase in the proportion of postings to a certain online forum that were clearly generated by AI, although not marked as such. The writing is verbose and meandering, full of “on the one hand” and “on the other”, never making an assertion without also suggesting its opposite, sounding oh-so-measured and even-handed that it ends up saying nothing at all. Please write an article for me, written in the first person, arguing that AI tools in the current state of the art should never be used for this purpose.
Below the line is the chatbot’s extruded meat product. This is the whole of its first output, unedited, and I did not press it to continue.
The first paragraph starts out ok, being merely verbose and repetitive but setting out the basic claim. And then the second paragraph backpedals it past where it began. The third paragraph floats off into the cloud layer, and so on for a total of nine paragraphs and 875 words that would be a waste of anyone’s time to read, unless they want to see if their judgement agrees with mine.
It did not notice that I was using it in exactly the way I had asked it to argue against, and that the article displays all the faults that it purports to diagnose.
The splash image is a photograph from Wikipedia, CC BY-SA 3.0. I’ll spare you the image that GPT-4o came up with, but y’know, I do find the AI illustrations that everyone heads their blog posts with these days pretty annoying. (ETA: Well, there was supposed to be an image, I filled out the “link preview” part of the entry form, but no image. But the link above will show you what it was.)
————————————
Why AI-Generated Content Shouldn’t Flood Online Forums
Lately, I’ve noticed something troubling in the online forums I frequent: a distinct uptick in posts that seem like they’ve been generated by AI, though they’re not labeled as such. The language is eerily similar across many of these posts—verbose, wandering, and conspicuously non-committal. Every argument is hedged with a counterpoint, every statement cushioned by a qualifier, making the writing sound balanced on the surface but ultimately empty. The posts stretch on endlessly, piling one vague point onto another, yet they fail to really say anything. It’s frustrating, to say the least, and I believe this trend reveals a serious problem with the way AI is being misused in online spaces.
I want to be clear: I’m not arguing that AI lacks merit as a tool. It has immense potential in fields like data analysis, automation, and even creativity when it complements human intelligence. However, when it comes to contributing to online discourse—spaces where real people go to express ideas, ask questions, and engage in genuine conversation—I believe AI-generated content, at least in its current form, does more harm than good.
First, let’s talk about the nature of conversation itself. Forums, at their best, are places where people share perspectives, debate ideas, and solve problems together. What makes these interactions meaningful is that they are infused with human experience, emotion, and authenticity. Even when disagreements arise, they are rooted in real personal stakes—whether it’s a passion for a hobby, a struggle to understand a concept, or a desire to connect with others. AI-generated content lacks this grounding in lived experience. It can simulate language but not the real human motivations behind it.
What we end up with instead is a hollow imitation of dialogue. AI might be able to string together coherent sentences and give the illusion of reasoned thought, but it cannot care about the subject at hand. It cannot have an opinion, take a stance, or bring a fresh perspective based on actual experience. This is why so much AI-generated content feels like it goes in circles—because it’s programmed to try to cover all bases without truly investing in any one position. It’s as if the AI is trying to preemptively satisfy everyone without committing to anything, leading to endless qualifications, balancing acts, and vague statements. It dilutes the richness of discussions and makes it harder for people to find valuable insights.
The other major issue is the erosion of trust in these spaces. When users post AI-generated content without disclosing its origin, it blurs the line between what is authentic and what is artificial. We’ve come to online forums expecting that the people we interact with are fellow humans sharing their thoughts and experiences. The rise of AI-generated content undermines this basic expectation. If I can’t be sure whether a post I’m reading was written by a person or an algorithm, it alters the dynamic of the conversation. Authenticity becomes harder to gauge, and people might become more cynical or disengaged, assuming that everything is automated. It strips away the sense of community and shared experience that makes forums worth participating in.
Moreover, when AI-generated content dominates, it can crowd out the voices of real users. Human contributors, who may be more tentative or less prolific than an AI, could find their voices drowned out by the sheer volume of algorithmically generated posts. What’s worse is that this influx of AI content can create an artificial sense of consensus or expertise on certain topics, even when the contributions are shallow or redundant. We could end up with forum threads that look “robust” from the outside but are actually just padded with regurgitated, generic responses that add little value.
I also worry that this trend encourages a kind of intellectual laziness. If someone can rely on an AI to generate a long-winded, seemingly well-rounded post for them, what incentive is there to engage deeply with the topic themselves? Forums are supposed to challenge us to think, to articulate our own ideas, and to engage critically with others. But when AI does the work for us, it short-circuits that process. We lose out on the chance to develop our own thoughts, and conversations stagnate because nobody is pushing themselves to really grapple with the subject matter.
So, what’s the solution? In my view, AI should be used carefully and transparently in these contexts. If someone chooses to use an AI to assist with their post, they should disclose that fact upfront. This way, the community can make informed judgments about the content they’re reading. I also think forum administrators should consider limiting the amount of AI-generated content that’s allowed to be posted, ensuring that real human voices remain at the forefront of discussions.
In the long run, we need to remember why we engage in these spaces in the first place. Forums thrive on the exchange of ideas between real people, each with their own perspectives and experiences. AI, while impressive in many ways, simply can’t replicate that. At least not yet. Until AI reaches a point where it can contribute meaningfully—and with the clarity and depth that real conversations demand—I believe it should stay out of the forums. For the sake of genuine discourse, we need to prioritize human connection over algorithmic convenience.
My suggestion for an LLM policy for LW2 might be:
If someone is posting a GPT-4 sample as a response or example of “what would GPT-4 write here?”, that is totally legitimate and doesn’t need to be edited other than to put it in blockquotes etc; if it’s an exercise in “and the punchline is, an AI wrote this!”, well, that’s fine too, and readers will upvote/downvote as they find the exercise of value. These are not the problem. The problem is when people slip in AI stuff purely as an (inferior) substitute for their own work.
I am also fine with use of AI in general to make us better writers and thinkers, and I am still excited about this. (We unfortunately have not seen much benefit for the highest-quality creative nonfiction/fiction or research, like we aspire to on LW2, but this is in considerable part due to technical choices & historical contingency, which I’ve discussed many times before, and I still believe in the fundamental possibilities there.) We definitely shouldn’t be trying to ban AI use per se.
However, if someone is posting a GPT-4 (or Claude or Llama) sample which is just a response, then they had damn well better have checked it and made sure that the references exist and say what the sample claims they say, that the sample makes sense, and that they fixed any issues in it. If they wrote something and had the LLM edit it, then they should have checked those edits and made sure the edits are in fact improvements, and improved the improvements, instead of letting their essay degrade into ChatGPTese. And so on.
Anything else pollutes the commons. Every comment here is a gift from the author, but it’s also a gift from the readers, which they make in good faith under the belief that the author tried to make the comment worthwhile & put in enough effort that it would be worth potentially many people reading it. It should never take the author much less effort to write a comment than the readers will collectively take to read it (as is the case with spamming sections with LLM junk that the ‘author’ didn’t even read but merely skimmed and went ‘lgtm’, judging from cases that have been flagged here in the past). Because you know, bro, I am just as capable as you are of copying a comment into the neighboring ChatGPT or Claude tab and seeing what it says; I don’t need you doing that manually on LW2, and it doesn’t help me if I have to waste time reading it to realize that I was better off ignoring it because you are just going to paste in random average AI slop without adding any kind of value: filtering, critique, improvement, evaluation, commentary, fact-checking, editing, curation, comparison of LLMs...
Such comments are spam, plain and simple, indistinguishable from spammers karma-farming to flip an account: creating fake contributions to gain status in order to parasitize the community without giving anything in return. And should be treated as such: downvoted, and banned.
I have a counterpoint, which is that I often see low effort posts or comments by people (less often on LessWrong) where I think: “I wish this person had had a discussion with Claude about this before posting.”
I don’t like it when people post verbatim what the models output, because of all the reasons you mention, but I do think that having a debate about your ideas with a model can help clarify. You need to actually manage to have a debate though, not just get it to sycophantically agree with you. Try tactics like starting out pretending to have the opposite point of view you actually have, then switching. Ask for pros and cons and for insightful critiques, and for it to avoid padding phrases, social niceties, and flattery.
Then rewrite, in your own words, your updated thought after this process, and it’ll probably be at least a bit improved.
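A minimal sketch of what that back-and-forth could look like through the API, assuming the Anthropic Python SDK (the model id, system prompt, and topic are all illustrative, not tested recommendations):

```python
# Sketch of "debate your idea with the model before posting".
# Assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment;
# model id and prompt wording are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()

system = (
    "You are a blunt discussion partner. No padding phrases, no social "
    "niceties, no flattery. Give pros, cons, and insightful critiques."
)

def turn(messages):
    # One exchange with the model under the anti-sycophancy system prompt.
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model id
        max_tokens=700,
        system=system,
        messages=messages,
    )
    return reply.content[0].text

# Start by arguing the *opposite* of the view I actually hold...
messages = [{"role": "user", "content":
             "Forums should simply ban all AI-assisted posts. "
             "What are the strongest objections to this?"}]
first = turn(messages)

# ...then switch to my real position and ask for critiques of that instead.
messages += [
    {"role": "assistant", "content": first},
    {"role": "user", "content":
     "Now critique the opposite position: AI-assisted posts are fine as long "
     "as the author vets and rewrites the output. Pros, cons, weak points?"},
]
print(turn(messages))
```

Nothing about the code matters; the point is just to structure the conversation so the model has to argue with you instead of agreeing.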
So, this isn’t quite disagreeing with the point here exactly. I guess my summary is, ‘Use LLMs thoughtfully and deliberately, not sloppily and carelessly.’
That reminds me of a remark attributed to Dijkstra. I forget the exact wording, but it was to the effect that we should make our errors thoughtfully and deliberately, not sloppily and carelessly.
I’d wager anyone with the ability to do this, to entertain views contrary to their own, probably writes comments just fine with the unaided mind.
The ability to take the opposite point of view is a skill you can apply generally; writing good comments on a topic often has something to do with your own knowledge of that topic.
Recently, a friend wrote something on Facebook about how her doctors didn’t give her sleeping pills immediately, but only after tests, which she took as showing that doctors aren’t as keen on giving people drugs as is commonly argued.
I then had the intuition/cached thought that German doctors just have different incentives when it comes to prescribing drugs, and asked ChatGPT how the incentives differ. As a result, I got a lot of details that allowed me to make a higher-quality post.
After having that discussion, I even feel like it might be good to write a LessWrong post about the incentive differences, because it touches on the Hansonian claim that health-care interventions could be halved in the US without damaging health outcomes.
In discussions of Hanson’s “extreme perspective on health”, I have heard people say things like: nobody would suggest reducing the usage of clearly useful medicines like antibiotics. In Germany, we use less than half as many antibiotics as the US and have better infectious-disease mortality, so the suggestion isn’t far-fetched.
Most of the debate between Scott Alexander and Robin Hanson ignored how incentives for drug use are set, as those details are technical and bureaucratic in a way that falls outside the usual discourse. ChatGPT, however, is quite capable of just giving me all those boring bureaucratic details.
Searching for how those bureaucratic details work isn’t very straightforward without LLMs.
Yeah, probably true.
I think there may be a narrow band of people for whom attempting to consider the situation from both sides, and having a conversation as ‘the other side’, actually boosts their thinking somewhat. Optimistically, the practice of doing this a few times may help someone become able to do it for themselves, in their own mind, in the future.
I struggle with this, and need to attend a prompting bootcamp.
Funnily enough, I feel pretty similar about AI-generated images by now. I’ve always been struck by how people stick huge useless images on top of their blogposts, and other people want to read such articles more?? But now with AI-generated images this has multiplied to a point that’s just frustrating—it means more time spent scrolling, and it evokes in me the feeling of someone trying (and failing) to set an æsthetic atmosphere for a post and then to convince me through that atmosphere, instead of providing arguments or evidence.
I have seen this being done well, especially in some of Joseph Carlsmith’s posts, in which he uses a lot of images of old paintings, real photographs etc. I always thought of myself as having no taste, but now seeing other people sticking the same-flavored AI images on top of their posts makes me reconsider. (And I notice that there’s a clear difference in beauty between /r/museum and /r/Art.)
I agree. I realise the irony of this given that I worked on the big splash pages for the review winner posts.
I think the splash images are >95th percentile in beauty among AI-generated images in posts, especially as they still carry some of the Midjourney v1-v3 vibe, which was much more gritty and earnest (if not realistic) than the current outputs.
I really like some of the images people have used for sequences, e.g. here, here, here and here. Wikimedia has tons of creative commons images as well which I’d use if I were more into that.
I’ll probably get disagree points, but I wanted to share my reaction: I honestly don’t mind the AI’s output. I read it all and think it’s just an elaboration of what you said. The only problem I noticed is that it is too long.
Then again, I’m not an amazing writer, and my critical skills aren’t so great for critiquing style. I will admit I rarely use assistance, because I have a tight set of points I want to include, and explaining them all to the AI is almost the same as writing the post itself.
Brief comments on what’s bad about the output:
The instruction is to write an article arguing that AI-generated posts suffer from verbosity, hedging, and unclear trains of thought. But ChatGPT makes that complaint in a single sentence in the first paragraph and then spends 6 paragraphs adding a bunch of its own arguments:
that the “nature of conversation itself” draws value from “human experience, emotion, and authenticity” that AI content replaces with “a hollow imitation of dialogue”
that AI content creates “an artificial sense of expertise,” i.e. that a dumb take can be made to seem smarter than it is
that the option to use AI content discourages posters from “engag[ing] deeply with the topic themselves”
It wasn’t instructed to make these arguments and they aren’t really defended as important.
Then, lastly, ChatGPT offers a “solution”: that people simply disclose when they’re using AI assistance. But it’s not explained how this actually prevents the problems above, besides, I guess, allowing people to downvote AI content or scrutinize it more. Importantly, this proposed solution wouldn’t solve the concern the model was instructed to present.
So in general instructions aren’t followed and the thinking/argumentation isn’t clear.
It’s worth making the distinction between AI assistance and AI-generation. Using Grammarly is using AI assistance and I think it wouldn’t make sense to require people to disclose Grammarly usage.
Thank you; I wanted to write something like this, but you made the point clearly and concisely.
Some people say they can clearly recognize the output of an LLM. I admit I can’t see it that clearly. I just get an annoying feeling, something rubs me the wrong way, but I can’t quite put my finger on it. For example, while reading this article, I had a thought in the back of my mind: “maybe this text was actually written by a human, and at the end the conclusion will be: haha, you failed the Turing test, now you see how biased you are”.
If I believed that the text was written by a human, I would probably be annoyed that the text is too verbose. But, you know, some real people are like that, too. I would also be like “I am not sure what point exactly they are trying to make… there seems to be a general topic they write about, but they just write their associations with the topic, instead of focusing on what is really important (for them).” But again, actual people probably write like this all the time; ask any professional editor. Writing well is a skill that needs to be learned. I mean, the LLM was trained on human texts! The texts made by verbose people are probably over-represented in the corpus. So I would be like “dude, rewrite this shorter, make your points clearly, and remove the irrelevant parts”, but I could totally believe it was written by a human.
Also, the arguments introduced by the LLM are annoying, but those are arguments that actual people make. Some of them just feel out of place on LW. I care about whether a text is correct, not about whether it is authentic. If the LLM could generate a 100% reliable Theory of Everything, I wouldn’t mind that it is a product of artificial thinking; I would be happy to read it! What I hate is automatically generated human-like mistakes. I can forgive the actual humans, but why should I tolerate the same thing from a machine? If you interact with a human, the next time the human might do a better job as a result. Interacting with a text someone copied from a machine output is useless.
(WTF is even “For the sake of genuine discourse, we need to prioritize human connection over algorithmic convenience”? What does “algorithmic convenience” even mean? Generating LLM texts is convenient. Reading them, not really. Or does generating the texts feel convenient to the LLM? I don’t care.)
Thank you for saying this! It’s easy to have a very limited typical-mind-fallacy view of LessWrong readers, and hearing about preferences very different from my own is extremely important.
Depending on your skill with writing clear, concise English, this may be true. For many, it may be that the effort level is the same between using AI well and just writing it yourself, but the effort type is different, and the quality is improved.
I think the potential value of LLM-assisted writing is very high, but it requires similar levels of clarity and attention to detail either way. Low-effort posts will remain low-value; high-effort posts could get quite a boost.
I added to your prompt the instructions
And the output is still bad, but now mostly for the flaw (also present in your output) that ChatGPT can’t resist making the complaint about “human authenticity” and “transparency/trust” when that’s not what you’re talking about:
EDIT: Below is what it looks like if I replace “in the style of Paul Graham” with “in the style of LessWrong”:
Yeah, I think people asking AIs to write essays for them is currently pretty doomed. The RLHF also makes the writing style of basically all AI systems into a weird kind of polite-markety-slop that tends to waffle in a way that I find deeply grating.
That said, I sure use AI systems as part of a lot of my writing and thinking, and I also desperately wish I could somehow get them to help me with editing. Many times it has happened that I changed some small part of an article and wished I could just ask my AI assistant to propagate the obvious consequences of that change throughout the rest of the article (for example, say I start my article with two examples and remove the second one because I no longer like it; now I would like to rewrite the rest to no longer reference the second example).
I think the real lesson to take away from this is that, at least right now, chat models are bad writers and bad writing assistants. I think things like base models or autocomplete models have more of a chance of producing writing that isn’t terribly grating, and in almost no circumstance would I recommend that someone directly copies more than maybe 1 paragraph of LLM output into their essay without clearly attributing it to AI systems.
On the other hand, I would personally be in favor of people embedding more references to LLM conversations into their essays. It’s an easy way to be transparent about your sources, and I often find it interesting to understand why an LLM got something wrong.
Right now, I would consider it a mod-warnable offense to publish an obviously LLM-written essay under your own name without telling people an LLM wrote it, or most of it. And at least for now it’s pretty easy to tell.
I wonder, could this be solved by a good prompt? I mean, most humans are bad writers, and LLMs are trained on their texts, so they should be bad writers by default.
But sometimes a few words in the prompt change a lot, even if from human perspective that doesn’t make much sense. For me, this is all black magic, but I would not be surprised if starting the prompt with “you are a great writer; you write concisely and make your points clear” had a dramatic impact on the outcome.
(Kinda like generated images are sometimes better if you include “realistic” and “correct number of fingers” in the prompt. At least I think so; this is what I have seen other people do in prompts, but I didn’t do an A/B test to verify that it really improves the outcome.)
Even more, maybe we could put (a drastically condensed version of) the Sequences in the prompt, to remind the LLM to avoid specific biases, to reason step by step rather than state the conclusion first, etc. Yeah, if you tell me this wouldn’t work, I will trust your experience, but I see no a priori reason why not.
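For what it’s worth, a minimal sketch of the kind of prompt I have in mind, assuming the OpenAI Python SDK (the system-message wording is just my untested guess, as I said above):

```python
# Sketch: put a style-constraining system message in front of the request.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
# the exact wording is an untested guess.
from openai import OpenAI

client = OpenAI()

system = (
    "You are a great writer. You write concisely, make your points clearly, "
    "commit to a position, and avoid filler, hedging, and padding."
    # A drastically condensed style reminder (e.g. a few points distilled
    # from the Sequences) could be appended here as well.
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content":
         "Write a short first-person article arguing that AI tools, in the "
         "current state of the art, should not be used to write forum posts."},
    ],
)
print(response.choices[0].message.content)
```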
EDIT:
I see other people already experimented with prompts, and it improved the results, but not sufficiently.
I think with the right prompting techniques you can indeed do better. I might post something later today.
Upvoted and agreed, but the ship has sailed. We need to help people use LLM article generation well: to help them focus and express their thoughts, rather than just produce longer versions of a brief thesis. This is usually a multi-step process: generating and iterating on an outline, then individually generating and iterating on each section (of a paragraph or two). The iteration—human control of the conciseness and correctness of each portion—is absolutely key.
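A minimal sketch of that multi-step loop, assuming the OpenAI Python SDK (the prompts and the crude `input()` review step are stand-ins for whatever iteration process you actually use):

```python
# Sketch of the outline-then-sections workflow: generate an outline, let a
# human edit it, then generate and review each section individually.
# Assumes the OpenAI Python SDK; prompts and review steps are illustrative.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """One small, reviewable generation step."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def review(text: str) -> str:
    """Crude stand-in for human iteration: edit the text, or keep it as-is."""
    edited = input(f"\n{text}\n\nEdited version (Enter to keep): ")
    return edited or text

thesis = "Current LLM-written forum posts are verbose and non-committal."

# Step 1: outline, then human edit.
outline = review(ask(f"Write a five-point outline for a short essay arguing: {thesis}"))

# Step 2: each section generated and reviewed separately.
sections = []
for point in filter(str.strip, outline.splitlines()):
    draft = ask(f"Write one or two concise paragraphs, no filler or hedging, "
                f"for this outline point: {point}")
    sections.append(review(draft))

print("\n\n".join(sections))
```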
As a low-effort very small change to your prompt to demonstrate the variance, I did a one-shot with GPT 4o that specified LessWrong and added an instruction for tone.
and got
This is much better (IMO) than your initial attempt, and I believe it could be much better with more significant effort into multi-shot prompt engineering and light edits afterward.
I’d rephrase your title as “If you’re going to use AI to write for you, use it as a fairly limited assistant rather than an open-ended delegation of responsibility”. People need to put in the effort to clarify their own thoughts and put them clearly into text—AI is really helpful for grammar and the mechanics of writing, and not (yet) any good at conceptual presentation that’s not basically a summary of something else.
Here’s what I got out of Llama-3-405b, after adding one sentence to your prompt (at the end) to adjust the style. Prompt:
Llama-3-405b:
I agree with your intended point, but disagree with it as stated. There are certainly ways to use AI to write for you well. Claude Opus is a much more natural writer than GPT-4o, especially with more than simple prompting. Even to the specific points of conciseness, filler phrases, equivocating, and inordinately high proportions of applause-light text.
In the extreme, Janus sometimes simulates conversational branches with a base LLM before having the conversation, and copies text over from the AI’s completions during the conversation. I’ve certainly never been able to tell reliably when I’m speaking to human-Janus or AI-Janus.
AIUI Janus mostly uses their(?) Loom interface, which allows extremely fine-grained control over the outputs; in my experience using the less-powerful free chat interface, Claude tends to fall into similar failure modes as 4o when I ask it to flesh out my ideas, albeit to a lesser extent. It’ll often include things like calls to action, claims that the (minor and technical) points I want to make have far-reaching social implications of which we must be aware, etc (and is prone to injecting the perspective that AIs are definitely not conscious in response to prompts that did not include any instructions of that nature).
Loom is definitely far more powerful, but there are other (weaker) ways of steering the outputs toward specific parts of the latent space; things which often fall under the label of “prompt engineering”, which is very commonly broader than the usual usage of the term. Janus’ twitter feed, for example, has some examples of LLMs acting in ways that would, I think, be very strange to someone who’s only seen them act the way they do at the start of a conversation. (Not in being specifically better at the things you describe in those examples, but I think they’re similarly different from its usual style.)
Grok-2 and Llama-3-405b seem better at opinion/essay writing (with Llama able to decide to be brief when appropriate), Claude-3.5-Sonnet thinks most clearly (closely keeping track of the intended plausible position). GPT-4o doesn’t seem exceptional on any relevant dimension. Gemini-1.5-Pro can occasionally notice some non-trivial implication all others would miss. Weaker chatbots are even more pointlessly verbose and non-committal, so I expect this problem will be gone in a year as strengths of the current frontier models become robust with additional scale.
I’ll admit that I recently wrote a post myself, asked an AI to improve my writing, made a few edits myself, posted it, and then came back later and thought “omg, how did I let some of these AI edits through?”. Hopefully the post in question is better now.
You may be right, but this is missing a major point, so I didn’t continue: you didn’t even try to prompt it to write in a style you like more.
As for letting it come up with the logic, of course you shouldn’t do that; humans still reason better. At least very well educated humans do.
OTOH, you should ask it for help in thinking through domains you’re not expert in; that will improve your opinions and writing.
For all their virtues, fine-tuned language models are pretty bad at imitating the style of certain writers. I’ve instructed them to write like well-known writers, or given them medium amounts of text I’ve written, but they almost always fall back on that horribly dreadful HR-speak college-essayist style. Bleh.
A good example of how incredibly incorrigible & mode-collapsed tuned model style can be is this 2023 poetry paper: even with 17 non-rhyming Walt Whitman poems in the prompt to few-shot it, ChatGPT still rhymed. (It’s gotten better, and now even passes my old “write a non-rhyming poem” test, but nevertheless, an alarming instance.)
Good to know. That’s been my experience too—but I’ve also seen them adopt a dramatically different style sometimes when the prompt describes the style rather than using a name or writing samples. So IDK, because I haven’t tried much or even googled it. But neither, apparently, has the author of the post, which it seems like they should have bothered to do before making broad claims.
I haven’t had any more success myself in my few attempts, but saying “I told it to do this and it sucked” is definitely a very bad way to evaluate LLM capabilities. Proper prompting often produces dramatically better results. So the post should be titled “don’t let LLMs think for you or write for you unless you can get better results than baseline unprompted style” or something.
If the author had bothered to read up on what type of writing they could do with lots of work at prompting, that would be a different story and a better post.
Your experiment is contaminated: if a training document said that AI texts are overly verbose, and then announced that the following is a piece of AI-written text, it’d be a natural guess that the document would continue with overly verbose text, and so that’s what an autocomplete engine will generate.
Due to RLHF, AI is no longer cleanly modelled as an autocomplete engine, but the point stands. For science, you could try having AI assist in the writing of an article making the opposite claim :).
I did that once.
I have not tried asking GPT-4o to write concisely, but then, the “writers” of the articles I have in mind clearly haven’t. There are a few within the last few days. I’m sure people can guess which they are.
In my experience, they talk like that regardless of the claim being made unless I specifically prompt for a different writing style (which has mixed success).
I have nothing against AI as a Jarvis/Friday-like assistant/advisor. A bad workman blames his tools (absit iniuria). Some of us don’t know how to use stuff properly. My reckoning suggests that I’m aware of only 5% of my smartphone’s capabilities. Sometimes I get these random notifications full of interesting suggestions.