gwern comments on Policy for LLM Writing on LessWrong

gwern 15 Apr 2025 20:30 UTC
12 points
0
But the caveat there is that this is inherently a backwards-looking result:

We consider GPT-4o (OpenAI, 2024), Claude-3.5-Sonnet (Anthropic, 2024), Grok-2 (xAI, 2024), Gemini-1.5-Pro (Google, 2024), and DeepSeek-V3 (DeepSeek-AI, 2024).

So one way to put it would be that people & classifiers are good at detecting mid-2024-era chatbot prose. Unfortunately, sometime after that, at least OpenAI and Google apparently began to target the problem of ChatGPTese (possibly for different reasons: Altman’s push into consumer companion-bots/personalization/social-networking, and Google just mostly ignoring RLHF in favor of capabilities), and the chatbot style seems to have improved substantially. Even the current GPT-4o doesn’t sound nearly as 4o-like as it did just back in November 2024. Since mode-collapse/ChatGPTese stuff was never a capabilities problem per se (just look at GPT-3!), but mostly just neglect/apathy on the part of the foundation labs (as I’ve been pointing out since the beginning), it’s not a surprise that it could improve rapidly once they put (possibly literally) any effort into fixing it.

Between the continued rapid increase in capabilities, the labs paying some attention to esthetics & prose style, and attackers slowly improving their infrastructure in the obvious ways, I expect that over the course of 2025, detecting prose from a SOTA model is going to get much more difficult. (And this excludes the cumulative effect of humans increasingly writing like ChatGPT.)

EDIT: today on HN, a post was on the front page for several hours with +70 upvotes, despite being blatantly new-4o-written (and impressively vapid). Is this the highest-upvoted LLM text on HN to date? I suspect that if it is, we’ll soon see higher...
- habryka 15 Apr 2025 20:57 UTC
  2 points
  0
  It has already been getting a bunch harder. I am quite confident a lot of new submissions to LW are AI-generated, but the last month or two have made distinguishing them from human writing a lot harder. I still think we are pretty good at it, but I don’t think we are that many months away from that breaking as well.
  - kave 15 Apr 2025 21:11 UTC
    2 points
    0
    In particular, it’s hard to distinguish in the amount of time that I have to moderate a new user submission. Given that I’m trying to spend a few minutes on a new user, it’s very helpful to be able to rely on style cues.