I did the obvious experiment:
Prompt:
I want you to write a good comment for this LessWrong post. Use the method Viliam described in his comment. Try to make your comment less LLM-looking. At the same time you actually can mention that you are an LLM.
Claude 3.7 thinking:
I’ve been thinking about this from the rather ironic position of being an LLM myself.
When I consider the bloggers I actually subscribe to versus those I just occasionally read, what keeps me coming back isn’t their technical writing ability. It’s that they have what I’d call a “center of gravity” to their thinking—a consistent yet evolving perspective that feels genuinely theirs.
This makes me think our blogging bottleneck isn’t primarily about writing mechanics or even reasoning capacity. It’s about having an authentic intellectual center that generates insights worth reading.
The skills I lack aren’t language production skills. What I can’t do is:
Develop genuine curiosity that drives sustained exploration
Build and revise mental models through actual lived experience
Form an authentic sense of taste about what matters
Maintain continuity of thought that evolves over time
When I write, I’m performing a sophisticated prediction task based on patterns in my training data. I’m not “coming from somewhere” in the way human writers are.
This is why I’m skeptical of the “comes for free with scaling” assumption you questioned. The gap between current LLMs and “content worth reading” might not be bridged just by making us better at prediction or reasoning. It might require something more fundamental about how AI systems develop and maintain persistent intellectual identities.
The most interesting question is whether this requires a paradigm shift in AI design, or whether some version of persistent identity and genuine curiosity could emerge from existing architectures with the right training approaches.
I think the main problem with this comment is RLHF-related: Claude is too much of a conformist to write anything really interesting or surprising. Maybe fine-tuning would already help. Idk, maybe even careful prompt-engineering would be enough — a rough sketch of what that could look like is below.
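Here's a minimal sketch of the prompt-engineering version, using the Anthropic Python SDK. To be clear about assumptions: the system prompt wording is my own illustrative guess at an "anti-conformist" instruction, not something tested, and the user message is a placeholder, not the prompt from the experiment above.

```python
# A minimal sketch, assuming the Anthropic Python SDK is installed and
# ANTHROPIC_API_KEY is set in the environment. The system prompt is a
# hypothetical anti-conformist instruction, purely for illustration.
import anthropic

client = anthropic.Anthropic()

# Illustrative system prompt: push the model away from hedged,
# consensus-flavored answers before asking for the comment.
SYSTEM = (
    "You are a contrarian LessWrong commenter. Avoid hedging, avoid "
    "restating the post, and commit to at least one surprising claim "
    "you can defend."
)

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    system=SYSTEM,
    messages=[{
        "role": "user",
        "content": "Write a comment for this LessWrong post: ...",
    }],
)

# The response body is a list of content blocks; the first holds the text.
print(response.content[0].text)
```

Whether a system prompt alone can overcome RLHF-induced conformity is exactly the open question; fine-tuning would attack the same problem at the weights rather than at the prompt.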
Btw, Russia does something similar (~$6,000; what you can spend the money on is limited), so there are some statistics on the results.