Damn, I hate to think about this, but it is unavoidable anyway, isn’t it?
From my perspective, LLM-generated texts vs. human-generated texts are kinda like spam e-mails vs. letters written on paper. It was possible to get a stupid and annoying letter on paper, but it happened rarely, because there was a time cost to writing one, and writing a letter takes more time than reading it. With spam e-mails, it is the other way round: sending the message takes a few minutes, and the collective time spent reading it (even if most people immediately decide to delete it) could be hours or more.
Emotionally, I would hate to spend my time reading an article, and then decide that I either don’t like it or feel ambivalent about it, and maybe write a comment that tries to be helpful and explain why… and then find out that it took me 15 minutes to read the article, think about it, and write a response, but it only took the author 1 minute to generate it and post it.
But what other options are there? Should I stop reading insufficiently interesting articles after the first half page, and just downvote them and close without commenting? Yeah, that would be better for me, but it would make LW a worse experience for the new authors.
If we could trust people to honestly disclose in a standardized way at the beginning of the article (or even better, click a checkbox when posting it, so that an icon would appear next to the title) that the article was written by an LLM, then I would be harsh on the LLM posts and lenient with the new authors. But this creates an obvious incentive to lie about using LLMs. (Possible solution: a severe punishment for authors who use an LLM and fail to disclose it. But that risks false positives.)
I like habryka’s proposed solution: integrate the LLMs with LW and publish the prompts. Preferably at the top of the page, so that I can read the prompt first and decide whether I want to read the article at all. For example, if the prompt already specifies the bottom line, I probably won’t pay much attention to the arguments the LLM makes, because I know it is not even trying to paint an objective picture.
I can imagine legitimate uses of LLMs, for example if someone describes something in the abstract and then uses an LLM to suggest specific examples (and then carefully reviews those examples). Articles with examples are usually better than abstract arguments, and sometimes the examples just don’t come to my mind, or they are all similar to each other and the LLM might notice something different.
Maybe the parts written by an LLM should be displayed in a different font (e.g. monospace)?
I feel like more and more LLM use is inevitable, and at least some of it is no different from the age-old (as far as “old” applies to LessWrong and online forums) problem of filtering new users who are more enthusiastic than organized, and generate a lot of spew before really slowing down and writing FOR THIS AUDIENCE (or getting mad and going away, in the case of bad fit). LLMs make it easy to increase that volume of low-value stuff.
I really want to enable the high-value uses of LLMs, because I want more from many posts and I think LLMs CAN BE a good writing partner. My mental model is that binary approaches (“identify LLM-generated content”) are going to fail, because the incentives are wrong, but also because they discourage the good uses.
The problem of voluminous bad posts has two main tactics for us to use against it.
1) identification and filtering (downvoting and admin intervention). This works today (though it’s time-consuming and uncomfortable), and is likely to continue to work for quite some time. I haven’t really played with LLMs as evaluation/summary tools for things I read and comment on, but I think I’m going to try it. I wonder if I can get GPT to estimate how much effort went into a post (a rough sketch of what that might look like is at the end of this comment)...
2) assistance and encouragement of posters to think and write for this audience, and to engage with comments (which are sometimes very direct). This COULD include recommendations for LLM critiques or assistance with organization or presentation, along with warnings that an LLM can’t actually predict your ideas or your reasons for posting: you have to prompt it well and concretely.
We need both of these, and I’m not sure the balance changes all that much in an LLM world.
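For what it’s worth, here is a minimal sketch of the “estimate the effort” idea from point 1. It assumes the `openai` Python package and an API key in the environment; the model name, the prompt wording, and the 1-10 scale are placeholders I made up for illustration, not a tested method.

```python
# Hypothetical sketch, not a tested method: ask an LLM to rate how much human
# effort likely went into a post. Assumes the `openai` package (v1+) and an
# OPENAI_API_KEY in the environment; model name, prompt, and 1-10 scale are
# placeholders chosen for illustration.
from openai import OpenAI

client = OpenAI()

def estimate_effort(post_text: str) -> str:
    """Return the model's rough effort estimate for a post (a heuristic, not ground truth)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Estimate how much human effort likely went into the following forum post. "
                    "Answer with a score from 1 (low effort, likely generated with little review) "
                    "to 10 (carefully researched and written), plus one sentence of reasoning."
                ),
            },
            {"role": "user", "content": post_text},
        ],
    )
    return response.choices[0].message.content

# Example usage: print(estimate_effort(open("draft_post.txt").read()))
```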