To doublecheck/clarify: do you feel strongly (or, weakly) that you don’t want autogenerated jargon to exist on your posts for people who click the “opt into non-author-endorsed AI content” for that post? Or simply that you don’t personally want to be running into it?
Both. I do not want to have AI content added to my post without my knowledge or consent.
In fact, thinking further about it, I do not want AI content added to anyone’s post without their knowledge or consent, anywhere, not just on LessWrong.
Such content could be seen as just automating what people can do anyway with an LLM open in another window. I’ve no business trying to stop people doing that. However, someone doing that knows what they are doing. If the stuff pops up automatically amidst the author’s original words, will they be so aware of its source and grok that the author had nothing to do with it? I do not think that the proposed discreet “AI-generated” label is enough to make it clear that such content is third-party commentary, for which the author carries no responsibility.
But then, who does carry that responsibility? No-one. An AI’s words are news from nowhere. No-one’s reputation is put on the line by uttering them. For it is written, the fundamental question of rationality is “What do I think I know and how do I think I know it?” But these AI popovers cannot be questioned.
And also, I do not personally want to be running into any writing that AI had a hand in.
(My guess is the majority of posts written daily on LW are now written with some AI involvement. My best guess is most authors on LessWrong use AI models on a daily level, asking factual questions, and probably also asking for some amount of editing and writing feedback. As such, I don’t think this is a coherent ask.)
If this is true, then it’s a damning indictment of Less Wrong and the authors who post here, and is an excellent reason not to read anything written here.
Here are all of my interactions with Claude related to writing blog posts or comments in the last four days:
I asked Claude for a couple back-of-the-envelope power output estimations (running, and scratching one’s nose). I double-checked the results for myself before alluding to them in the (upcoming) post. Claude’s suggestions were generally in the right ballpark, but more importantly Claude helpfully reminded me that metabolic power consumption = mechanical power + heat production, and that I should be clear on which one I mean.
“There are two unrelated senses of “energy conservation”, one being physics, the other being “I want to conserve my energy for later”. Is there some different term I can use for the latter?” — Claude had a couple good suggestions; I think I wound up going with “energy preservation”.
“how many centimeters separate the preoptic nucleus of the hypothalamus from the arcuate nucleus?” — Claude didn’t really know but its ballpark number was consistent with what I would have guessed. I think I also googled, and then just to be safe I worded the claim in a pretty vague way. It didn’t really matter much for my larger point in even that one sentence, let alone for the important points in the whole (upcoming) post.
“what’s a typical amount that a 4yo can pick up? what about a national champion weightlifter? I’m interested in the ratio.” — Claude gave an answer and showed its work. Seemed plausible. I was writing this comment, and after reading Claude’s guess I changed a number from “500” to “50”.
“Are there characteristic auditory properties that distinguish the sound of someone talking to me while facing me, versus talking to me while facing a different direction?” — Claude said some things that were marginally helpful. I didn’t wind up saying anything about that in the (upcoming) post.
“what does “receiving eye contact” mean?” — I was trying to figure out if readers would understand what I mean if I wrote that in my (upcoming) post. I thought it was a standard term but had a niggling worry that I had made it up. Claude got the right answer, so I felt marginally more comfortable using that phrase without defining it.
“what’s the name for the psychotic delusion where you’re surprised by motor actions?” — I had a particular thing in mind, but was blanking on the exact word. Claude was pretty confused but after a couple tries it mentioned “delusion of control”, which is what I wanted. (I googled that term afterwards.)
Somewhat following this up: I think not using LLMs is going to be fairly similar to “not using google.” Google results are not automatically true – you have to use your judgment. But, like, it’s kinda silly to not use it as part of your search process.
I do recommend perplexity.ai for people who want an easier time checking up on where the AI got some info (it does a search first and provides citations, while packaging the results in a clearer overall explanation than google)
I in fact don’t use Google very much these days, and don’t particularly recommend that anyone else do so, either.
(If by “google” you meant “search engines in general”, then that’s a bit different, of course. But then, the analogy here would be to something like “carefully select which LLM products you use, try to minimize their use, avoid the popular ones, and otherwise take all possible steps to ensure that LLMs affect what you see and do as little as possible”.)
Do you not use LLMs daily? I don’t currently find them out-of-the-box useful for editing, but find them useful for a huge variety of tasks related to writing things.
I think it would be more of an indictment of LessWrong if people somehow didn’t use them; they obviously increase my productivity at a wide variety of tasks, and being an early adopter of powerful AI technologies seems like one of the things that I hope LessWrong authors excel at.
In general, I think Gwern’s suggested LLM policy seems roughly right to me. Of course people should use LLMs extensively in their writing, but if they do, they really have to read any LLM writing that makes it into their post and check what it says is true:
I am also fine with use of AI in general to make us better writers and thinkers, and I am still excited about this. (We unfortunately have not seen much benefit for the highest-quality creative nonfiction/fiction or research, like we aspire to on LW2, but this is in considerable part due to technical choices & historical contingency, which I’ve discussed many times before, and I still believe in the fundamental possibilities there.) We definitely shouldn’t be trying to ban AI use per se.
However, if someone is posting a GPT-4 (or Claude or Llama) sample which is just a response, then they had damn well better have checked it and made sure that the references existed and said what the sample says they said and that the sample makes sense and they fixed any issues in it. If they wrote something and had the LLM edit it, then they should have checked those edits and made sure the edits are in fact improvements, and improved the improvements, instead of letting their essay degrade into ChatGPTese. And so on.
Seems like a mistake! Agree it’s not uncommon to use them less, though my guess (with like 60% confidence) is that the majority of authors on LW use them daily, or very close to daily.
Prolly less than 60%. I think you’re overestimating how LLM-pilled the overall LW userbase is (even filtering for people who publish posts). But, my guess is like 25-45% tho.
I would strongly bet against majority using AI tools ~daily (off the top of my head: <40% with 80% confidence?): adoption of any new tool is just much slower than people would predict, plus the LW team is liable to vastly overpredict this since you’re from California.
That said, there are some difficulties with how to operationalize this question, e.g. I know some particularly prolific LW posters (like Zvi) use AI.
I also use them rarely, fwiw. Maybe I’m missing some more productive use, but I’ve experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.
I just added “LLM Frequency” and “LLM Use case” to the survey, under LessWrong Team Questions. I’ll probably tweak the options and might move it to Bonus Questions later. Suggestions welcome!
First of all, even taking what Gwern says there at face value, how many of the posts here that are written “with AI involvement” would you say actually are checked, edited, etc., in the rigorous way which Gwern describes? Realistically?
Secondly, when Gwern says that he is “fine with use of AI in general to make us better writers and thinkers” and that he is “still excited about this”, you should understand that he is talking about stuff like this and this, and not about stuff like “instead of thinking about things, refining my ideas, and writing them down, I just asked an LLM to write a post for me”.
Approximately zero percent of the people who read Gwern’s comment will think of the former sort of idea (it takes a Gwern to think of such things, and those are in very limited supply), rather than the latter.
The policy of “encourage the use of AI for writing posts/comments here, and provide tools to easily generate more AI-written crap” doesn’t lead to more of the sort of thing that Gwern describes at the above links. It leads to a deluge of un-checked crap.
First of all, even taking what Gwern says there at face value, how many of the posts here that are written “with AI involvement” would you say actually are checked, edited, etc., in the rigorous way which Gwern describes? Realistically?
My guess is very few people are using AI output directly (at least at present it’s pretty obvious, as their writing is kind of atrocious). I do think most posts probably involved people talking through their thoughts with an LLM, or asking for some editing help, or asking some factual questions. My guess is basically 100% of those went through the kind of process that Gwern was describing here.
I currently wish I had a policy for knowing with confidence whether a user wrote part of their post with a language model. There’s a (small) regular stream of new-user content that I look through, where I’m above 50% that AI wrote some of it (very formulaic, unoriginal writing, imitating academic style) but I am worried about being rude when saying “I rejected your first post because I reckon you didn’t write this and it doesn’t reflect your thoughts” if I end up being wrong like 1 in 3 times[1].
Sometimes I use various online language-model checkers (1, 2, 3), but I don’t know how accurate/reliable they are. If they are actually pretty good, I may well automatically run them on all submitted posts to LW so I can be more confident.
Also one time I pushed back on this and the user explained they’re not a native English speaker, so tried to use a model to improve their English, which I thought was more reasonable than many uses.
I’d be pretty into having typography styling settings that auto-detect LM stuff (or, specifically track when users have used any LW-specific LM tools), and flag it with some kind of style difference so it’s easy to track at a glance (esp if it could be pretty reliable).
Lots of people are pushing back on this, but I do want to say explicitly that I agree that raw LLM-produced text is mostly not up to LW standards, and that the writing style that current-gen LLMs produce by default sucks. In the new-user-posting-for-the-first-time moderation queue, next to the SEO spam, we do see some essays that look like raw LLM output, and we reject these.
That doesn’t mean LLMs don’t have good use around the edges. In the case of defining commonly-used jargon, there is no need for insight or originality, the task is search-engine-adjacent, and so I think LLMs have a role there. That said, if the glossary content is coming out bad in practice, that’s important feedback.
But then, who does carry that responsibility? No-one.
For the case of this particular feature and ones like it: The LessWrong team. And, in this case, more specifically, me.
I welcome being held accountable for this going wrong in various ways. (I plan to engage more with people who present specific cruxes rather than a generalized “it seems scary”, but, this seems very important for a human to be in the loop about, who actually takes responsibility for both locally being good, and longterm consequences)
You and the LW team are indirectly responsible, but only for the general feature. You are not standing behind each individual statement the AI makes. If the author of the post does not vet it, no-one stands behind it. The LW admins can be involved only in hindsight, if the AI does something particularly egregious.
This feels like you have some way of thinking about responsibility that I’m not sure I’m tracking all the pieces of.
1. Who literally meant the individual statements? No one (or, some random alien mind).
2. Who should take actions if someone flags that an unapproved term is wrong? The author, if they want to be involved, and site-admins (or me-in-particular), if the author does not want to be involved.
3. Who should be complained to if this overall system is having bad consequences? Site admins, me-in-particular or habryka-in-particular (Habryka has more final authority, I have more context on this feature. You can start with me and then escalate, or tag both of us, or whatever)
4. Who should have Some Kind of Social Pressure Leveraged At them if reasonable complaints seem to be falling on deaf ears and there are multiple people worried? Also the site admins, and habryka-and-me-in-particular.
It seems like you want #1 to have a better answer, but I don’t really know why.
Rather, I am pointing out that #1 is the case. No-one means the words that an AI produces. This is the fundamental reason for my distaste for AI-generated text. Its current low quality is a substantial but secondary issue.
If there is something flagrantly wrong with it, then 2, 3, and 4 come into play, but that won’t happen with standard average AI slop, unless it were eventually judged to be so persistently low quality that a decision were made to discontinue all ungated AI commentary.
The most important thing is “There is a small number of individuals who are paying attention, who you can argue with, and if you don’t like what they’re doing, I encourage you to write blogposts or comments complaining about it. And if your arguments make sense to me/us, we might change our mind. If they don’t make sense, but there seems to be some consensus that the arguments are true, we might lose the Mandate of Heaven or something.”
I will personally be using my best judgment to guide my decisionmaking. Habryka is the one actually making final calls about what gets shipped to the site; insofar as I update that we’re doing a wrong thing, I’ll argue about it.
It happening at all already constitutes “going wrong”.
This particular sort of comment doesn’t particularly move me. I’m more likely to be moved by “I predict that if AI is used in such and such a way it’ll have such and such effects, and those effects are bad.” Which I won’t necessarily automatically believe, but, I might update on if it’s argued well or seems intuitively obvious once it’s pointed out.
I’ll be generally tracking a lot of potential negative effects and if it seems like it’s turning out “the effects were more likely” or “the effects were worse than I thought”, I’ll try to update swiftly.
The most important thing is “There is a small number of individuals who are paying attention, who you can argue with, and if you don’t like what they’re doing, I encourage you to write blogposts or comments complaining about it. And if your arguments make sense to me/us, we might change our mind. If they don’t make sense, but there seems to be some consensus that the arguments are true, we might lose the Mandate of Heaven or something.”
There’s not, like, anything necessarily wrong with this, on its own terms, but… this is definitely not what “being held accountable” is.
It happening at all already constitutes “going wrong”.
This particular sort of comment doesn’t particularly move me.
All this really means is that you’ll just do with this whatever you feel like doing. Which, again, is not necessarily “wrong”, and really it’s the default scenario for, like… websites, in general… I just really would like to emphasize that “being held accountable” has approximately nothing to do with anything that you’re describing.
As far as the specifics go… well, the bad effect here is that instead of the site being a way for me to read the ideas and commentary of people whose thoughts and writings I find interesting, it becomes just another purveyor of AI “extruded writing product”. I really don’t know why I’d want more of that than there already is, all over the internet. I mean… it’s a bad thing. Pretty straightforwardly. If you don’t think so then I don’t know what to tell you.
All I can say is that this sort of thing drastically reduces my interest in participating here. But then, my participation level has already been fairly low for a while, so… maybe that doesn’t matter very much, either. On the other hand, I don’t think that I’m the only one who has this opinion of LLM outputs.
it becomes just another purveyor of AI “extruded writing product”.
If it happened here the way it happened on the rest of the internet, (in terms of what the written content was like) I’d agree it’d be straightforwardly bad.
For things like jargon-hoverovers, the questions IMO are:
is the explanation accurate?
is the explanation helpful for explaining complex posts, esp. with many technical terms?
does the explanation feel like soulless slop that makes you feel ughy the way a lot of the internet is making you feel ughy these days?
If the answer to the first two is “yep”, and the third one is “alas, also yep”, then I think an ideal state is for the terms to be hidden-by-default but easily accessible for people who are trying to learn effectively, and are willing to put up with somewhat AI-slop-sounding but clear/accurate explanations.
If the answer to the first two is “yep”, and the third one is “no, actually it just reads pretty well (maybe even in the author’s own style, if they want that)”, then IMO there’s not really a problem.
I am interested in your actual honest opinion of, say, the glossary I just generated for Unifying Bargaining Notions (1/2) (you’ll have to click option-shift-G to enable the glossary on lesswrong.com). That seems like a post where you will probably know most of the terms well enough to judge them on accuracy, while it is still technical enough that you can imagine being a person unfamiliar with game theory trying to understand the post, and get a sense of both how useful the definitions would be and how they feel aesthetically.
My personal take is that they aren’t quite as clear as I’d like and not quite as alive-feeling as I’d like, but over the threshold of both such that I’d much rather have them than not have them, esp. if I knew less game theory than I currently do.
Part of the uncertainty we’re aiming to reduce here is “can we make thinking tools or writing tools that are actually good, instead of bad?” and our experiments so far suggest “maybe”. We’re also designing with “six months from now” in mind – the current level of capabilities and quality won’t be static.
Our theory of “secret sauce” is “most of the corporate Tech World in fact has bad taste in writing, and the LLM fine-tunings and RLHF data are generated by people with bad taste. Getting good output requires both good taste and prompting skill, and you’re mostly just not seeing people try.”
We’ve experimented with jailbroken Base Claude which does a decent job of actually having different styles. It’s harder to get to work reliably, but, not so much harder that it feels intractable.
The JargonHovers currently use regular Claude, not jailbroken Claude. I have guesses about how to eventually get them to write in something like the author’s original style, although it’s a harder problem so we haven’t tried that hard yet.
I would like to be able to set my defaults so that I never see any of the proposed AI content. Will this be possible?
To doublecheck/clarify: do you feel strongly (or, weakly) that you don’t want autogenerated jargon to exist on your posts for people who click the “opt into non-author-endorsed AI content” for that post? Or simply that you don’t personally want to be running into it?
(Oh, hey, you’re the one who wrote Please do not use AI to write for you)
Both. I do not want to have AI content added to my post without my knowledge or consent.
In fact, thinking further about it, I do not want AI content added to anyone’s post without their knowledge or consent, anywhere, not just on LessWrong.
Such content could be seen as just automating what people can do anyway with an LLM open in another window. I’ve no business trying to stop people doing that. However, someone doing that knows what they are doing. If the stuff pops up automatically amidst the author’s original words, will they be so aware of its source and grok that the author had nothing to do with it? I do not think that the proposed discreet “AI-generated” label is enough to make it clear that such content is third-party commentary, for which the author carries no responsibility.
But then, who does carry that responsibility? No-one. An AI’s words are news from nowhere. No-one’s reputation is put on the line by uttering them. For it is written, the fundamental question of rationality is “What do I think I know and how do I think I know it?” But these AI popovers cannot be questioned.
And also, I do not personally want to be running into any writing that AI had a hand in.
I am that person, and continue to be.
(My guess is the majority of posts written daily on LW are now written with some AI involvement. My best guess is most authors on LessWrong use AI models on a daily level, asking factual questions, and probably also asking for some amount of editing and writing feedback. As such, I don’t think this is a coherent ask.)
If this is true, then it’s a damning indictment of Less Wrong and the authors who post here, and is an excellent reason not to read anything written here.
Here are all of my interactions with Claude related to writing blog posts or comments in the last four days:
I asked Claude for a couple back-of-the-envelope power output estimations (running, and scratching one’s nose). I double-checked the results for myself before alluding to them in the (upcoming) post. Claude’s suggestions were generally in the right ballpark, but more importantly Claude helpfully reminded me that metabolic power consumption = mechanical power + heat production, and that I should be clear on which one I mean.
“There are two unrelated senses of “energy conservation”, one being physics, the other being “I want to conserve my energy for later”. Is there some different term I can use for the latter?” — Claude had a couple good suggestions; I think I wound up going with “energy preservation”.
“how many centimeters separate the preoptic nucleus of the hypothalamus from the arcuate nucleus?” — Claude didn’t really know but its ballpark number was consistent with what I would have guessed. I think I also googled, and then just to be safe I worded the claim in a pretty vague way. It didn’t really matter much for my larger point in even that one sentence, let alone for the important points in the whole (upcoming) post.
“what’s a typical amount that a 4yo can pick up? what about a national champion weightlifter? I’m interested in the ratio.” — Claude gave an answer and showed its work. Seemed plausible. I was writing this comment, and after reading Claude’s guess I changed a number from “500” to “50”.
“Are there characteristic auditory properties that distinguish the sound of someone talking to me while facing me, versus talking to me while facing a different direction?” — Claude said some things that were marginally helpful. I didn’t wind up saying anything about that in the (upcoming) post.
“what does “receiving eye contact” mean?” — I was trying to figure out if readers would understand what I mean if I wrote that in my (upcoming) post. I thought it was a standard term but had a niggling worry that I had made it up. Claude got the right answer, so I felt marginally more comfortable using that phrase without defining it.
“what’s the name for the psychotic delusion where you’re surprised by motor actions?” — I had a particular thing in mind, but was blanking on the exact word. Claude was pretty confused but after a couple tries it mentioned “delusion of control”, which is what I wanted. (I googled that term afterwards.)
Somewhat following this up: I think not using LLMs is going to be fairly similar to “not using google.” Google results are not automatically true – you have to use your judgment. But, like, it’s kinda silly to not use it as part of your search process.
I do recommend perplexity.ai for people who want an easier time checking up on where the AI got some info (it does a search first and provides citations, while packaging the results in a clearer overall explanation than google)
I in fact don’t use Google very much these days, and don’t particularly recommend that anyone else do so, either.
(If by “google” you meant “search engines in general”, then that’s a bit different, of course. But then, the analogy here would be to something like “carefully select which LLM products you use, try to minimize their use, avoid the popular ones, and otherwise take all possible steps to ensure that LLMs affect what you see and do as little as possible”.)
Do you not use LLMs daily? I don’t currently find them out-of-the-box useful for editing, but find them useful for a huge variety of tasks related to writing things.
I think it would be more of an indictment of LessWrong if people somehow didn’t use them; they obviously increase my productivity at a wide variety of tasks, and being an early adopter of powerful AI technologies seems like one of the things that I hope LessWrong authors excel at.
In general, I think Gwern’s suggested LLM policy seems roughly right to me. Of course people should use LLMs extensively in their writing, but if they do, they really have to read any LLM writing that makes it into their post and check what it says is true:
FWIW I think it’s not uncommon for people to not use LLMs daily (e.g. I don’t).
Seems like a mistake! Agree it’s not uncommon to use them less, though my guess (with like 60% confidence) is that the majority of authors on LW use them daily, or very close to daily.
Consider the reaction my comment from three months ago got.
Prolly less than 60%. I think you’re overestimating how LLM-pilled the overall LW userbase is (even filtering for people who publish posts). But, my guess is like 25-45% tho.
I would strongly bet against majority using AI tools ~daily (off the top of my head: <40% with 80% confidence?): adoption of any new tool is just much slower than people would predict, plus the LW team is liable to vastly overpredict this since you’re from California.
That said, there are some difficulties with how to operationalize this question, e.g. I know some particularly prolific LW posters (like Zvi) use AI.
I also use them rarely, fwiw. Maybe I’m missing some more productive use, but I’ve experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.
I enjoyed reading Nicholas Carlini and Jeff Kaufman write about how they use them, if you’re looking for inspiration.
Thanks; it makes sense that use cases like these would benefit, I just rarely have similar ones when thinking or writing.
I recommend having this question in the next lesswrong survey.
Along the lines of “How often do you use LLMs and your usecase?”
Great idea!
@Screwtape?
On it!
I just added “LLM Frequency” and “LLM Use case” to the survey, under LessWrong Team Questions. I’ll probably tweak the options and might move it to Bonus Questions later. Suggestions welcome!
Not even once.
First of all, even taking what Gwern says there at face value, how many of the posts here that are written “with AI involvement” would you say actually are checked, edited, etc., in the rigorous way which Gwern describes? Realistically?
Secondly, when Gwern says that he is “fine with use of AI in general to make us better writers and thinkers” and that he is “still excited about this”, you should understand that he is talking about stuff like this and this, and not about stuff like “instead of thinking about things, refining my ideas, and writing them down, I just asked an LLM to write a post for me”.
Approximately zero percent of the people who read Gwern’s comment will think of the former sort of idea (it takes a Gwern to think of such things, and those are in very limited supply), rather than the latter.
The policy of “encourage the use of AI for writing posts/comments here, and provide tools to easily generate more AI-written crap” doesn’t lead to more of the sort of thing that Gwern describes at the above links. It leads to a deluge of un-checked crap.
My guess is very few people are using AI output directly (at least at present it’s pretty obvious, as their writing is kind of atrocious). I do think most posts probably involved people talking through their thoughts with an LLM, or asking for some editing help, or asking some factual questions. My guess is basically 100% of those went through the kind of process that Gwern was describing here.
I currently wish I had a policy for knowing with confidence whether a user wrote part of their post with a language model. There’s a (small) regular stream of new-user content that I look through, where I’m above 50% that AI wrote some of it (very formulaic, unoriginal writing, imitating academic style) but I am worried about being rude when saying “I rejected your first post because I reckon you didn’t write this and it doesn’t reflect your thoughts” if I end up being wrong like 1 in 3 times[1].
Sometimes I use various online language-model checkers (1, 2, 3), but I don’t know how accurate/reliable they are. If they are actually pretty good, I may well automatically run them on all submitted posts to LW so I can be more confident.
Also one time I pushed back on this and the user explained they’re not a native English speaker, so tried to use a model to improve their English, which I thought was more reasonable than many uses.
I’d be pretty into having typography styling settings that auto-detect LM stuff (or, specifically track when users have used any LW-specific LM tools), and flag it with some kind of style difference so it’s easy to track at a glance (esp if it could be pretty reliable).
Lots of people are pushing back on this, but I do want to say explicitly that I agree that raw LLM-produced text is mostly not up to LW standards, and that the writing style that current-gen LLMs produce by default sucks. In the new-user-posting-for-the-first-time moderation queue, next to the SEO spam, we do see some essays that look like raw LLM output, and we reject these.
That doesn’t mean LLMs don’t have good use around the edges. In the case of defining commonly-used jargon, there is no need for insight or originality, the task is search-engine-adjacent, and so I think LLMs have a role there. That said, if the glossary content is coming out bad in practice, that’s important feedback.
For the case of this particular feature and ones like it: The LessWrong team. And, in this case, more specifically, me.
I welcome being held accountable for this going wrong in various ways. (I plan to engage more with people who present specific cruxes rather than a generalized “it seems scary”, but, this seems very important for a human to be in the loop about, who actually takes responsibility for both locally being good, and longterm consequences)
FWIW I think the actual person with responsibility is the author if the author approves it, and you if the author doesn’t.
You and the LW team are indirectly responsible, but only for the general feature. You are not standing behind each individual statement the AI makes. If the author of the post does not vet it, no-one stands behind it. The LW admins can be involved only in hindsight, if the AI does something particularly egregious.
This feels like you have some way of thinking about responsibility that I’m not sure I’m tracking all the pieces of.
1. Who literally meant the individual words? No one (or, some random alien mind).
2. Who should take action if someone flags that an unapproved term is wrong? The author, if they want to be involved, and site-admins (or me-in-particular), if the author does not want to be involved.
3. Who should be complained to if this overall system is having bad consequences? Site admins, me-in-particular or habryka-in-particular. (Habryka has more final authority, I have more context on this feature. You can start with me and then escalate, or tag both of us, or whatever.)
4. Who should have Some Kind of Social Pressure Leveraged At them if reasonable complaints seem to be falling on deaf ears and there are multiple people worried? Also the site admins, and habryka-and-me-in-particular.
It seems like you want #1 to have a better answer, but I don’t really know why.
Rather, I am pointing out that #1 is the case. No-one means the words that an AI produces. This is the fundamental reason for my distaste for AI-generated text. Its current low quality is a substantial but secondary issue.
If there is something flagrantly wrong with it, then 2, 3, and 4 come into play, but that won’t happen with standard average AI slop, unless it were eventually judged to be so persistently low quality that a decision were made to discontinue all ungated AI commentary.
It happening at all already constitutes “going wrong”.
Also: by what means can you be “held accountable”?
The most important thing is “There is a small number of individuals who are paying attention, who you can argue with, and if you don’t like what they’re doing, I encourage you to write blogposts or comments complaining about it. And if your arguments make sense to me/us, we might change our mind. If they don’t make sense, but there seems to be some consensus that the arguments are true, we might lose the Mandate of Heaven or something.”
I will personally be using my best judgment to guide my decisionmaking. Habryka is the one actually making final calls about what gets shipped to the site; insofar as I update that we’re doing a wrong thing, I’ll argue about it.
This particular sort of comment doesn’t particularly move me. I’m more likely to be moved by “I predict that if AI is used in such and such a way it’ll have such and such effects, and those effects are bad.” Which I won’t necessarily automatically believe, but, I might update on if it’s argued well or seems intuitively obvious once it’s pointed out.
I’ll be generally tracking a lot of potential negative effects and if it seems like it’s turning out “the effects were more likely” or “the effects were worse than I thought”, I’ll try to update swiftly.
There’s not, like, anything necessarily wrong with this, on its own terms, but… this is definitely not what “being held accountable” is.
All this really means is that you’ll just do with this whatever you feel like doing. Which, again, is not necessarily “wrong”, and really it’s the default scenario for, like… websites, in general… I just really would like to emphasize that “being held accountable” has approximately nothing to do with anything that you’re describing.
As far as the specifics go… well, the bad effect here is that instead of the site being a way for me to read the ideas and commentary of people whose thoughts and writings I find interesting, it becomes just another purveyor of AI “extruded writing product”. I really don’t know why I’d want more of that than there already is, all over the internet. I mean… it’s a bad thing. Pretty straightforwardly. If you don’t think so then I don’t know what to tell you.
All I can say is that this sort of thing drastically reduces my interest in participating here. But then, my participation level has already been fairly low for a while, so… maybe that doesn’t matter very much, either. On the other hand, I don’t think that I’m the only one who has this opinion of LLM outputs.
If it happened here the way it happened on the rest of the internet, (in terms of what the written content was like) I’d agree it’d be straightforwardly bad.
For things like jargon-hoverovers, the questions IMO are:
is the explanation accurate?
is the explanation helpful for explaining complex posts, esp. with many technical terms?
does the explanation feel like soulless slop that makes you feel ughy the way a lot of the internet is making you feel ughy these days?
If the answer to the first two is “yep”, and the third one is “alas, also yep”, then I think an ideal state is for the terms to be hidden-by-default but easily accessible for people who are trying to learn effectively, and are willing to put up with somewhat AI-slop-sounding but clear/accurate explanations.
If the answer to the first two is “yep”, and the third one is “no, actually it just reads pretty well (maybe even in the author’s own style, if they want that)”, then IMO there’s not really a problem.
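To make the two cases above concrete, here is a minimal sketch of that decision rule. The function name, argument names, and return strings are all hypothetical and are not part of any LessWrong code. Returning "shown" for the no-slop case is my reading of "not really a problem", and the sketch deliberately gives no answer when the explanation is inaccurate or unhelpful, since the comment above doesn't cover that case.

```python
from typing import Optional


def glossary_default(accurate: bool, helpful: bool, reads_like_slop: bool) -> Optional[str]:
    """Suggest a default display mode for an AI-generated glossary entry.

    Hypothetical helper for illustration only.
    """
    if not (accurate and helpful):
        # The discussion only covers the case where both answers are "yep",
        # so this sketch declines to pick a default here.
        return None
    if reads_like_slop:
        # Accurate and helpful but slop-sounding: hide by default,
        # keep it easy to opt into for readers who want it.
        return "hidden_by_default"
    # Accurate, helpful, and reads well. "shown" is an assumption:
    # the comment only says there is not really a problem in this case.
    return "shown"


print(glossary_default(accurate=True, helpful=True, reads_like_slop=True))
```

The point of writing it this way is that accuracy and helpfulness act as gates, while the "slop" judgment only changes the default visibility.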
I am interested in your actual honest opinion of, say, the glossary I just generated for Unifying Bargaining Notions (1/2) (you’ll have to click option-shift-G to enable the glossary on lesswrong.com). That seems like a post where you will probably know most of the terms well enough to judge them on accuracy, while still being technical enough that you can imagine being a person unfamiliar with game theory trying to understand the post, and have a sense of both how useful they’d be and how they feel aesthetically.
My personal take is that they aren’t quite as clear as I’d like and not quite as alive-feeling as I’d like, but over the threshold of both such that I’d much rather have them than not have them, esp. if I knew less game theory than I currently do.
Part of the uncertainties we’re aiming to reduce here are “can we make thinking tools or writing tools that are actually good, instead of bad?” and our experiments so far suggest “maybe”. We’re also designing with “six months from now” in mind – the current level of capabilities and quality won’t be static.
Our theory of “secret sauce” is “most of the corporate Tech World in fact has bad taste in writing, and the LLM fine-tunings and RLHF data are generated by people with bad taste. Getting good output requires both good taste and prompting skill, and you’re mostly just not seeing people try.”
We’ve experimented with jailbroken Base Claude which does a decent job of actually having different styles. It’s harder to get to work reliably, but, not so much harder that it feels intractable.
The JargonHovers currently use regular Claude, not jailbroken Claude. I have guesses of how to eventually get them to write it in something like the author’s original style, although it’s a harder problem so we haven’t tried that hard yet.
Yeah, seems good for us to build that today.