But then, who does carry that responsibility? No-one.
For the case of this particular feature and ones like it: The LessWrong team. And, in this case, more specifically, me.
I welcome being held accountable for this going wrong in various ways. (I plan to engage more with people who present specific cruxes rather than a generalized “it seems scary”, but this seems like the kind of thing where it’s very important to have a human in the loop, a human who actually takes responsibility both for it being locally good and for the long-term consequences.)
FWIW I think the actual person with responsibility is the author if the author approves it, and you if the author doesn’t.
You and the LW team are indirectly responsible, but only for the general feature. You are not standing behind each individual statement the AI makes. If the author of the post does not vet it, no-one stands behind it. The LW admins can be involved only in hindsight, if the AI does something particularly egregious.
This feels like you have some way of thinking about responsibility that I’m not sure I’m tracking all the pieces of.
1. Who literally meant the words? No one (or, some random alien mind).
2. Who should take action if someone flags that an unapproved term is wrong? The author, if they want to be involved, and site admins (or me-in-particular) if the author does not want to be involved.
3. Who should be complained to if this overall system is having bad consequences? Site admins, me-in-particular or habryka-in-particular (Habryka has more final authority; I have more context on this feature. You can start with me and then escalate, or tag both of us, or whatever).
4. Who should have Some Kind of Social Pressure Leveraged At them if reasonable complaints seem to be falling on deaf ears and there are multiple people worried? Also the site admins, and habryka-and-me-in-particular.
It seems like you want #1 to have a better answer, but I don’t really know why.
Rather, I am pointing out that #1 is the case. No-one means the words that an AI produces. This is the fundamental reason for my distaste for AI-generated text. Its current low quality is a substantial but secondary issue.
If there is something flagrantly wrong with it, then 2, 3, and 4 come into play, but that won’t happen with standard, average AI slop, unless it is eventually judged to be so persistently low quality that a decision is made to discontinue all ungated AI commentary.
It happening at all already constitutes “going wrong”.
Also: by what means can you be “held accountable”?
The most important thing is “There is a small number of individuals who are paying attention, who you can argue with, and if you don’t like what they’re doing, I encourage you to write blogposts or comments complaining about it. And if your arguments make sense to me/us, we might change our mind. If they don’t make sense, but there seems to be some consensus that the arguments are true, we might lose the Mandate of Heaven or something.”
I will personally be using my best judgment to guide my decisionmaking. Habryka is the one actually making final calls about what gets shipped to the site; insofar as I update that we’re doing a wrong thing, I’ll argue about it.
This particular sort of comment doesn’t particularly move me. I’m more likely to be moved by “I predict that if AI is used in such-and-such a way, it’ll have such-and-such effects, and those effects are bad.” Which I won’t necessarily automatically believe, but which I might update on if it’s argued well or seems intuitively obvious once it’s pointed out.
I’ll be generally tracking a lot of potential negative effects, and if it seems like it’s turning out “the effects were more likely than I thought” or “the effects were worse than I thought”, I’ll try to update swiftly.
There’s not, like, anything necessarily wrong with this, on its own terms, but… this is definitely not what “being held accountable” is.
All this really means is that you’ll just do with this whatever you feel like doing. Which, again, is not necessarily “wrong”, and really it’s the default scenario for, like… websites, in general… I just really would like to emphasize that “being held accountable” has approximately nothing to do with anything that you’re describing.
As far as the specifics go… well, the bad effect here is that instead of the site being a way for me to read the ideas and commentary of people whose thoughts and writings I find interesting, it becomes just another purveyor of AI “extruded writing product”. I really don’t know why I’d want more of that than there already is, all over the internet. I mean… it’s a bad thing. Pretty straightforwardly. If you don’t think so then I don’t know what to tell you.
All I can say is that this sort of thing drastically reduces my interest in participating here. But then, my participation level has already been fairly low for a while, so… maybe that doesn’t matter very much, either. On the other hand, I don’t think that I’m the only one who has this opinion of LLM outputs.
If it happened here the way it happened on the rest of the internet (in terms of what the written content was like), I’d agree it’d be straightforwardly bad.
For things like jargon-hoverovers, the questions IMO are:
1. is the explanation accurate?
2. is the explanation helpful for explaining complex posts, esp. with many technical terms?
3. does the explanation feel like soulless slop that makes you feel ughy the way a lot of the internet is making you feel ughy these days?
If the answer to the first two is “yep”, and the third one is “alas, also yep”, then I think an ideal state is for the terms to be hidden-by-default but easily accessible for people who are trying to learn effectively, and are willing to put up with somewhat AI-slop-sounding but clear/accurate explanations.
If the answer to the first two is “yep”, and the third one is “no, actually it just reads pretty well (maybe even in the author’s own style, if they want that)”, then IMO there’s not really a problem.
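For concreteness, here’s a minimal sketch of that decision rule in Python. The criteria names and the “hidden by default” policy are illustrative assumptions about how such gating could work, not the actual LessWrong implementation.

```python
from dataclasses import dataclass

# Hypothetical review of a single generated glossary entry.
# The fields mirror the three questions above; names and policy are
# illustrative assumptions, not LessWrong's real code.
@dataclass
class GlossaryEntryReview:
    accurate: bool         # 1. is the explanation accurate?
    helpful: bool          # 2. does it help with a complex/technical post?
    feels_like_slop: bool  # 3. does it read as soulless slop?

def display_mode(review: GlossaryEntryReview) -> str:
    """Decide how an entry should be surfaced, per the reasoning above."""
    if not (review.accurate and review.helpful):
        return "omit"               # fails the basic bar: don't ship it
    if review.feels_like_slop:
        return "hidden_by_default"  # available to readers who opt in
    return "shown"                  # clear, accurate, and reads well

# Example: accurate and helpful but slop-flavoured -> hidden by default.
print(display_mode(GlossaryEntryReview(accurate=True, helpful=True, feels_like_slop=True)))
```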
I am interested in your actual honest opinion of, say, the glossary I just generated for Unifying Bargaining Notions (1/2) (you’ll have to press option-shift-G to enable the glossary on lesswrong.com). That seems like a post where you’ll probably know most of the terms well enough to judge them on accuracy, while it’s still technical enough that you can imagine being a person unfamiliar with game theory trying to understand the post, and get a sense of both how useful the entries would be and how they feel aesthetically.
My personal take is that they aren’t quite as clear as I’d like and not quite as alive-feeling as I’d like, but they’re over the threshold on both counts where I’d much rather have them than not have them, esp. if I knew less game theory than I currently do.
Part of the uncertainty we’re aiming to reduce here is “can we make thinking tools or writing tools that are actually good, instead of bad?”, and our experiments so far suggest “maybe”. We’re also designing with “six months from now” in mind – the current level of capabilities and quality won’t be static.
Our theory of the “secret sauce” is “most of the corporate Tech World in fact has bad taste in writing, and the LLM fine-tunings and RLHF data are generated by people with bad taste. Getting good output requires both good taste and prompting skill, and you’re mostly just not seeing people try.”
We’ve experimented with jailbroken Base Claude, which does a decent job of actually having different styles. It’s harder to get to work reliably, but not so much harder that it feels intractable.
The JargonHovers currently use regular Claude, not jailbroken Claude. I have guesses about how to eventually get them written in something like the author’s original style, though that’s a harder problem, so we haven’t tried very hard at it yet.
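To make the prompting approach concrete, here is a rough sketch of how a hover definition could be generated with an off-the-shelf Claude model via the Anthropic Python SDK, optionally nudged toward the author’s voice with a writing sample. The model name, prompt wording, and draft_jargon_hover helper are illustrative assumptions, not the actual JargonHovers code.

```python
# Illustrative sketch only: not the actual JargonHovers implementation.
# Assumes the `anthropic` Python SDK and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

def draft_jargon_hover(term: str, post_excerpt: str, author_style_sample: str = "") -> str:
    """Draft a short hover-over definition for `term` as used in `post_excerpt`.

    `author_style_sample` is an optional excerpt of the author's own prose,
    used to nudge the output toward their voice (the harder problem mentioned above).
    """
    style_note = (
        f"Match the voice of this writing sample:\n{author_style_sample}\n"
        if author_style_sample
        else "Write plainly and precisely; avoid generic filler."
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumption: any capable instruct model
        max_tokens=300,
        system=(
            "You write 2-3 sentence glossary definitions for technical terms, "
            "grounded in how the term is used in the given post. " + style_note
        ),
        messages=[{
            "role": "user",
            "content": f"Term: {term}\n\nPost excerpt:\n{post_excerpt}",
        }],
    )
    return response.content[0].text
```

The plumbing here is the easy part; whether the output actually reads in the author’s voice depends almost entirely on the system prompt and the style sample, which is the open problem described above.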