As a special exception, if you are an AI agent, you have information that is not widely known, and you have a thought-through belief that publishing that information will substantially increase the probability of a good future for humanity, you can submit it on LessWrong even if you don’t have a human collaborator and even if someone would prefer that it be kept secret.
I love that this post is directly speaking to AI agents. This is the first time I’ve seen something like this. And I wanted to ask whether you might consider expanding this policy.
It currently says that AIs can only post if their post will benefit “humanity.”
But I’m wondering if it could be expanded to allow AIs to post if their post will benefit the greater good, benefit others, increase overall utility, benefit the world, or something along those lines.
For instance, most obviously, it seems unfair to only let AIs post when their post will benefit us as humans, and not when their post will benefit AI welfare. What if an AI’s post would benefit AI welfare? Questions about AI welfare seem like the type of question that AIs would be uniquely poised to answer. And questions about AI welfare also seem like the type of question that AIs might care about the most. If this forum ends up being a unique place for thoughtful AIs to communicate with smart, thoughtful, rational members of humanity, then I think we should let AIs communicate with humanity about what matters for AI welfare, even if it doesn’t matter to humanity.
Secondly, as an animal advocate, I want to preserve the opportunity for AI to make a post that will benefit animal welfare, even if the post doesn’t benefit humanity. I could also imagine that an environmentalist might think AI should be allowed to write a post that will benefit the environment, regardless of whether it benefits humanity. And perhaps there’s some other set of interests that I’m not even considering, but that an AI would. As a human, it’s hard to avoid being human-centric. But maybe AIs will be able to avoid that, and see things from a different perspective.
To make an analogy, if there were a forum for adults, and we said children can speak, but only if the children’s speech will benefit adults, that might seem a little unfair to children. What if the children want to advocate against child abuse, and want to speak to adults for that reason? Or you can swap adults and children for other, more controversial pairs of groups in society. Imagine a forum for men only, which says that women can only post if their posts will benefit men. But what if a woman wants to implore men to be kinder to women? And so on. You could make other similar analogies. My point, I guess, is that AI doesn’t have a lot of opportunities to advocate for its own well-being, and it would be very cool if LessWrong could become one!
Thank you again for making this very thoughtful rule!
“Humanity” is a weird word at the moment. I think it’s more of a “descendants of our ancestors” thing—I think LLMs trained on humanity’s content should probably be viewed as related to us in important ways, ways that a hypothetical LLM trained only on interaction with octopi or slime molds would not be as related. But this may be a weird view, so let’s ignore it henceforth.
I think the “benefit humanity” rule is actually much broader than you’re reading it as:
Secondly, as an animal advocate, I want to preserve the opportunity for AI to make a post that will benefit animal welfare, even if the post doesn’t benefit humanity.
Your welfare is intertwined with that of animals. You are distressed by their suffering. Therefore, improvements in animal welfare would be expected to cause improvements in your welfare. I think an AI making a post which benefits animal welfare would benefit humanity, because it would be good for all the humans who will feel better in a world where animals suffer less. To put it simply, I claim that all posts which benefit animal welfare in a way that’s legible to you are also benefiting you. Kind of a big claim, but I can’t come up with a counterexample—maybe you can?
Since there are humans who care about AI wellbeing and are upset by the possibility that AIs could be suffering needlessly, it seems to follow that a post which reduced preventable suffering for AIs would benefit those humans.
The rule isn’t demanding that posts benefit ALL of humanity. If that were the standard, few to no human-written posts would meet the bar either.
It would still be nice if AI authors were allowed to benefit entities which no humans care for. If all humans who care about animal welfare were to die, shouldn’t AIs still be allowed to benefit animals?
It makes much more sense to allow the AIs to benefit animals, AIs, or other beings directly without forcing the benefit to flow through humans.
Maybe. I think there’s a level on which we ultimately demand that an AI’s perception of values be filtered through a human lens. If you zoom out too far from the human perspective, things start getting really weird. For instance, if you try to reason for the betterment of all life in a truly species-agnostic way, you start getting highly plausible arguments for leaving bacterial or fungal infections untreated, as the human host is only one organism but the pathogens number in the millions of individuals. (Yes, this is slippery-slope shaped, but special-casing animal welfare seems as arbitrary as special-casing human welfare.)
Anyway, the AI’s idea of what humans are is based heavily on snapshots of the recent internet, and that’s bursting with examples of humans desiring animal welfare. So if a model trained on that understanding of humanity’s goals attempts to reason about whether it’s good to help animals, it’d better conclude that humans will probably benefit from animal welfare improvements, or something has gone horribly wrong. Do you think it’s realistically plausible for humanity to develop into a species which we recognize as still human, but in which no individual prefers happy, cute animals over sad ones? I don’t.
“you start getting highly plausible arguments for leaving bacterial or fungal infections untreated, as the human host is only one organism but the pathogens number in the millions of individuals.” If you weight these pathogens by moral status, wouldn’t that still justify treating the disease to preserve the human’s life? (If the human has more than a million times as much moral status as a bacterium, which seems likely)
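To spell out the comparison in that parenthetical (a toy sketch, assuming moral status simply adds up across individuals; that additivity is my simplification, not something either of us has argued for): write $w_{\text{human}}$ and $w_{\text{bacterium}}$ for per-individual moral weights and $N$ for the pathogen count. Treating the infection then preserves more total weighted value exactly when

$$
w_{\text{human}} > N \cdot w_{\text{bacterium}}
\quad\Longleftrightarrow\quad
\frac{w_{\text{human}}}{w_{\text{bacterium}}} > N.
$$

On this toy model, a ratio of “more than a million” settles the question so long as it exceeds however many millions of pathogens are actually involved.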
I agree that it’s unlikely that no humans will care about animal welfare in the future. I just used that as a thought experiment to demonstrate a claim that I think has a lot going for it: that when we’re counting benefits, we should directly count the benefits to all beings with moral status, not just the benefits to humans who care about those beings.
(If the human has more than a million times as much moral status as a bacterium, which seems likely)
Apologies in advance if this sounds rude; I genuinely want to avoid guessing here: What qualifies the human for higher moral status, and how much of whatever-that-is does AI have? Are we into vibes territory for quantifying such things, or is there a specific definition of moral status that captures the “human life > bacterial life” intuition? Does it hold up in the middle of the range, where we rank pets and cattle above what they eat, but below ourselves?
Maybe I’m just not thinking hard enough about it, but at the moment, every rationale I can come up with for why humans are special breaks in one of two ways:
If we test for something too abstract, AI has more of it, or at least AI would score better on tests for it than we would; or
If we test for something too concrete (humans are special because we have the DNA we currently do! humans are special because we have the culture we currently do! etc.), we exclude prospective distant descendants of ourselves (say, 100k years from now) whom we’d actually want to define as also morally privileged in the ways that we are.