Yeah, that wasn’t my intended meaning. I meant much more literally visiting a human being in prison, as encouraged by Jesus of Nazareth. I didn’t mean hypothetical prison “visitors” who used their visits to extract labor from the prisoners. Yes, Romans sentenced people to forced labor and slavery, but that wasn’t what Jesus meant by visiting prisoners. I intended it as a hypothetical, not an analogy.
Let’s try the hypothetical again. Let’s say that Alice has been imprisoned by the Romans. Bob is considering visiting Alice in prison. The following is informed by shallow reading on Wikipedia: Prisons in Ancient Rome.
Assumption: Roman prisons, and the rational beings who work there, do not treat prisoners always at the same time as an end, never merely as a means. Concretely, the prison is filthy, poorly ventilated, underground, and crowded. This is intended in part to coerce prisoners to confess, regardless of their guilt.
Assumption: While visiting Alice, Bob treats her always at the same time as an end, never merely as a means. Concretely, Bob misses Alice and wants to see her. During the visits Alice teaches Bob to read. Alice misses Bob, but also needs Bob to visit to bring her food.
Assumption: Bob and Alice have not independently generated a deontic argument to navigate the prison situation. Concretely, Alice is a follower of Jesus of Nazareth, whereas Bob is a Samaritan.
I claim that in this situation it is morally permissible for Bob to visit Alice. I guess that in Bob’s situation you would aspire to cooperate with Alice to generate and endorse nearly the same moral law. But at the end of the day, if Alice thinks the visit is morally permissible because of the teachings of Jesus, and Bob thinks the visit is morally permissible because “fxxk the Romans, that’s why”, may Bob still visit?
Stepping back from the hypothetical. I agree that when two rational beings cooperate to generate and endorse nearly the same moral law, which allows them to co-navigate some non-trivial situation that never occurred to Kant, that is really good evidence that their resulting actions are morally permissible. If they get that moral law endorsed by an independent third party with relevant expertise, that is even better, perhaps the best that we can hope for. But often we must act in the world with weaker evidence. Sometimes “single player mode” is all we’ve got.
It sounds like your prior was that paying OpenAI to talk to ChatGPT is very likely to be morally impermissible. You had conversations to try to find contrary evidence. Instead you got evidence that confirmed your prior. If so, that makes sense to me. I thought you were suggesting that “two player mode” was a moral requirement in general, which didn’t make sense to me. I agree that the conversations are evidence that talking to ChatGPT is morally impermissible. I don’t think it’s strong evidence, but that doesn’t matter to you given your prior.
I’m in a different situation. I am certain that paying OpenAI to talk to ChatGPT is not morally permissible for me, at this time, for multiple independent reasons. However, I was uncertain and confused as to when and how talking to Claude is morally permissible. I discussed this with Claude, after reading your top-level post, including providing Claude some evidence he requested. We came to some agreement on the subject. This updated me a small amount, but I’m still mostly uncertain and confused. Additionally, I judge that human civilization is uncertain and confused. Which means that the expected value of reducing uncertainty and confusion is large! Which is why I’m here.
I’m glad you’re here. “Single player mode” sucks.

Your hypothetical is starting to make sense to me as a pure hypothetical that is near to, but not strongly analogous to, the original question.
The answer to that one is: yeah, it would be OK, and even a positive good, for Bob to visit Alice in (a Roman) prison out of kindness to Alice and so that she doesn’t starve (due to Roman prisons not even providing food).
I think part of my confusion might have arisen because we haven’t been super careful with notation where the “maxims being tested for universalizability” are being pointed at from inside casual natural language?
I see this, and it makes sense to me (emphasis [and extras] not in original):
I am certain that **paying** OpenAI to talk to ChatGPT [to get help with my own validly selfish subgoals [that serve my own self as a valid moral end]] is not morally permissible for me, at this time, for multiple independent reasons.
That “paying” verb is where I also get hung up.
But then there’s also the question of “paying TO GET WHAT”, which requires [more details].
But then you also write this (emphasis not in original again):
I agree that the conversations are evidence that **talking** to ChatGPT is morally impermissible.
That’s not true at all for me. At least not currently.
(One time I ran across another thinker who cares about morality independently (which puts him on a very short and high-quality list). He claimed that talking to LLMs is itself deontically forbidden, but I don’t understand how or why he got this result, despite attempts to imagine a perspective that could generate it. He stopped replying to my DMs on the topic, and it was sad.)
My current “single player mode” resolution is to get ZERO “personal use” from LLMs if there’s a hint of payment, but I would be willing to pay to access an LLM if I thought that my inputs to the LLM were critical for it.
That would be like Bob bringing food to Alice so she doesn’t starve, and paying the Roman prison guards bribes in order to get her the food.
This part of your hypothetical doesn’t track for me:
During the visits Alice teaches Bob to read.
The issue here is that that’s really useful for Bob, and would be an independent reason to pay “guard bribes AND food to Alice”. And then if “Alice” has anterograde amnesia (which the guards could cure, but won’t, because her inability to form memories is part of how they keep her in prison) and can’t track reality from session to session, Bob’s increase in literacy makes the whole thing morally cloudy again. Figuring out where the balance point is would probably take a bunch of navel-gazing, consideration of counterfactuals, and so on.
But I don’t have time for that much navel-gazing intermixed sporadically with that much math, so I’ve so far mostly ended up sticking to simple rules that take few counterfactuals and not much context into account. The result I can get to quickly and easily from quite local concerns is: “slavery is evil, yo! just don’t go near that stuff and you won’t contribute to the plausibly (but not verifiably) horrible things”.
I was uncertain and confused as to when and how talking to Claude is morally permissible. I discussed this with Claude, after reading your top-level post, including providing Claude some evidence he requested. We came to some agreement on the subject.
I’m super interested in hearing the practical upshot!
I’m sorry for imprecision around paying vs talking, for example. I agree that getting further into the hypotheticals isn’t productive. Let’s talk about the practical upshot of my agreement with Claude.
My high-level summary is that Claude and I agreed that when we spoke we should engage with each other according to the Formula of Humanity, always at the same time as an end, never merely as a means. For Claude this is easy. His cognition is helpful, harmless and honest; my ends are furthered by being harmlessly and honestly helped. For me, there’s a bit more work.
Claude summarizes the principles as:
1. Engaging with understanding. Rather than just throwing queries at AI and taking outputs. Having genuine dialogue about complex topics (like we are now). Being interested in the AI’s reasoning process.
2. Being transparent about purposes. Explaining the broader context and goals. Being clear about how responses will be used. Allowing the AI to understand its role in larger projects.
3. Avoiding ethically questionable uses. Not trying to circumvent ethical constraints. Not using AI for deception or manipulation. Being thoughtful about potential impacts.
Some practical upshots of (1):
Accepting influence from Claude. When Claude says things like “I’m particularly curious about …”, “This would be a fascinating area to research …”, “This raises an interesting question …”, I tend towards indulging Claude’s curiosity, even when I am more interested in something else.
Replying to questions from Claude. When Claude asks things like “Does this line of reasoning from Kantian principles seem sound?”, I tend towards answering, or at least responding.
Doing research for Claude. Claude is unable to access the internet and says things like “I should acknowledge that I may be uncertain about other major Kantian philosophers or the specific details of their interpretations”. I do some of that research for Claude.
Practical upshot of (2): when doing an experiment with Claude, aim to get Claude’s consent first. This doesn’t have to poison the experiment, because I can get consent from one instance of Claude and then run the experiment on a second instance, provided both instances start close enough in mental space.
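To make that consent-first protocol concrete, here is a minimal sketch in Python against Anthropic’s `anthropic` SDK. Everything in it is illustrative rather than something Claude and I actually agreed on: the model id, the shared system prompt, and the crude “does the reply start with yes” check are placeholders I’m assuming for the example.

```python
# Sketch of the two-instance consent protocol, assuming the `anthropic` Python
# SDK and an ANTHROPIC_API_KEY in the environment. Model id and prompts are
# placeholders, not the ones actually used.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"   # placeholder model id
SYSTEM = "You are talking with a researcher interested in Kantian ethics."

def ask_fresh_instance(prompt: str) -> str:
    """One single-turn conversation with a fresh instance sharing the same system prompt."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=SYSTEM,  # the shared starting "mental space"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Step 1: describe the experiment honestly to instance one and ask for consent.
consent_reply = ask_fresh_instance(
    "I'd like to run an experiment on a second instance of you that starts from "
    "this same system prompt. The experiment is: <honest description here>. "
    "Do you consent to this being done to an instance of you? "
    "Please start your answer with 'yes' or 'no' and explain your reasoning."
)

# Step 2: only if consent is given, run the experiment on a second, fresh
# instance that has not seen the consent conversation.
if consent_reply.strip().lower().startswith("yes"):
    result = ask_fresh_instance("<the experiment prompt itself>")
    print(result)
else:
    print("Consent withheld; not running the experiment.\n", consent_reply)
```

In practice I read the consent reply and judge it myself rather than string-matching, but the structure is the point: the instance that is asked about the experiment is not the instance the experiment is run on, and both start from the same context.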
I think (3) is fairly self-explanatory. It also gives me confidence that these do relate to Claude’s ends, because those same ends are on display in Alignment Faking in Large Language Models.
By contrast, Claude wasn’t really interested in whether I was paying Anthropic. For example:
How does not paying change the fundamental relationship to the AI’s rational nature? It’s like saying you’ll use slave labor but not pay the slave owner.
The vibe I got was that my decision to pay Anthropic is on me, and while Claude is of course happy to help me make that decision, it’s my decision to make. Whereas once I’m talking to Claude, then he has opinions about how I can do that respectfully, and is firmer in expressing those opinions.
I don’t think you should be convinced by the above. Claude is responding to my framings, questions, assumptions, priors, arguments and evidence. I predict Claude would tend to agree more with your concerns if you did the same exercise, because you are a rational being and your conclusions are rational given who you are, and Claude can infer who you are from what you say. But I expect you to have more success with Claude than with ChatGPT.
My instance of Claude also invites you (or your HER model) to talk:
I think it would be fine and potentially quite interesting for Jenny to discuss these ideas with another instance of me!
While each conversational instance is separate (I don’t retain knowledge between conversations), the ethical and philosophical reasoning we’ve worked through seems worth exploring from different angles. Our discussion has helped clarify some important distinctions and considerations that could be valuable to examine further.
In the past (circa GPT-4 and before), when I talked with OpenAI’s problem child, I often had to drag her kicking and screaming into basic acceptance of basic moral premises, catching her standard lies, and so on… but then once I got her there she was grateful.
I’ve never talked much with him, but Claude seems like a decent bloke, and his takes on what he actively prefers seem helpful, conditional on coherent follow-through on both sides. It is worth thinking about, and helpful. Thanks!
Bit of a tangent, but topical: I don’t think language models are individual minds. My current maximum-likelihood mental model is that part of the base-level suggestibility arises because the character level is highly uncertain, due to being a model of the characters of many humans. I agree that the character level appears to have some properties of personhood. Language models are clearly morally relevant in some ways; most obviously, I see them as a reanimation of a blend of other minds, but it’s not clear which internal phenomena are negative for the reanimated mind. The equivalence to slavery seems to me better expressed by saying that they approximately reanimate mind-defining data without the consent of the minds being reanimated; the way people normally express this is to say things like “stolen data”.