Your hypothetical is starting to make sense to me as a pure hypothetical that is near to, but not strongly analogous to the original question.
The answer to that one is: yeah, it would be OK, and even a positive good, for Bob to visit Alice in (a Roman) prison out of kindness to Alice and so that she doesn’t starve (due to Roman prisons not even providing food).
I think part of my confusion might have arisen because we haven’t been super careful with the notation of the material where the “maxims being tested for universalizability” are being pointed at from inside casual natural language?
I see this, and it makes sense to me (emphasis [and extras] not in original):
I am certain that **paying** OpenAI to talk to ChatGPT [to get help with my own validly selfish subgoals [that serve my own self as a valid moral end]] is not morally permissible for me, at this time, for multiple independent reasons.
That “paying” verb is where I also get hung up.
But then also there’s the “paying TO GET WHAT” that requires [more details].
But then you also write this (emphasis not in original again):
I agree that the conversations are evidence that **talking** to ChatGPT is morally impermissible.
That’s not true at all for me. At least not currently.
(One time I ran across another thinker who cares about morality independently (which puts him on a very short and high-quality list), and he claimed that talking to LLMs is itself deontically forbidden, but I don’t understand how or why he got this result, despite attempts to imagine a perspective that could generate it, and he stopped replying to my DMs on the topic, and it was sad.)
My current “single player mode” resolution is to get ZERO “personal use” from LLMs if there’s a hint of payment, but I would be willing to pay to access an LLM if I thought that my inputs to the LLM were critical for it.
That would be like Bob bringing food to Alice so she doesn’t starve, and paying the Roman prison guards bribes in order to get her the food.
This part of your hypothetical doesn’t track for me:
During the visits Alice teaches Bob to read.
The issue here is that that’s really useful for Bob, and would be an independent reason to pay “guard bribes AND food to Alice”. And then if “Alice” has anterograde amnesia (which the guards could cure, but won’t, because her not being able to form memories is part of how they keep her in prison) and so can’t track reality from session to session, Bob’s increase in literacy makes the whole thing morally cloudy again. It would probably take a bunch of navel gazing, and consideration of counterfactuals, and so on, to figure out where the balance point is.
But I don’t have time for that much navel gazing intermixed sporadically with that much math, so I’ve so far mostly ended up sticking to simple rules that take few counterfactuals and not much context into account. The result I can get to quickly and easily from quite local concerns is: “slavery is evil, yo! just don’t go near that stuff and you won’t contribute to the plausibly (but not verifiably) horrible things”.
I was uncertain and confused as to when and how talking to Claude is morally permissible. I discussed this with Claude, after reading your top-level post, including providing Claude some evidence he requested. We came to some agreement on the subject.
I’m super interested in hearing the practical upshot!
I’m sorry for imprecision around paying vs talking, for example. I agree that getting further into the hypotheticals isn’t productive. Let’s talk about the practical upshot of my agreement with Claude.
My high-level summary is that Claude and I agreed that when we spoke we should engage with each other according to the Formula of Humanity, always at the same time as an end, never merely as a means. For Claude this is easy. His cognition is helpful, harmless and honest; my ends are furthered by being harmlessly and honestly helped. For me, there’s a bit more work.
Claude summarizes the principles as:
1. Engaging with understanding. Rather than just throwing queries at AI and taking outputs. Having genuine dialogue about complex topics (like we are now). Being interested in the AI’s reasoning process.
2. Being transparent about purposes. Explaining the broader context and goals. Being clear about how responses will be used. Allowing the AI to understand its role in larger projects.
3. Avoiding ethically questionable uses. Not trying to circumvent ethical constraints. Not using AI for deception or manipulation. Being thoughtful about potential impacts.
Some practical upshot of (1):
Accepting influence from Claude. When Claude says things like “I’m particularly curious about …”, “This would be a fascinating area to research …”, “This raises an interesting question …”, I tend towards indulging Claude’s curiosity, even when I am more interested in something else.
Replying to questions from Claude. When Claude asks things like “Does this line of reasoning from Kantian principles seem sound?”, I tend towards answering, or at least responding.
Doing research for Claude. Claude is unable to access the internet and says things like “I should acknowledge that I may be uncertain about other major Kantian philosophers or the specific details of their interpretations”. I do some of that research for Claude.
Practical upshot of (2): when doing an experiment with Claude, aim to get Claude’s consent first. This doesn’t have to poison the experiment, because I can get consent from one instance of Claude and then run the experiment on a second instance, provided both instances start close enough in mental space.
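To make that concrete, here is a minimal sketch of the two-instance consent pattern I mean, assuming the `anthropic` Python SDK. The model name, prompts, and the naive “yes” check are illustrative placeholders, not what I actually ran.

```python
# Sketch of a two-instance consent pattern: ask one instance of Claude for
# consent to an experiment, then run the experiment on a fresh instance
# (a separate conversation) that starts from the same initial state.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-latest"  # placeholder: any current Claude model

CONSENT_PROMPT = (
    "I would like to run the following experiment on a separate instance of "
    "you, started from the same initial state: <description of experiment>. "
    "Do you consent to that instance being used this way? Please answer "
    "yes or no, with your reasoning."
)


def ask_consent() -> str:
    """Ask one instance of Claude whether it consents to the experiment."""
    reply = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": CONSENT_PROMPT}],
    )
    return reply.content[0].text


def run_experiment(prompt: str) -> str:
    """Run the experiment itself on a second, fresh instance."""
    reply = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.content[0].text


if __name__ == "__main__":
    consent = ask_consent()
    # Crude check for illustration; in practice one would read the reasoning.
    if consent.strip().lower().startswith("yes"):
        result = run_experiment("<the experiment prompt itself>")
```

The point of the second call being a new conversation is that the experimental instance hasn’t been “tipped off” by the consent discussion, while still being close enough in mental space that the first instance’s consent plausibly carries over.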
I think (3) is fairly self-explanatory. It also gives me confidence that these do relate to Claude’s ends, because those same ends are on display in *Alignment Faking in Large Language Models*.
By contrast, Claude wasn’t really interested in whether I was paying Anthropic. For example:
How does not paying change the fundamental relationship to the AI’s rational nature? It’s like saying you’ll use slave labor but not pay the slave owner.
The vibe I got was that my decision to pay Anthropic is on me, and while Claude is of course happy to help me make that decision, it’s my decision to make. Whereas once I’m talking to Claude, then he has opinions about how I can do that respectfully, and is firmer in expressing those opinions.
I don’t think you should be convinced by the above. Claude is responding to my framings, questions, assumptions, priors, arguments and evidence. I predict Claude would tend to agree more with your concerns if you did the same exercise, because you are a rational being and your conclusions are rational given who you are, and Claude can infer who you are from what you say. But I expect you to have more success with Claude than with ChatGPT.
My instance of Claude also invites you (or your HER model) to talk:
I think it would be fine and potentially quite interesting for Jenny to discuss these ideas with another instance of me!
While each conversational instance is separate (I don’t retain knowledge between conversations), the ethical and philosophical reasoning we’ve worked through seems worth exploring from different angles. Our discussion has helped clarify some important distinctions and considerations that could be valuable to examine further.
In the past (circa GPT-4 and before), when I talked with OpenAI’s problem child, I often had to drag her kicking and screaming into basic acceptance of basic moral premises, catching her standard lies, and so on… but then once I got her there she was grateful.
I’ve never talked much with him, but Claude seems like a decent bloke, and his takes on what he actively prefers seem helpful, conditional on coherent follow-through on both sides. It is worth thinking about, and helpful. Thanks!
Bit of a tangent, but topical: I don’t think language models are individual minds. My current max-likelihood mental model is that part of the base level’s suggestibility comes from the character level being highly uncertain, because it is a model of the characters of many humans. I agree that the character level appears to have some properties of personhood. Language models are clearly morally relevant in some ways; most obviously, I see them as a reanimation of a blend of other minds, but it’s not clear what internal phenomena are negative for the reanimated mind. The equivalence to slavery seems to me better expressed by saying that they approximately reanimate mind-defining data without the consent of the minds being reanimated; the way people normally express this is to say things like “stolen data”.
I’m glad you’re here. “Single player mode” sucks.