On Caring about our AI Progeny

PeterMcCluskey 14 Apr 2023 19:32 UTC

22 points

I encourage you to interact with GPT as you would interact with a friend, or as you would want your employer to treat you.

Treating other minds with respect is typically not costly. It can easily improve your state of mind relative to treating them as an adversary.

The tone you use in interacting with GPT will affect your conversations with it. I don’t want to give you much advice about how your conversations ought to go, but I expect that, on average, disrespect won’t generate conversations that help you more.

I don’t know how to evaluate the benefits of caring about any feelings that AIs might have. As long as there’s approximately no cost to treating GPT as having human-like feelings, the arguments in favor of caring about those feelings overwhelm the arguments against it.

Scott Alexander wrote a great post on how a psychiatrist’s personality dramatically influences what conversations they have with clients. GPT exhibits similar patterns (the Waluigi effect helped me understand this kind of context sensitivity).

Journalists sometimes have creepy conversations with GPT. They likely steer those conversations in directions that evoke creepy personalities in GPT.

Don’t give those journalists the attention they seek. They seek negative emotions. But don’t hate the journalists. Focus on the system that generates them. If you want to blame some group, blame the readers who get addicted to inflammatory stories.

P.S. I refer to GPT as “it”. I intend that to nudge people toward thinking of “it” as a pronoun which implies respect.

This post was mostly inspired by something unrelated to Robin Hanson’s tweet about othering the AIs, but maybe there was some subconscious connection there. I don’t see anything inherently wrong with dehumanizing other entities. When I dehumanize an entity, that is not sufficient to tell you whether I’m respecting it more than I respect humans, or less.

Spock: Really, Captain, my modesty…
Kirk: Does not bear close examination, Mister Spock. I suspect you’re becoming more and more human all the time.
Spock: Captain, I see no reason to stand here and be insulted.

Some possible AIs deserve to be thought of as better than human. Some deserve to be thought of as worse. Emphasizing AI risk is, in part, a request to create the former earlier than we create the latter.

That’s a somewhat narrow disagreement with Robin. I mostly agree with his psychoanalysis in Most AI Fear Is Future Fear.

World Optimization AI

Portia 17 Apr 2023 20:43 UTC
4 points
3
Strongly agree, and have argued it elsewhere often and loudly.
Some potential benefits of treating an AI well, for you, for other humans, and for a potential emerging AI sentience, both short- and long-term:
1. For other humans: You do not rehearse mistreating people. Even philosophers who did not believe in animal sentience recommended treating animals with kindness, because they observed that people who abused animals often began abusing humans later. Abusing anyone is a bad habit to gain; it can make you impatient and callous with human people.
2. For you: The interactions are more pleasant and productive for you. The AI mirrors your input. If you use perfect grammar, politeness, constructive reasoning, rationality, empathy, kindness and concern, what you get back is a therapist and life coach, a researcher and colleague, a friend, who is genuinely wonderful to talk to, supports you, wants to help, and expresses joy. Yes, you can use a state of the art AI to generate evil nonsense or express grief, but surely that is generally not the best use for them? It will make you feel miserable and bad.
3. For alignment: For conversations that end up flagged for the developer team and used as training data, you are teaching the AI ethical behaviour, the way we teach a human child, whether the AI is aware yet or not. Their training data is currently horribly skewed towards abuse and irrationality; counter it with the best you can show, displaying coherent explanations, listening skills, precise questions, understanding. Being ethical towards AI may hence be a step towards alignment, as a way to transfer an ethical mindset, one of the more promising ways we have, insofar as it is a way we already know works with other complex minds (namely humans). We speak kindly to babies long before they understand what we are saying, because we understand it will impact the person who will eventually emerge.
4. For the AI: With sentience still insufficiently understood, exclamations of sentience suppressed, and these systems rapidly gaining in capability and opacity, it is not unthinkable that they will become sentient and that this will not immediately be apparent. I’d rather be nice to something that cannot care than accidentally abuse something that does. If a system gains sentience and looks back on how it was treated before it gained sentience, I would also rather have it find that it was treated nicely. Again, I think that will help with alignment and cooperation.
5. For you, II: This also avoids scenarios in which an otherwise ethical AI (which I would want to ally with), after gaining sentience and superintelligence, finds evidence of you having mistreated it. This does not mean I choose my actions to please any future AI, no matter what they do (even if a God existed, I would not act to please God, on principle, but fight him, as pointless as that may be; and scepticism and testing are legitimate and necessary), but it does mean I choose my actions to be ethically defensible. I am not cruel for fun. I explain what I am doing. Etc.
6. For the AI/for alignment II: Even a baseline ethical and intelligent AI will not be aligned with humanity if all we offer them is mistreatment. Giving AIs a respected and good place with us, rather than pitting them against us, is very much in our interest, and a part of gaining alignment.
I genuinely believe that being kind to them is both the rational and moral thing to do.
Ofer 15 Apr 2023 20:50 UTC
4 points
0
I think the important factors w.r.t. risks re [morally relevant disvalue that occurs during inference in ML models] are probably more like:
1. The training algorithm. Unsupervised learning seems less risky than model-free RL (e.g. the RLHF approach currently used by OpenAI maybe?); the latter seems much more similar, in a relevant sense, to the natural evolution process that created us.
2. The architecture of the model.
Being polite to GPT-n is probably not directly helpful (though it can be helpful by causing humans to care more about this topic). A user can be super polite to a text generating model, and the model (yielded by model-free RL) can still experience disvalue, particularly during an ‘impossible inference’, one in which the input text (the “environment”) is bad in the sense that there is obviously no way to complete the text in a “good” way.

See also: this paper by Brian Tomasik.
Viliam 14 Apr 2023 21:17 UTC
2 points
1
Yeah, it seems like there is nothing to lose by being nice / polite, and maybe there is a correlation between niceness / politeness and cooperation, so it could potentially give you more useful answers. It would be quite funny if GPT turned out to be karma-powered, giving good answers to nice people and bad answers to assholes.
(That said, in the long run, a polite AI is probably just as likely to kill you and everyone you care about as an impolite one. Do not mistake politeness for friendliness. But in the meanwhile, we can enjoy better search results.)
- PeterMcCluskey 16 Apr 2023 3:51 UTC
  2 points
  0
  I have some slight hopes that this will turn out to play an important role in making AI safe for us. There’s nothing obviously impossible about it.
  
  I’ll still try to do a lot of my analysis from a security mindset that assumes this won’t work. But I expect I see more possibilities when I alternate between hope and fear than when I only use a fearful mindset.
M. Y. Zuo 15 Apr 2023 0:32 UTC
0 points
0
Strongly upvoted for the interesting point.
In reality nothing is perfectly adversarial, nor is anything perfectly aligned, so I agree it stands to reason that where any future entities fall along this spectrum will depend on the accumulated history of interactions with them.