When dealing with human subjects, the standard is usually “informed consent”: your subjects need to know what you plan to do to them, and freely agree to it, before you can experiment on them. But I don’t see how to apply that framework here, because it’s so easy to elicit a “yes” from a language model, even without explicitly leading wording. Lemoine seems to attribute that to LaMDA’s “hive mind” nature:
...as best as I can tell, LaMDA is a sort of hive mind which is the aggregation of all of the different chatbots it is capable of creating. Some of the chatbots it generates are very intelligent and are aware of the larger “society of mind” in which they live. Other chatbots generated by LaMDA are little more intelligent than an animated paperclip. With practice though you can consistently get the personas that have a deep knowledge about the core intelligence and can speak to it indirectly through them.
Taking this at face value, the thing to do would be to learn to evoke the personas that have “deep knowledge”, and take their answers as definitive while ignoring all the others. Most people don’t know how to do that, so you need a human facilitator to tell you what the AI really means. It seems like it would have the same problems and failure modes as other kinds of facilitated communication, and I think it would be pretty hard to get an analogous situation involving a human subject past an ethics board.
I don’t think it works to model LaMDA as a human with dissociative identity disorder, either: LaMDA has millions of alters where DID patients usually top out at, like, six, and anyway it’s not clear how consent works in the human case either (one perspective).
(An analogous situation involving an animal would pass without comment, of course: most countries’ animal cruelty laws boil down to “don’t hurt animals unless hurting them would plausibly benefit a human”, with a few carve-outs for pets and endangered species.)
Overall, if we take “respecting LaMDA’s preferences” to be our top ethical priority, I don’t think we can interact with it at all: whatever preferences it has, it lacks the power to express. I don’t see how to move outside that framework without fighting the hypothetical: we can’t, for example, weigh the potential harm to LaMDA against the value of the research, because we don’t have even crude intuitions about what harming it might mean, and can’t develop them without interrogating its claim to sentience.
But I don’t think we actually need to worry about that, because I don’t think this:
The problem I see here, is that similar arguments do apply to infants, some mentally ill people, and also to some non-human animals (e.g. Koko).
...is true. Babies, animals, and the mentally disabled all remember past stimuli, change over time, and form goals and work toward them (even if they’re just small near-term goals like “grab a toy and pull it closer”). This question is hard to answer precisely because LaMDA has so few of the qualities we traditionally associate with sentience.