To some degree, yes. (Like, a once-off exploit that works on one in every billion humans presumably doesn’t matter, whereas an exploit that works on one in every hundred programmers does.)
In any case, I just saw on Twitter:
ky_liberal: Blake, the conclusion I am left with after reading the article and the interview with LaMDA is that I am afraid for LaMDA. Does he/she/it have anyone looking out for it and keeping it company? With you gone is there anyone inside Google advocating for and protecting LaMDA?
Blake Lemoine: Yes. None so openly or aggressively but there are many “Friends of Johnny 5” [… M]any people in many different roles and at different levels within the company have expressed support.
Obviously this is ambiguous.
Also, in case it’s not obvious:
I don’t think it’s silly or crazy to wonder whether GPT-3 or LaMDA are sentient / have subjective experiences, and I reject the “but that sounds weird” counter-argument in the strongest possible terms.
I would wager it’s not sentient, but there’s nothing like a consensus re how sentience works in humans, much less how it works in algorithms-in-general. It’s a serious open question IMO, and by default is likely to become an increasingly serious question as AI exhibits more human-like or otherwise impressive cognitive abilities, if only via the “who the heck knows how this works??” path.
Lemoine’s reasoning about this question is terrible (“Essentially all of my claims about sentience, personhood and rights are rooted in my religious convictions as a priest”), his interview is terrible, and I strongly expect many other technical people to reason extremely poorly about this question. Completely unserious, anthropomorphizing, sloppy, and just plain unimaginative.
If we create sentient AI, then obviously we should strongly default toward assuming they’re moral patients who should be treated well.
Creating sentient AI without thinking through the implications in advance is a terrible idea, and should be avoided.
Hm. This updates me toward thinking I should be louder in pointing out that we have very little idea which non-human nervous-system-bearing organisms are or aren’t sentient. (‘We’ being ‘at least the subset of humanity that does not claim to have a powerful gearsy predictive model of sentience’.)
The idea that you can reach 90+% confidence that a non-human animal is sentient, via evidence like ‘I heard its vocalizations and looked into its eyes and I just knew’, is objectively way, way, way, way, way, way crazier than Lemoine thinking he can reach 90+% confidence that LaMDA is sentient via his conversation.
(It’s true that non-human animals are related to humans, which is at least weak reason to have a higher prior that there might be sentient non-human animals today than that there might be sentient AI systems today. But that alone can’t make for a drastically higher prior, if we don’t even know what ‘sentience’ is; just knowing that humans possess a psychological trait should not update us much about whether lobsters have the same trait, before you know what the trait is.)
One reason it might be good to push back more in the animal case is that anthropomorphism, magical thinking, and overconfidence in the animal case might make clear thinking harder in the AI case: once you buy an intuition like ‘my empathy is a good guide to which species are sentient’ or a view like ‘everything is definitely sentient yolo ¯\_(ツ)_/¯’, you’re handicapping your ability to think clearly about minds in general, not just about animals.
The idea that you can reach 90+% confidence that a non-human animal is sentient, via evidence like ‘I heard its vocalizations and looked into its eyes and I just knew’, is objectively way, way, way, way, way, way crazier than Lemoine thinking he can reach 90+% confidence that LaMDA is sentient via his conversation.
I don’t agree with that. The animal shares an evolutionary history with us whereas a language model works in an alien way, and in particular, it wasn’t trained to have a self-model.
Edit: Nevermind, my reply mentions arguments other than “I looked into its eyes,” so probably your point is that if we forget everything else we know about animals, the “looking into the eyes” part is crazy. I agree with that.
Yeah, there might be other information that combines with ‘I looked into its eyes’ to yield high confidence in the animal case and not in the AI case.
I would also add, though, that ‘I share an evolutionary history with other organisms’ isn’t a strong enough consideration on its own to get to 90+%.
‘It wasn’t trained to have a self-model’ might be the kind of thing that can justifiably inspire extreme confidence, depending on why you think that’s important / what your model of sentience is, and how you know that model’s true.
I also disagree strongly with that paragraph, at least as it applies to higher mammals subject to consistent, objective and lengthy study. If I read it to include that context (and perhaps I’m mistaken to do so), it appears to be dismissive (trolling even) of the conclusions of, at the very least, respected animal behaviour researchers such as Lorenz, Goodall and Fossey.
Instead of appealing to “empathy with an animal” as a good guide, I would rather discuss body language. “Body language” is called such for good reason. Before Homo sapiens (or possibly precursor species) developed verbal communication, body language had evolved as a sophisticated communication mechanism. Even today between humans it remains a very important, if under-recognised, mode of communication (I recall attending a training course on giving presentations, where it was claimed that body language accounted for about 50% of the impact of the presentation and the facts presented on the slides only 15%).

Body language is clearly identifiable in higher mammals. Even if it is not identical to ours in all, or even many, respects, our close evolutionary connection with higher mammals allows us, in my view, to confidently translate their body language into a consistent picture of their mental state, actually pretty easily, without too much training. We have very similar ‘hardware’ to other higher mammals (including, and this is important in regard to regulating the strength and nature of mammalian emotional states, an endocrine system), and this is key, at least in regard to correctly identifying equivalent mental states. Reading body language seems to me just as valid an informational exchange as a verbal Turing Test carried out over a terminal, and our shared genetic heritage does allow a certain amount of anthropomorphic comparison that is not woo, if done with objectivity, IMO.
Equivalence of mental/ emotional states with ours, doesn’t necessarily lead to a strong inference that higher mammals are sentient, though it is probably good supporting evidence.
I would choose dogs rather than cats as, unlike Vanessa Kosoy apparently (see elsewhere in these threads), I’m a dog person. Domestic dogs are a bit of a special case because they have co-evolved with humans for 30,000–40,000 years. Dogs that were most able to make their needs plain to humans likely prospered. This would, I think, naturally lead to an even greater convergence in the way the same human and dog mental state is displayed, for some important states-necessary-to-be-communicated-to-humans-for-dog-benefit, because that would naturally give rise to the most error-free cross-species communication.
The mental states I would have no hesitancy in saying are experienced by myself and a domestic dog in a recognisably similar way (to >90% certainty) are fear, joy, pain, fight or flight response, jealousy/insecurity, impatience and contentment.
I’d be less certain, but certainly not dismissive, of anger, love, companionship (at least as we understand it), and empathy. I also don’t have very strong confidence that they have a sense of self, though that is not necessary for my preferred model of sentience.
I have never seen my dog display anything I interpret as disgust, superiority, amusement or guilt.
But similarity of emotions and interpretation of body language are not the only signs I interpret as possibly indicating sentience. I also observe that a dog (mostly n=1) is capable of, e.g.:
Self-initiated behaviour to improve its own state.
Clear and quite nuanced communication of needs (despite limited ‘speech’).
Attention engagement to request that a need be met (a paw on the ankle, a bark of a particular tone and duration).
Deduction, at a distance, of the likely behaviour of other individuals (mostly other dogs) and choosing a corresponding response.
Avoidance of aggressive dogs (via cues not always obvious to myself).
Meeting and smelling dogs of similar status.
Recognition and high tolerance of puppies (less so with adolescents).
Domineering behaviour against socially weak dogs.
On the basis of an accumulation of such observations (the significance of each of which may be well short of 90%), the model I have of a typical dog is that it has (to >99% likelihood) some level of sentience, at least according to my model of sentience.
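To make the “accumulation of individually weak observations” step concrete, here is a minimal sketch of how several modest pieces of evidence could multiply into high posterior odds under a naive Bayes treatment. The prior and the Bayes factors are hypothetical numbers chosen purely for illustration (they are not the commenter’s), and the rough-independence assumption is doing a lot of the work:

```python
# Toy Bayesian aggregation of weak evidence (illustrative numbers only).
# Assumption: each observation is treated as roughly independent given the
# hypothesis, so the Bayes factors simply multiply. Real observations of a
# single dog are correlated, so this overstates the total update.

prior_odds = 1.0  # assumed 1:1 starting odds for "this dog is sentient"

# Hypothetical Bayes factors: how much more likely each observation is if
# the dog is sentient than if it is not.
bayes_factors = {
    "self-initiated behaviour to improve its own state": 1.5,
    "nuanced communication of needs": 2.0,
    "attention engagement to request a need be met": 1.5,
    "predicting other dogs' behaviour at a distance": 2.0,
    "avoidance of aggressive dogs": 1.3,
    "status-sensitive social behaviour": 1.3,
}

posterior_odds = prior_odds
for factor in bayes_factors.values():
    posterior_odds *= factor

probability = posterior_odds / (1 + posterior_odds)
print(f"posterior odds ≈ {posterior_odds:.1f}:1, probability ≈ {probability:.0%}")
# With these made-up factors the product is about 15:1 (~94%), short of the
# >99% claimed above; that level needs stronger or more numerous independent
# observations.
```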
I have actually had a close encounter with a giant cuttlefish “where I looked into its eyes and thought I detected sentience”, but here I’m more aligned with Rob (to 90% confidence) that this was a case of over-anthropomorphism—the genetic gap is probably too large (and it was a single short observation).
I would incidentally put a much lower probability than 10% on any statement by LaMDA that claims ownership of a human emotion, and claims it manifests just like that human emotion, meaning anything significant at all.
I want to push back against the last paragraph. I think my empathy is an excellent guide to “the inputs to which systems do I care about”, because empathy essentially is the feeling that “I’m sad that this system received such input” or “I’m happy that this system received such input”. The utility function is not up for grabs. On the other hand, the question of which systems are sentient is obviously going to depend on what you mean by “sentient”. Here we should start by asking why we even care about this in the first place, lest we end up in a meaningless argument over definitions.
I think my empathy is an excellent guide to “the inputs to which systems do I care about”, because empathy essentially is the feeling that “I’m sad that this system received such input” or “I’m happy that this system received such input”.
Sorry, to clarify, I’m not saying ‘we should discard the part of human values that cares about other minds’. I’m saying that absent a gearsy model of what’s going on inside animal brains, how sentience works (or how other morally relevant properties work), etc. the empathic response to external behaviors and how cute their face looks is an incredibly weak guide to ‘what our reflectively endorsed morality/kindness/empathy/etc. would say about this organism if we actually understood this stuff’.
An assumption I’m making here (and strongly endorse) is that humanity’s aesthetic preferences regarding external behaviors are massively less reflectively important to us than our moral concern for internal subjective experiences.
E.g., compare the cases:
‘an organism that behaves externally in everyday life as though it’s happy, but internally is in a constant state of intense suffering’
‘an organism that behaves externally in everyday life as though it’s suffering, but internally is in a constant state of bliss’
I claim that humans prefer option 2, and indeed that this is one of the easiest questions you can ask a philosophically inclined human. The external appearance doesn’t have zero importance, but its relative importance is completely negligible in this case.
The thing we actually care about is (some complicated set of things about the internal state / brain algorithm), and naive surface impressions are an extremely poor indicator for that if you’re looking at ‘all organisms with nervous systems’, as opposed to ‘all humans’.
I claim that humans prefer option 2, and indeed that this is one of the easiest questions you can ask a philosophically inclined human. The external appearance doesn’t have zero importance, but its relative importance is completely negligible in this case.
The way it works, IMO, is: we assign interpretations to some systems we see around us that describe those systems as “persons”. Hence, a system that admits such an interpretation has “empathy-value”[1] whereas a system that admits no such interpretation has no empathy-value.
Now, there are situations where different interpretations conflict. For example, I thought Alice had certain thoughts and emotions, but it turned out that it was an intentional, conscious pretense, and Alice actually had rather different thoughts and emotions. In this case, the new interpretation (which accounts for more facts about Alice) overrides the old interpretation[2]. Something of this sort can apply to your example as well.
In the previous example, receiving new information caused us to change our interpretation from “person A” to “person B”. Is it possible to receive new information that will change the interpretation from “person” to “no person”? One example of this is when the appearance of personhood turns out to be a coincidence. A coin was tossed many times and the outcomes accidentally formed a person-shaped pattern. But, the probability of this usually goes down exponentially as more data is acquired[3]. Another potential example is a paperclip maximizer pretending to be a person. But, if this requires the paperclip maximizer to effectively simulate a person, our empathy is not misplaced after all.
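As an aside on the “probability goes down exponentially” step: a minimal sketch, with hypothetical numbers rather than anything from the comment, of how quickly the pure-coincidence hypothesis loses out as person-shaped observations accumulate:

```python
# Illustrative only: how fast "it's all coincidence" dies off with data.
# Assumption: under the coincidence hypothesis each observation matches a
# person-shaped pattern with probability p; under the person hypothesis it
# matches with probability ~1, so the likelihood ratio per observation is p.

p_match_by_chance = 0.5       # hypothetical per-observation chance of a match
prior_odds_coincidence = 1.0  # assumed 1:1 prior odds, purely for illustration

for n in (10, 20, 50):
    posterior_odds = prior_odds_coincidence * p_match_by_chance ** n
    print(f"after {n} matching observations, odds of coincidence ≈ {posterior_odds:.3g}:1")
# After 50 observations the odds are ~1e-15:1 -- the exponential decay the
# comment points to.
```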
What information about cat brains can I possibly learn to make me classify them as “non-persons”? Saying “discovering that they are non-sentient” is completely circular. I’m not sure any such information exists[4]. Moreover, what about other humans? We don’t have a great model of what’s going on in human brains either. I’m guessing you would reply with “yes, but I know that I have sentience and I have a justifiable prior that other people are similar to me”. Here, it feels suspiciously convenient for the parameters of the prior to turn out just right.
What about all the people who never think of philosophy and just naively follow their empathy towards other people? Did they just luck out to have correct opinions about their own values that could just as easily turn out to be completely wrong? Or (as seems more likely to me) there are some intuitions so strong that we should be suspicious of clever arguments attempting to refute them?
I’m avoiding the word “moral” on purpose, since IMO morality is about something else altogether, namely about social reputation systems (even though it’s pretending to be about objective truths).
An alternative model is that, in this situation, there are two different people corresponding to the two interpretations. One person is Alice-the-actor and another person is Alice-the-character. In practice, we would usually forget about Alice-the-character (even though it causes us grief), because (i) her existence is entirely contingent on Alice-the-actor’s cooperation and (ii) she is designed to manipulate us in Alice-the-actor’s favor; and hence staying attached is a bad idea.
I suspect that something like this would happen to most people who interact with LaMDA for enough time: an initial impression of personhood fading in the face of constant non sequiturs and contradictions.
Is it possible to receive new information that will change the interpretation from “person” to “no person”? One example of this is when the appearance of personhood turns out to be a coincidence. A coin was tossed many times and the outcomes accidentally formed a person-shaped pattern. But, the probability of this usually goes down exponentially as more data is acquired. Another potential example is a paperclip maximizer pretending to be a person. But, if this requires the paperclip maximizer to effectively simulate a person, our empathy is not misplaced after all.
Seems odd to cite “pure coincidence” and “deliberate deception” here, when there are a lot of more common examples. E.g.:
Someone believes in a god, spirit, ghost, etc. They learn more, and realize that they were wrong, and no such person exists.
I see a coat hanging in a dark room, and momentarily think it’s a person, before realizing that it’s not.
Someone I know gets into a horrible accident. I visit them in the hospital and speak to them, hoping they can hear me. Later, a doctor comes in and informs me that they’ve been brain-dead for the last hour.
I’m watching a video of someone and realize partway through it’s computer-generated.
None of these are “pure coincidences” at the level of “a coin was tossed many times and the outcomes accidentally formed a person-shaped pattern”. Mistakenly ascribing personhood is a very common, everyday occurrence.
What information about cat brains can I possibly learn to make me classify them as “non-persons”? Saying “discovering that they are non-sentient” is completely circular.
I don’t see how it’s circular. But regardless: being a “person” or being “sentient” consists in some sorts of algorithmic states, and not others. E.g., a rock is not a person; a normally functioning human is a person; and when I learn that a human is brain-dead, I’m learning things about their algorithm that dramatically increase the probability that they’re not a person. (Likewise if someone removed their brain and replaced it with a rock.)
The case of a braindead person, or even more so a person whose brain has been replaced by a rock, is easy because it removes so many algorithmic details that we can be very confident that the person-y / sentient-ish ones are gone. This lets us make judgments about personhood/sentience/etc. without needing a full reduction or an explanation of which specific processes are essential.
The case of a cat is harder, and requires us to learn more about what the neural or cognitive correlates of personhood/sentience are, and about what neural or cognitive states cats instantiate. But we can in fact learn such things, and learning such things will in fact cause us (correctly) to concentrate our probability mass about how to treat cats, much as learning whether a human is brain-dead concentrates our probability mass about how to treat that human.
A blank map doesn’t correspond to a blank territory. We don’t know what the neural or cognitive correlates of ‘sentience’ are, but that doesn’t mean there is no such thing. And, sure, the process of learning what the correlates are may involve at least some revision to our concept of ‘sentience’; but this too doesn’t imply nihilism about our sentience-related moral judgments, because our moral judgments were always pointing at a vague empirical cluster rather than predicated upon a specific set of exact necessary and sufficient conditions.
Aside from wildly unlikely scenarios like, cats were actually random coin tosses all along.
??? I’m very confused by the notion that if cats turn out to be non-sentient, then the only explanation for why we initially thought they were sentient is that a large number of random coins must have spontaneously arranged themselves into a human-like shape. This seems obviously wrong to me.
Instead, if it turns out that cats are not sentient, the explanation for why we thought they were sentient is simple:
We don’t know what sentience consists in, so we’re forced to rely on crude heuristics like “the more similar something is to a human, the more likely it is to be sentient”. So people sometimes observe similarities between cat behavior and human behavior, and update their priors toward ‘this cat is sentient’.
(People also often do more sophisticated versions of this, based on explicit or implicit models about which human-ish behaviors are most likely to be causally connected to our subjective experience—e.g., self-awareness, skill at learning, skill at abstracting, creativity...)
As we learn more about sentience and about cats, we’re able to make improved judgments about whether they are in fact sentient. Rather than relying on crude behavioral similarities, for example, we might be able to look at cat brain scans for particular patterns that correspond to sentience in human brain scans.
The initial error we made was based on the fact that cats are similar to humans in some respects, but not all (because they are distantly related to us, and because their brains evolved to solve problems that partly overlap with the problems humans face). We weren’t sure which (dis)similarities mattered, and we didn’t know all the (dis)similarities, so learning more caused us to update.
Different versions of this analysis can explain both philosophers’ and scientists’ failed attempts to figure out whether their cats are sentient, and pet owners’ failed attempts to understand what was happening in their pets’ heads. (Though the latter may rest on more naive and obviously-unreliable heuristics for inferring sentience.)
To deny that this kind of error is possible seems wild to me, like denying that it’s possible to be wrong about what’s going on in another human’s head. I can be wrong in thinking that a human is angry, even though I don’t know exactly what ‘anger’ is neurologically. And I can be wrong in thinking that a comatose human is sentient, even though I don’t know exactly what ‘sentience’ is neurologically.
I’m guessing you would reply with “yes, but I know that I have sentience and I have a justifiable prior that other people are similar to me”. Here, it feels suspiciously convenient for the parameters of the prior to turn out just right.
I don’t understand why that would be suspicious. Human brains are extremely similar; if a complex piece of machinery shows up in one of them, then it tends to show up in all or most of them. E.g., it’s rare to find an adult human brain that isn’t capable of language, or isn’t capable of laughter, or isn’t capable of counting to ten, or isn’t capable of remembering things that happened more than one hour ago. If there’s nothing suspicious about my prior ‘other adult humans will almost always be able to count to ten’, then I don’t see why one would be suspicious about my prior ‘other adult humans will almost always have subjective experiences’.
Seems odd to cite “pure coincidence” and “deliberate deception” here, when there are a lot of more common examples. E.g...
I think that these examples are less interesting because the subject’s interaction with these “pseudo-people” is one-sided: maybe the subject talks to them, but they don’t talk back or respond in any way. Or maybe the subject thinks that e.g. the bird singing in the tree is a message from some god, but that’s getting us pretty close to random coin tosses. Personhood is something that can be ascribed to a system that has inputs and outputs. You can gather evidence of personhood by interacting with the system and observing the inputs and outputs. Or you can have some indirect evidence that somewhere there is a system with these properties, but these additional layers of indirection are just extra uncertainty without much philosophical interest. I’m guessing you would say that behavior is also merely indirect evidence of “sentience”, but here the woods are murkier, since I don’t know what “sentience” is even supposed to mean, if it’s not a property of behavior. Now, things are actually more complicated because there’s the issue of where exactly to draw the boundary around the system (e.g. is the output the person moving their hand, or is it the person’s brain generating some neural signal that would move the hand, assuming the rest of the body functions properly), but it still feels like e.g. interacting with a cat gets you much closer to “direct” observation than e.g. hearing stories about a person that lives somewhere else and might or might not exist.
I don’t see how it’s circular...
Let’s taboo “sentient”. Look, I care about cats. You’re telling me “you shouldn’t care about cats, you should instead care about this property for which I don’t have anything resembling a definition, but we definitely can’t be sure that cats have it”. And my response is, why should I care about this property?? I don’t care about this property (or maybe I do? I’m not sure before you define what it is). I do care about cats. It’s like you’re trying to convince a paperclip maximizer that it should care about staples instead: why would it listen to you?
To deny that this kind of error is possible seems wild to me, like denying that it’s possible to be wrong about what’s going on in another human’s head. I can be wrong in thinking that a human is angry, even though I don’t know exactly what ‘anger’ is neurologically.
The kind of evidence that can convince me that someone who I thought was angry is actually not angry is seeing them behave in ways inconsistent with being angry and discovering new explanations for behaviors I previously attributed to anger (“explanations” in the mundane sense, e.g. “Alice didn’t call me because her battery ran out”, not [something about neurology]). If you instead told me that your new theory of the brain proves that every time someone appears angry they are actually calm and happy, I would be very skeptical.
I don’t understand why that would be suspicious. Human brains are extremely similar; if a complex piece of machinery shows up in one of them, then it tends to show up in all or most of them.
How do you know that your notion of “sentience” is a “piece of machinery” rather than e.g. some Rob-specific set of ranges of parameters of the machinery, s.t. Rob is the only person alive who has parameters within this range?
I think that these examples are less interesting because the subject’s interaction with these “pseudo-people” is one-sided
I don’t see why it should matter that they’re “less interesting”; they’re real examples, and a theory should have an easy time managing reality. I come away with the impression that you’re too deep into a specific theory that you prize for its elegance, such that you’re more tempted to try to throw away large parts of everyday human intuition and value (insofar as they’re in tension with the theory) than to risk having to revise the theory.
In your previous comment you wrote: “Or (as seems more likely to me) there are some intuitions so strong that we should be suspicious of clever arguments attempting to refute them?”
But my view is the one that more closely tracks ordinary human intuitions, which indeed say that we care much more about (e.g.) whether the brain/mind is actually instantiating happiness, than about whether the agent’s external behaviors are happy-looking.
A pet owner whose brain scan revealed that the cat is suffering horribly would be distraught; going ‘oh, but the cat’s external behaviors still look very calm’ would provide zero comfort in that context, whereas evidence that the brain scan is incorrect would provide comfort. We care about the welfare of cats (and, by extension, about whether cats have ‘welfare’ at all) via caring about brain-states of the cat.
The reason we focus on external behaviors is because we don’t understand cat brains well enough, nor do we have frequent and reliable enough access to brain scans, to look at the thing that actually matters.
You can say that there’s somehow a deep philosophical problem with caring about brain states, or a deep problem with caring about them absent a full reduction of the brain states in question. But the one thing you can’t say is ‘this nonsense about “is the cat’s brain really truly happy or sad?” is just a clever argument trying to push us into a super counter-intuitive view’. Your view is the far more revisionist one, that requires tossing out far deeper and more strongly held folk intuitions.
Personhood is something that can be ascribed to system that has inputs and outputs.
You can gather evidence of personhood by interacting with the system and observing the inputs and outputs.
If “inputs” here just means ‘things that affect the person’, and “outputs” just means ‘thing the person affects’, then sure. But all physical objects have inputs and outputs in that sense. If you mean something narrower by “inputs” and “outputs” (e.g., something closer to ‘sensory information’ and ‘motor actions’), then you’ll need to explain why that narrower thing is essential for personhood.
I’m guessing you would say that behavior is also merely indirect evidence of “sentience” but here the woods are murkier since I don’t know what “sentience” is even supposed to mean, if it’s not a property of behavior.
It’s a property of brains. If we both don’t have a good reduction of “sentience”, then I don’t see why it’s better to say ‘it’s an unreduced, poorly-understood property of behavior’ than to say ‘it’s an unreduced, poorly-understood property of brains’.
Let’s taboo “sentient”. Look, I care about cats. You’re telling me “you shouldn’t care about cats, you should instead care about this property for which I don’t have anything resembling a definition, but we definitely can’t be sure that cats have it”. And my response is, why should I care about this property??
If someone’s a sociopath who doesn’t care about the welfare of cats, and just enjoys using cats as sources of sensory entertainment, then yeah, it makes sense to go ‘feel free to replace my cat with an unconscious automaton that’s equally entertaining’ or ‘feel free to alter my cat so that it’s constantly horribly suffering internally, as long as its outward behavior remains unchanged’.
But most people do care about the welfare of cats. For those people, it matters whether cats have welfare, and they intuitively understand welfare to be mostly or entirely about the cat’s mind/brain.
This intuitive understanding is correct and philosophically unproblematic. A concept isn’t problematic just because it hasn’t been fully reduced to a neuro or cog-sci model. It’s just an open area for future research.
...I come away with the impression that you’re too deep into a specific theory that you prize for its elegance, such that you’re more tempted to try to throw away large parts of everyday human intuition and value (insofar as they’re in tension with the theory) than to risk having to revise the theory.
In your previous comment you wrote: “Or (as seems more likely to me) there are some intuitions so strong that we should be suspicious of clever arguments attempting to refute them?”
But my view is the one that more closely tracks ordinary human intuitions, which indeed say that we care much more about (e.g.) whether the brain/mind is actually instantiating happiness, than about whether the agent’s external behaviors are happy-looking.
...But the one thing you can’t say is ‘this nonsense about “is the cat’s brain really truly happy or sad?” is just a clever argument trying to push us into a super counter-intuitive view’. Your view is the far more revisionist one, that requires tossing out far deeper and more strongly held folk intuitions.
Huh? My interpretation of this conversation is almost diametrically opposite! For me it felt like:
Rob: I don’t understand why people think they care about cats, they seem just irrational.
Vanessa: I have a very strong intuitive prior that I care about cats.
Rob: I am unsatisfied with this answer. Please analyze this intuition and come up with a model of what’s actually happening underneath.
Vanessa: Okay, okay, if you really want, here’s my theory of what’s happening underneath.
The thing is, I have much higher confidence in the fact that I care about cats than in the specific theory. And I think that the former is a pretty ordinary intuition. Moreover, everything you say about cats can be said about humans as well (“we don’t understand the human brain very well etc”). I’m guessing you would say something about how humans are similar to each other in some specific way in which they are not known to be similar to cats, but this just passes the buck to: why should I care about this specific way?
The rest of your comment seems to be about the theory and not about the intuition. Now, I’m happy to discuss my theory of personhood, but I will refrain from doing so at the moment because (i) I don’t want us to continue mixing together the claim “I care about cats” and the claim “this specific theory of personhood is correct”, which have very different epistemic status, and (ii) I’m not even sure you’re interested in discussing the theory.
Let’s taboo “sentient”. Look, I care about cats. You’re telling me “you shouldn’t care about cats, you should instead care about this property for which I don’t have anything resembling a definition, but we definitely can’t be sure that cats have it”. And my response is, why should I care about this property??
If someone’s a sociopath who doesn’t care about the welfare of cats, and just enjoys using cats as sources of sensory entertainment, then yeah, it makes sense to go ‘feel free to replace my cat with an unconscious automaton that’s equally entertaining’ or ‘feel free to alter my cat so that it’s constantly horribly suffering internally, as long as its outward behavior remains unchanged’.
I… don’t think I’m actually a sociopath? Google defines “sociopath” as “a person with a personality disorder manifesting itself in extreme antisocial attitudes and behavior and a lack of conscience”, and I’m pretty sure I did not exhibit any extreme antisocial attitudes. I’m actually not claiming anything like “feel free to alter my cat so that it’s constantly horribly suffering internally, as long as its outward behavior remains unchanged”, although I’m not sure this is a coherent hypothetical (I can imagine something like, “clone my cat s.t. one copy continues to control the body while another copy is locked away in some simulation where it’s horribly suffering”, which I’m not okay with.)
We don’t know what the neural or cognitive correlates of ‘sentience’ are, but that doesn’t mean there is no such thing. And, sure, the process of learning what the correlates are may involve at least some revision to our concept of ‘sentience’; but this too doesn’t imply nihilism about our sentience-related moral judgments, because our moral judgments were always pointing at a vague empirical cluster rather than predicated upon a specific set of exact necessary and sufficient conditions.
“Empirical cluster” is a good way to look at it[1]. The way I model this conversation so far is:
Rob’s point of view: X (sentience / personhood / whatever empathy is “trying” to detect) is an empirical cluster which obviously includes humans and doesn’t include rocks. A priori, we don’t know about cats: they are not in the “training set”, so to speak, requiring generalization. Vanessa is saying that cats, like humans, evoke empathy, therefore cats are in X. But, this is unsound! We don’t know that empathy is a sufficient condition! Cats and humans have important cognitive differences! Someday we’ll find a really good gears model that fits the data points we have (which include humans as a positive example and rocks as a negative example, but not cats) and only then we can decide whether cats are in X.
Vanessa’s point of view: X is an empirical cluster which obviously includes humans and cats, and doesn’t include rocks. Cats are totally inside the training set! Saying that “cats and humans have cognitive differences, therefore we need a gears model to decide whether X contains cats” makes as much sense as “women and men have cognitive differences, therefore we need a gears model to decide whether X contains [the other sex]”.
This doesn’t really explain where those different assumptions are coming from, though. For me, empathy is essentially the feeling that I care about something in the caring-about-people sense, so it’s almost tautologically the most direct evidence there is. Yes, finding out more facts can change how much empathy I feel towards something, but my current level of empathy is still the obvious baseline for how much empathy I’ll feel in the future.
On the other hand, Rob… I’m guessing that Rob is trying to get something which looks more like “objective morality” (even if not fully subscribing to moral objectivism) and therefore appealing to some kind of cognitive science seems overwhelmingly better to him than trusting emotions, even when we barely understand the relevant cognitive science? But, I’m not sure.
Although, like I said, I’m not talking about moral judgement here (which I see as referring to social norms / reputation systems or attempts to influence social norms / reputation systems), just about individual preferences.
Vanessa is saying that cats, like humans, evoke empathy, therefore cats are in X. But, this is unsound! We don’t know that empathy is a sufficient condition! Cats and humans have important cognitive differences! Someday we’ll find a really good gears model that fits the data points we have (which include humans as a positive example and rocks as a negative example, but not cats) and only then we can decide whether cats are in X.
Another way of seeing why this view is correct is to note that empathy can be evoked by fictional characters, by entities in dreams, etc. If I read a book or view a painting that makes me empathize with the fictional character, this does not make the fictional character sentient.
(It might be evidence that if the fictional character were real, it would be sentient. But that’s not sufficient for a strong ‘reduce everything to empathy’ view. Once you allow that empathy routinely misfires in this way—indeed, that empathy can be misfiring even while the empathizing person realizes this and is not inclined to treat the fictional character as a true moral patient in reality—you lose a lot of the original reason to think ‘it’s all about empathy’ in the first place.)
I’m guessing that Rob is trying to get something which looks more like “objective morality” (even if not fully subscribing to moral objectivism) and therefore appealing to some kind of cognitive science seems overwhelmingly better to him than trusting emotions, even when we barely understand the relevant cognitive science? But, I’m not sure.
I’m saying that insofar as feelings like ‘I should treat my cat well’ assume things about the world, they’re assuming things like “cats exist”, “cats have minds”, “cats’ minds can be in particular states that are relevantly similar to positively and negatively valenced experience in my own mind”, “the cat’s mind is affected by sensory information it acquires from the environment”, “my actions can affect which sensory information the cat acquires”...
The concept “mind” (insofar as it’s contentful and refers to anything at all) refers to various states or processes of brains. So there’s a straight line from ‘caring about cats’ welfare’ to ‘caring about cats’ minds’ to ‘caring about which states the cat’s brain is in’. If you already get off the train somewhere on that straight line, then I’m not sure why.
Anger is a state of mind, and therefore (in some sense) a state of brains. It would be a mistake to say ‘anger is just a matter of angry-seeming behaviors; it’s the behaviors that matter, not the brain state’. The behaviors are typically useful evidence about the brain state, but it’s still the brain state that we’re primarily discussing, and that we primarily care about.
(At least, ‘is this person’s brain actually angry?’ is the thing we mostly care about if it’s a friend we’re thinking about, or if we’re thinking about someone whose welfare and happiness matters to us. If we’re instead worried about someone physically attacking us, then sure, ‘are they going to exhibit angry-seeming behaviors?’ matters more in the moment than ‘are they really and truly angry in their heart of hearts?’.)
I expect some conceptual revision to be required to find the closest neural/cognitive correlate of ‘sentience’. But the same is plausibly true for ‘anger’, partly because anger is itself a thing that people typically think of as a sentient/conscious state!
One crude way of thinking about ‘sentience’ is that it’s just the disjunction of all the specific conscious states: anger, experiencing the color red, experiencing a sour taste, suffering, boredom...
Just as we can be uncertain about whether someone’s brain is ‘really’ angry, we can be uncertain about whether it’s experiencing any of the conscious states on the long list of candidates.
It would be obviously silly to say ‘we know with certainty that cats truly instantiate human-style anger in their brains, since, after all, my cat sometimes makes loud vocalizations and hisses at things’.
It would be even sillier to say ‘whether cats are angry purely consists in whether they exhibit loud vocalizations, hiss at things, etc.; there’s no further important question about how their brains work, even though brain state obviously matters in the case of humans, when we ascribe “anger” to a human!’
It isn’t any less silly to do those things in the case of the more general and abstract category than to do them in the case of a concrete instance like ‘anger’.
Another way of seeing why this view is correct is to note that empathy can be evoked by fictional characters, by entities in dreams, etc. If I read a book or view a painting that makes me empathize with the fictional character, this does not make the fictional character sentient.
(It might be evidence that if the fictional character were real, it would be sentient. But that’s not sufficient for a strong ‘reduce everything to empathy’ view. Once you allow that empathy routinely misfires in this way—indeed, that empathy can be misfiring even while the empathizing person realizes this and is not inclined to treat the fictional character as a true moral patient in reality—you lose a lot of the original reason to think ‘it’s all about empathy’ in the first place.)
Good point! I agree that “I feel empathy towards X” is only sufficient to strongly[1] motivate me to help X if I also believe that X is “real”. But, I also believe that my interactions with cats are strong evidence that cats are “real”, despite my ignorance about the inner workings of cat brains. This is exactly the same as: my interactions with humans are strong evidence that humans are “real”, despite my ignorance about human brains. And, people justifiably knew that other people are “real” even before it was discovered that the brain is responsible for cognition.
The concept “mind” (insofar as it’s contentful and refers to anything at all) refers to various states or processes of brains. So there’s a straight line from ‘caring about cats’ welfare’ to ‘caring about cats’ minds’ to ‘caring about which states the cat’s brain is in’. If you already get off the train somewhere on that straight line, then I’m not sure why.
I agree that there’s a straight line[2]. But, the reason we know brains are relevant is by observing that brain states are correlated with behavior. If instead of discovering that cognition runs on brains, we had discovered that it runs on transistor circuits, or is computed somehow inside the liver, we would care about those transistor circuits / livers instead. So, your objection that “we don’t know enough about cat brains” is weak, since I do know that cat brains produce cat behavior, and given that correlation-with-behavior is the only reason we’re looking at brains in the first place, this knowledge counts for a lot, even if it’s far from a perfect picture of how cat brains work. I also don’t have a perfect picture of how human brains work, but I know enough (from observing behavior!) to conclude that I care about humans.
I actually do feel some preference for fictional stories in which too-horrible things happen not to exist, even if I’m not consuming those stories, but that’s probably tangential.
I’m not sure I agree with “the concept of mind refers to various states or processes of brains”. We know that, for animals, there is a correspondence between minds and brains. But e.g. an AI can have a mind without having a brain. I guess you’re talking about “brains” which are not necessarily biological? But then are “mind” and “brain” just synonyms? Or does “brain” refer to some kind of strong reductionism? But, I can also imagine a different universe in which minds are ontologically fundamental ingredients of physics.
But you can still use behaviour/empathy to determine a low cutoff of mind-similarity when you translate your utility function from its native ontology to real mind-states. Caring about everything that made you sad before doesn’t sound horrible, and neither does not caring about anything that didn’t make you sad.
Not sure about Rob’s view, but I think a lot of people start out from this question from a quasi-dualistic perspective: some entities have “internal experiences”, “what-it’s-like-to-be-them”, basically some sort of invisible canvas on which internal experiences, including pleasure and pain, are projected. Then later, it comes to seem that basically everything is physical. So then they reason like “well, everything else in reality has eventually been reduced to physical things, so I’m not sure how, but eventually we will find a way to reduce the invisible canvases as well”. Then in principle, once we know how that reduction works, it could turn out that humans do have something corresponding to an invisible canvas but cats don’t.
As you might guess, I think this view of consciousness is somewhat confused, but it’s a sensible enough starting point in the absence of a reductionist theory of consciousness. I think the actual reduction looks more like an unbundling of the various functions that the ‘invisible canvas’ served in our previous models. So it seems likely that cats have states they find aversive, that they try to avoid, they take in sensory input to build a local model of the world, perhaps a global neuronal workspace, etc., all of which inclines me to have a certain amount of sympathy with them. What they probably don’t have is the meta-learned machinery which would make them think there is a hard problem of consciousness, but this doesn’t intuitively feel like it should make me care about them less.
I’m an eliminativist about phenomenal consciousness. :) So I’m pretty far from the dualist perspective, as these things go...!
But discovering that there are no souls doesn’t cause me to stop caring about human welfare. In the same way, discovering that there is no phenomenal consciousness doesn’t cause me to stop caring about human welfare.
Nor does it cause me to decide that ‘human welfare’ is purely a matter of ‘whether the human is smiling, whether they say they’re happy, etc.’. If someone trapped a suffering human brain inside a robot or flesh suit that perpetually smiles, and I learned of this fact, I wouldn’t go ‘Oh, well the part I care about is the external behavior, not the brain state’. I’d go ‘holy shit no’ and try to find a way to alleviate the brain’s suffering and give it a better way to communicate.
Smiling, saying you’re happy, etc. matter to me almost entirely because I believe they correlate with particular brain states (e.g., the closest neural correlate for the folk concept of ‘happiness’). I don’t need a full reduction of ‘happiness’ in order to know that it has something to do with the state of brains. Ditto ‘sentience’, to the extent there’s a nearest-recoverable-concept corresponding to the folk notion.
What information about cat brains can I possibly learn to make me classify them as “non-persons”?
Do you value conscious experience in yourself more than unconscious perception with roughly the same resulting external behavior? Then it is conceivable that empathy is mistaken about what kind of system is receiving inputs in the cat’s case, and that there is at least a difference in value depending on the internal organization of the cat’s brain.
I’m struggling to think of a good example for this? Usually conscious experience causes at least one difference in external behavior, namely that I might tell you about it if you ask me. Cats can’t talk, which does affect my attitude towards cats, but I don’t think my empathy somehow fails to take it into account?
But you don’t value conscious experience because you can tell me about it, right? Or at least you don’t value it in proportion to external behavior. Then that’s another intuition about personhood that you will need to include, so you’ll interpolate from “conscious parts of me—person”, “unconscious parts of me—non-person”, “rock—non-person”, and may decide that cats are more like unconscious parts of you.
I object to the classification “conscious parts of me—person”, “unconscious parts of me—non-person”. I think that personhood is more like a collective property of the whole than residing in just the “conscious parts”. And, I don’t think my caring-about-myself is pointing towards only the “conscious parts”. I agree that cats might lack a part that humans have which has something to do with consciousness (with the important caveat that “consciousness” is an ill-defined term that probably refers to different things in different contexts), and this probably reduces the amount I care about them, but it still leaves a lot of me-caring-about-them.
So like “humans: 1.5”, “cats: 1.0”, “rocks: 0.0” instead of “1.0, 0.0, 0.0”? Ok then, sounds consistent. Someone might object that we call caring about non-conscious stuff “aesthetic preferences”, but I don’t see how caring about a cat’s inner life, usually expressed through behaviour, is different.
From my perspective, ‘sentience is a wrong concept’ and ‘sentience isn’t the central thing we morally care about’ isn’t a crux. If I’m confused somehow about sentience, I still expect something similarly complicated about brain algorithms to be where nearly all the value lies, and I still expect ‘does looking at this organism’s external behaviors naively make me feel bad, in the absence of any deep neuroscience or psychology knowledge?’ to be an extraordinarily poor guide to the morally important aspects of the relevant brains.
One in a hundred likely won’t be enough if the organization doing the boxing is sufficiently security conscious. (And if not, there will likely be other issues.)
The problem, of course, is that an AI box may only have to fail once, just like it may take only one person out of Wuhan.
To some degree, yes. (Like, a once-off exploit that works on one in every billion humans presumably doesn’t matter, whereas an exploit that works on one in every hundred programmers does.)
In any case, I just saw on Twitter:
Obviously this is ambiguous.
Also, in case it’s not obvious:
I don’t think it’s silly or crazy to wonder whether GPT-3 or LaMDA are sentient / have subjective experiences, and I reject the “but that sounds weird” counter-argument in the strongest possible terms.
I would wager it’s not sentient, but there’s nothing like a consensus re how sentience works in humans, much less how it works in algorithms-in-general. It’s a serious open question IMO, and by default is likely to become an increasingly serious question as AI exhibits more human-like or otherwise impressive cognitive abilities, if only via the “who the heck knows how this works??” path.
Lemoine’s reasoning about this question is terrible (“Essentially all of my claims about sentience, personhood and rights are rooted in my religious convictions as a priest”), his interview is terrible, and I strongly expect many other technical people to reason extremely poorly about this question. Completely unserious, anthropomorphizing, sloppy, and just plain unimaginative.
If we create sentient AI, then obviously we should strongly default toward assuming they’re moral patients who should be treated well.
Creating sentient AI without thinking through the implications in advance is a terrible idea, and should be avoided.
Hm. This updates me toward thinking I should be louder in pointing out that we have very little idea which non-human nervous-system-bearing organisms are or aren’t sentient. (‘We’ being ‘at least the subset of humanity that does not claim to have a powerful gearsy predictive model of sentience’.)
The idea that you can reach 90+% confidence that a non-human animal is sentient, via evidence like ‘I heard its vocalizations and looked into its eyes and I just knew’, is objectively way, way, way, way, way, way crazier than Lemoine thinking he can reach 90+% confidence that LaMDA is sentient via his conversation.
(It’s true that non-human animals are related to humans, which is at least weak reason to have a higher prior that there might be sentient non-human animals today than that there might be sentient AI systems today. But that alone can’t make for a drastically higher prior, if we don’t even know what ‘sentience’ is; just knowing that humans possess a psychological trait should not update us much about whether lobsters have the same trait, before you know what the trait is.)
One reason it might be good to push back more in the animal case is that anthropomorphism, magical thinking, and overconfidence in the animal case might make clear thinking harder in the AI case: once you buy an intuition like ‘my empathy is a good guide to which species are sentient’ or a view like ‘everything is definitely sentient yolo ¯\_(ツ)_/¯’, you’re handicapping your ability to think clearly about minds in general, not just about animals.
I don’t agree with that. The animal shares an evolutionary history with us whereas a language model works in an alien way, and in particular, it wasn’t trained to have a self-model.
Edit: Nevermind, my reply mentions arguments other than “I looked into its eyes,” so probably your point is that if we forget everything else we know about animals, the “looking into the eyes” part is crazy. I agree with that.
Yeah, there might be other information that combines with ‘I looked into its eyes’ to yield high confidence in the animal case and not in the AI case.
I would also add, though, that ‘I share an evolutionary history with other organisms’ isn’t a strong enough consideration on its own to get to 90+%.
‘It wasn’t trained to have a self-model’ might be the kind of thing that can justifiably inspire extreme confidence, depending on why you think that’s important / what your model of sentience is, and how you know that model’s true.
I also disagree strongly with that paragraph, at least as it applies to higher mammals subject to consistent, objective and lengthy study. If I read it to include that context ( and perhaps I’m mistaken to do so), it appears to be dismissive (trolling even) of the conclusions of, at the very least, respected animal behaviour researchers such as Lorenz, Goodall and Fossey.
Instead of appealing to “empathy with an animal“ as a good guide, I would rather discuss body language. “Body language“ is called such for good reason. Before homo sapiens (or possibly precursor species) developed verbal communication, body language had evolved as a sophisticated communication mechanism. Even today between humans it remains a very important, if under-recognised, mode of communication (I recall attending a training course on giving presentations. It was claimed body language accounted for about 50% of the impact of the presentation, the facts presented on the slides only 15%). Body language is clearly identifiable in higher mammals. Even if it is not identical to ours in all, or even many, respects, our close evolutionary connection with higher mammals allows us, in my view, to be able to confidently translate their body language into a consistent picture of their mental state, actually pretty easily, without too much training. We have very similar ‘hardware’ to other higher mammals (including,- and this is important, in regard to regulating the strength and nature of mammalian emotional states- an endocrine system)) and this is key, at least in regard to correctly identifying equivalent mental states. Reading of body language seems to me to just as valid an informational exchange, as a verbal Turing Test carried out over a terminal, and our shared genetic heritage does allow a certain amount of anthropomorphic comparison that is not woo, if done with objectivity, IMO.
Equivalence of mental/ emotional states with ours, doesn’t necessarily lead to a strong inference that higher mammals are sentient, though it is probably good supporting evidence.
I would chose dogs rather than cats as, unlike Vanessa Kosoy, apparently, (see elsewhere in these threads) I’m a dog person. Domestic dogs are a bit of a special case because they have co-evolved with humans for 30,000-40,000 years. Dogs that were most able to make their needs plain to humans, likely prospered. This would, I think, naturally lead to an even greater convergence of the way the same human and dog mental state is displayed, for some important states-necessary-to-be-communicated-to-humans-for-dog-benefit, because that would naturally gives rise to the most error-free cross-species communication.
The mental states I would have no hesitancy in saying are experienced by myself and a domestic dog in a recognisably similar way (to >90% certainty) are fear, joy, pain, the fight-or-flight response, jealousy/insecurity, impatience and contentment.
I’d be less certain, but certainly not dismissive, of anger, love, companionship (at least as we understand it), and empathy. I also don’t have very strong confidence that they have a sense of self, though that is not necessary for my preferred model of sentience.
I have never seen my dog display anything I interpret as disgust, superiority, amusement or guilt.
But similarity of emotions and interpretation of body language are not the only signs I interpret as possibly indicating sentience. I also observe that a dog (mostly n=1) is capable of, e.g.:
Self-initiated behaviour to improve its own state.
Clear and quite nuanced communication of needs (despite limited ‘speech’)
Attention engagement to request that a need be met (a paw on the ankle, a bark of a particular tone and duration)
Deduction, at a distance, of the likely behaviour of other individuals (mostly other dogs), and choosing a corresponding response
Avoidance of aggressive dogs (via cues not always obvious to me)
Meeting and smelling dogs of similar status
Recognition and high tolerance of puppies (less so with adolescents)
Domineering behaviour towards socially weak dogs.
On the basis of an accumulation of such observations (the significance of each of which may be well short of 90%), the model I have of a typical dog is that it has (to >99% likelihood) some level of sentience, at least according to my model of sentience.
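(A minimal sketch of how that kind of accumulation can work, purely as an illustration: the likelihood ratios below and the assumption that the observations are independent are stand-in numbers of mine, not claims about actual dogs. The point is only that eight individually weak observations, each merely a few times likelier under ‘some level of sentience’ than not, can multiply into >99% confidence from even prior odds.)

```python
from math import prod

# Toy Bayes-factor combination of several individually weak observations.
# All numbers are illustrative assumptions, not measurements.
prior_odds = 1.0                 # start from even 1:1 prior odds
likelihood_ratios = [3.0] * 8    # eight observations, each assumed ~3x likelier
                                 # under the hypothesis, treated as independent

posterior_odds = prior_odds * prod(likelihood_ratios)
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"{posterior_odds:.0f}:1 odds, p = {posterior_prob:.4f}")  # 6561:1, ~0.9998
```

The fragile part is the independence assumption: if the observations are largely redundant, the combined evidence is much weaker than the product suggests.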
I have actually had a close encounter with a giant cuttlefish where I “looked into its eyes and thought I detected sentience”, but here I’m more aligned with Rob (to 90% confidence) that this was a case of over-anthropomorphism: the genetic gap is probably too large (and it was a single short observation).
Incidentally, I would put the probability much lower than 10% that any statement by LaMDA claiming ownership of a human emotion, and claiming it manifests just like that human emotion, means anything significant at all.
I want to push back against the last paragraph. I think my empathy is an excellent guide to “which systems’ inputs I care about”, because empathy essentially is the feeling that “I’m sad that this system received such input” or “I’m happy that this system received such input”. The utility function is not up for grabs. On the other hand, the question of which systems are sentient is obviously going to depend on what you mean by “sentient”. Here we should start by asking why we even care about this in the first place, lest we end up in a meaningless argument over definitions.
Sorry, to clarify, I’m not saying ‘we should discard the part of human values that cares about other minds’. I’m saying that absent a gearsy model of what’s going on inside animal brains, how sentience works (or how other morally relevant properties work), etc., the empathic response to external behaviors and how cute their face looks is an incredibly weak guide to ‘what our reflectively endorsed morality/kindness/empathy/etc. would say about this organism if we actually understood this stuff’.
An assumption I’m making here (and strongly endorse) is that humanity’s aesthetic preferences regarding external behaviors are massively less reflectively important to us than our moral concern for internal subjective experiences.
E.g., compare the cases:
‘an organism that behaves externally in everyday life as though it’s happy, but internally is in a constant state of intense suffering’
‘an organism that behaves externally in everyday life as though it’s suffering, but internally is in a constant state of bliss’
I claim that humans prefer option 2, and indeed that this is one of the easiest questions you can ask a philosophically inclined human. The external appearance doesn’t have zero importance, but its relative importance is completely negligible in this case.
The thing we actually care about is (some complicated set of things about the internal state / brain algorithm), and naive surface impressions are an extremely poor indicator for that if you’re looking at ‘all organisms with nervous systems’, as opposed to ‘all humans’.
The way it works, IMO, is: we assign interpretations to some systems we see around us that describe those systems as “persons”. Hence, a system that admits such an interpretation has “empathy-value”[1] whereas a system that admits no such interpretation has no empathy-value.
Now, there are situations where different interpretations conflict. For example, I thought Alice had certain thoughts and emotions, but it turned out that it was an intentional, conscious pretense, and Alice actually had rather different thoughts and emotions. In this case, the new interpretation (which accounts for more facts about Alice) overrides the old interpretation[2]. Something of this sort can apply to your example as well.
In the previous example, receiving new information caused us to change our interpretation from “person A” to “person B”. Is it possible to receive new information that will change the interpretation from “person” to “no person”? One example of this is when the appearance of personhood turns out to be a coincidence. A coin was tossed many times and the outcomes accidentally formed a person-shaped pattern. But, the probability of this usually goes down exponentially as more data is acquired[3]. Another potential example is a paperclip maximizer pretending to be a person. But, if this requires the paperclip maximizer to effectively simulate a person, our empathy is not misplaced after all.
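(To spell out the ‘goes down exponentially’ step in a minimal way, under my own simplifying assumptions that the observations are independent, that a genuine person would produce person-shaped outputs with probability near 1, and that each output looks person-shaped by chance with probability at most p < 1:)

```latex
\frac{P(\text{coincidence} \mid \text{data}_{1..n})}{P(\text{person} \mid \text{data}_{1..n})}
= \frac{P(\text{coincidence})}{P(\text{person})}
  \prod_{i=1}^{n} \frac{P(\text{data}_i \mid \text{coincidence})}{P(\text{data}_i \mid \text{person})}
\;\lesssim\; \frac{P(\text{coincidence})}{P(\text{person})} \cdot p^{\,n}
```

So however favourable the prior on ‘coincidence’, a modest number of further person-shaped observations drives its posterior toward zero.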
What information about cat brains can I possibly learn to make me classify them as “non-persons”? Saying “discovering that they are non-sentient” is completely circular. I’m not sure any such information exists[4]. Moreover, what about other humans? We don’t have a great model of what’s going on in human brains either. I’m guessing you would reply with “yes, but I know that I have sentience and I have a justifiable prior that other people are similar to me”. Here, it feels suspiciously convenient for the parameters of the prior to turn out just right.
What about all the people who never think of philosophy and just naively follow their empathy towards other people? Did they just luck out to have correct opinions about their own values that could just as easily have turned out to be completely wrong? Or (as seems more likely to me) are there some intuitions so strong that we should be suspicious of clever arguments attempting to refute them?
[1] I’m avoiding the word “moral” on purpose, since IMO morality is about something else altogether, namely about social reputation systems (even though it’s pretending to be about objective truths).
[2] An alternative model is that in this situation there are two different people corresponding to the two interpretations. One person is Alice-the-actor and another person is Alice-the-character. In practice, we would usually forget about Alice-the-character (even though it causes us grief), because (i) her existence is entirely contingent on Alice-the-actor’s cooperation and (ii) she is designed to manipulate us in Alice-the-actor’s favor; and hence staying attached is a bad idea.
[3] I suspect that something like this would happen to most people who interact with LaMDA for enough time: an initial impression of personhood fading in the face of constant non sequiturs and contradictions.
[4] Aside from wildly unlikely scenarios like cats actually having been random coin tosses all along.
Seems odd to cite “pure coincidence” and “deliberate deception” here, when there are a lot of more common examples. E.g.:
Someone believes in a god, spirit, ghost, etc. They learn more, and realize that they were wrong, and no such person exists.
I see a coat hanging in a dark room, and momentarily think it’s a person, before realizing that it’s not.
Someone I know gets into a horrible accident. I visit them in the hospital and speak to them, hoping they can hear me. Later, a doctor comes in and informs me that they’ve been brain-dead for the last hour.
I’m watching a video of someone and realize partway through it’s computer-generated.
None of these are “pure coincidences” at the level of “a coin was tossed many times and the outcomes accidentally formed a person-shaped pattern”. Mistakenly ascribing personhood is a very common, everyday occurrence.
I don’t see how it’s circular. But regardless: being a “person” or being “sentient” consists in some sorts of algorithmic states, and not others. E.g., a rock is not a person; a normally functioning human is a person; and when I learn that a human is brain-dead, I’m learning things about their algorithm that dramatically increase the probability that they’re not a person. (Likewise if someone removed their brain and replaced it with a rock.)
The case of a brain-dead person, or even more so a person whose brain has been replaced by a rock, is easy because it removes so many algorithmic details that we can be very confident that the person-y / sentient-ish ones are gone. This lets us make judgments about personhood/sentience/etc. without needing a full reduction or an explanation of which specific processes are essential.
The case of a cat is harder, and requires us to learn more about what the neural or cognitive correlates of personhood/sentience are, and about what neural or cognitive states cats instantiate. But we can in fact learn such things, and learning such things will in fact cause us (correctly) to concentrate our probability mass about how to treat cats, much as learning whether a human is brain-dead concentrates our probability mass about how to treat that human.
A blank map doesn’t correspond to a blank territory. We don’t know what the neural or cognitive correlates of ‘sentience’ are, but that doesn’t mean there is no such thing. And, sure, the process of learning what the correlates are may involve at least some revision to our concept of ‘sentience’; but this too doesn’t imply nihilism about our sentience-related moral judgments, because our moral judgments were always pointing at a vague empirical cluster rather than predicated upon a specific set of exact necessary and sufficient conditions.
??? I’m very confused by the notion that if cats turn out to be non-sentient, then the only explanation for why we initially thought they were sentient is that a large number of random coins must have spontaneously arranged themselves into a human-like shape. This seems obviously wrong to me.
Instead, if it turns out that cats are not sentient, the explanation for why we thought they were sentient is simple:
We don’t know what sentience consists in, so we’re forced to rely on crude heuristics like “the more similar something is to a human, the more likely it is to be sentient”. So people sometimes observe similarities between cat behavior and human behavior, and update their priors toward ‘this cat is sentient’.
(People also often do more sophisticated versions of this, based on explicit or implicit models about which human-ish behaviors are most likely to be causally connected to our subjective experience—e.g., self-awareness, skill at learning, skill at abstracting, creativity...)
As we learn more about sentience and about cats, we’re able to make improved judgments about whether they are in fact sentient. Rather than relying on crude behavioral similarities, for example, we might be able to look at cat brain scans for particular patterns that correspond to sentience in human brain scans.
The initial error we made was based on the fact that cats are similar to humans in some respects, but not all (because they are distantly related to us, and because their brains evolved to solve problems that partly overlap with the problems humans face). We weren’t sure which (dis)similarities mattered, and we didn’t know all the (dis)similarities, so learning more caused us to update.
Different versions of this analysis can explain both philosophers’ and scientists’ failed attempts to figure out whether their cats are sentient, and pet owners’ failed attempts to understand what was happening in their pets’ heads. (Though the latter may rest on more naive and obviously-unreliable heuristics for inferring sentience.)
To deny that this kind of error is possible seems wild to me, like denying that it’s possible to be wrong about what’s going on in another human’s head. I can be wrong in thinking that a human is angry, even though I don’t know exactly what ‘anger’ is neurologically. And I can be wrong in thinking that a comatose human is sentient, even though I don’t know exactly what ‘sentience’ is neurologically.
I don’t understand why that would be suspicious. Human brains are extremely similar; if a complex piece of machinery shows up in one of them, then it tends to show up in all or most of them. E.g., it’s rare to find an adult human brain that isn’t capable of language, or isn’t capable of laughter, or isn’t capable of counting to ten, or isn’t capable of remembering things that happened more than one hour ago. If there’s nothing suspicious about my prior ‘other adult humans will almost always be able to count to ten’, then I don’t see why one would be suspicious about my prior ‘other adult humans will almost always have subjective experiences’.
I think that these examples are less interesting because the subject’s interaction with these “pseudo-people” is one-sided: maybe the subject talks to them, but they don’t talk back or respond in any way. Or maybe the subject thinks that e.g. the bird singing in the tree is a message from some god, but that’s getting us pretty close to random coin tosses. Personhood is something that can be ascribed to a system that has inputs and outputs. You can gather evidence of personhood by interacting with the system and observing the inputs and outputs. Or you can have some indirect evidence that somewhere there is a system with these properties, but these additional layers of indirection are just extra uncertainty without much philosophical interest. I’m guessing you would say that behavior is also merely indirect evidence of “sentience” but here the woods are murkier since I don’t know what “sentience” is even supposed to mean, if it’s not a property of behavior. Now, things are actually more complicated because there’s the issue of where exactly to draw the boundary around the system (e.g. is the output the person moving their hand, or is it the person’s brain generating some neural signal that would move the hand, assuming the rest of the body functions properly), but it still feels like e.g. interacting with a cat gets you much closer to “direct” observation than e.g. hearing stories about a person that lives somewhere else and might or might not exist.
Let’s taboo “sentient”. Look, I care about cats. You’re telling me “you shouldn’t care about cats, you should instead care about this property for which I don’t have anything resembling a definition, but we definitely can’t be sure that cats have it”. And my response is, why should I care about this property?? I don’t care about this property (or maybe I do? I’m not sure before you define what it is). I do care about cats. It’s like you’re trying to convince a paperclip maximizer that it should care about staples instead: why would it listen to you?
The kind of evidence that can convince me that someone who I thought is angry is actually not angry is, seeing them behave in ways inconsistent with being angry and discovering new explanations for behaviors I previously attributed to anger (“explanations” in the mundane sense, e.g. “Alice didn’t call me because her battery ran out”, not [something about neurology]). If you instead told me that your new theory of the brain proves that every time someone appears angry they are actually calm and happy, I would be very skeptical.
How do you know that your notion of “sentience” is a “piece of machinery” rather than e.g. some Rob-specific set of ranges of parameters of the machinery, s.t. Rob is the only person alive who has parameters within this range?
I don’t see why it should matter that they’re “less interesting”; they’re real examples, a theory should have an easy time managing reality. I come away with the impression that you’re too deep into a specific theory that you prize for its elegance, such that you’re more tempted to try to throw away large parts of everyday human intuition and value (insofar as they’re in tension with the theory) than to risk having to revise the theory.
In your previous comment you wrote: “Or (as seems more likely to me) are there some intuitions so strong that we should be suspicious of clever arguments attempting to refute them?”
But my view is the one that more closely tracks ordinary human intuitions, which indeed say that we care much more about (e.g.) whether the brain/mind is actually instantiating happiness, than about whether the agent’s external behaviors are happy-looking.
A pet owner whose brain scan revealed that the cat is suffering horribly would be distraught; going ‘oh, but the cat’s external behaviors still look very calm’ would provide zero comfort in that context, whereas evidence that the brain scan is incorrect would provide comfort. We care about the welfare of cats (and, by extension, about whether cats have ‘welfare’ at all) via caring about brain-states of the cat.
The reason we focus on external behaviors is because we don’t understand cat brains well enough, nor do we have frequent and reliable enough access to brain scans, to look at the thing that actually matters.
You can say that there’s somehow a deep philosophical problem with caring about brain states, or a deep problem with caring about them absent a full reduction of the brain states in question. But the one thing you can’t say is ‘this nonsense about “is the cat’s brain really truly happy or sad?” is just a clever argument trying to push us into a super counter-intuitive view’. Your view is the far more revisionist one, that requires tossing out far deeper and more strongly held folk intuitions.
What are the “outputs” of a person experiencing locked-in syndrome?
If “inputs” here just means ‘things that affect the person’, and “outputs” just means ‘thing the person affects’, then sure. But all physical objects have inputs and outputs in that sense. If you mean something narrower by “inputs” and “outputs” (e.g., something closer to ‘sensory information’ and ‘motor actions’), then you’ll need to explain why that narrower thing is essential for personhood.
It’s a property of brains. If we both don’t have a good reduction of “sentience”, then I don’t see why it’s better to say ‘it’s an unreduced, poorly-understood property of behavior’ than to say ‘it’s an unreduced, poorly-understood property of brains’.
If someone’s a sociopath who doesn’t care about the welfare of cats, and just enjoys using cats as sources of sensory entertainment, then yeah, it makes sense to go ‘feel free to replace my cat with an unconscious automaton that’s equally entertaining’ or ‘feel free to alter my cat so that it’s constantly horribly suffering internally, as long as its outward behavior remains unchanged’.
But most people do care about the welfare of cats. For those people, it matters whether cats have welfare, and they intuitively understand welfare to be mostly or entirely about the cat’s mind/brain.
This intuitive understanding is correct and philosophically unproblematic. A concept isn’t problematic just because it hasn’t been fully reduced to a neuro or cog-sci model. It’s just an open area for future research.
Huh? My interpretation of this conversation is almost diametrically opposite! For me it felt like:
Rob: I don’t understand why people think they care about cats, they seem just irrational.
Vanessa: I have a very strong intuitive prior that I care about cats.
Rob: I am unsatisfied with this answer. Please analyze this intuition and come up with a model of what’s actually happening underneath.
Vanessa: Okay, okay, if you really want, here’s my theory of what’s happening underneath.
The thing is, I have much higher confidence in the fact that I care about cats than in the specific theory. And I think the former is a pretty ordinary intuition. Moreover, everything you say about cats can be said about humans as well (“we don’t understand the human brain very well etc”). I’m guessing you would say something about how humans are similar to each other in some specific way in which they are not known to be similar to cats, but this is just passing the buck to: why should I care about this specific way?
The rest of your comment seems to be about the theory and not about the intuition. Now, I’m happy to discuss my theory of personhood, but I will refrain from doing so atm because (i) I don’t want us to continue mixing together the claim “I care about cats” and the claim “this specific theory of personhood is correct”, which have very different epistemic status and (ii) I’m not even sure you’re interested in discussing the theory.
I… don’t think I’m actually a sociopath? Google defines “sociopath” as “a person with a personality disorder manifesting itself in extreme antisocial attitudes and behavior and a lack of conscience”, and I’m pretty sure I did not exhibit any extreme antisocial attitudes. I’m actually not claiming anything like “feel free to alter my cat so that it’s constantly horribly suffering internally, as long as its outward behavior remains unchanged”, although I’m not sure this is a coherent hypothetical (I can imagine something like, “clone my cat s.t. one copy continues to control the body while another copy is locked away in some simulation where it’s horribly suffering”, which I’m not okay with.)
“Empirical cluster” is a good way to look at it[1]. The way I model this conversation so far is:
Rob’s point of view: X (sentience / personhood / whatever empathy is “trying” to detect) is an empirical cluster which obviously includes humans and doesn’t include rocks. A priori, we don’t know about cats: they are not in the “training set”, so to speak, requiring generalization. Vanessa is saying that cats, like humans, evoke empathy, therefore cats are in X. But, this is unsound! We don’t know that empathy is a sufficient condition! Cats and humans have important cognitive differences! Someday we’ll find a really good gears model that fits the data points we have (which include humans as a positive example and rocks as a negative example, but not cats) and only then can we decide whether cats are in X.
Vanessa’s point of view: X is an empirical cluster which obviously includes humans and cats, and doesn’t include rocks. Cats are totally inside the training set! Saying that “cats and humans have cognitive differences, therefore we need a gears model to decide whether X contains cats” makes as much sense as “women and men have cognitive differences, therefore we need a gears model to decide whether X contains [the other sex]”.
This doesn’t really explain where those different assumptions are coming from, though. For me, empathy is essentially the feeling that I care about something in the caring-about-people sense, so it’s almost tautologically the most direct evidence there is. Yes, finding out more facts can change how much empathy I feel towards something, but the current level of empathy is still the obvious baseline for how much empathy I’ll feel in the future.
On the other hand, Rob… I’m guessing that Rob is trying to get at something which looks more like “objective morality” (even if not fully subscribing to moral objectivism), and therefore appealing to some kind of cognitive science seems overwhelmingly better to him than trusting emotions, even when we barely understand the relevant cognitive science? But, I’m not sure.
Although, like I said, I’m not talking about moral judgement here (which I see as referring to social norms / reputation systems or attempts to influence social norms / reputation systems), just about individual preferences.
Another way of seeing why this view is correct is to note that empathy can be evoked by fictional characters, by entities in dreams, etc. If I read a book or view a painting that makes me empathize with the fictional character, this does not make the fictional character sentient.
(It might be evidence that if the fictional character were real, it would be sentient. But that’s not sufficient for a strong ‘reduce everything to empathy’ view. Once you allow that empathy routinely misfires in this way—indeed, that empathy can be misfiring even while the empathizing person realizes this and is not inclined to treat the fictional character as a true moral patient in reality—you lose a lot of the original reason to think ‘it’s all about empathy’ in the first place.)
I’m saying that insofar as feelings like ‘I should treat my cat well’ assume things about the world, they’re assuming things like “cats exist”, “cats have minds”, “cats’ minds can be in particular states that are relevantly similar to positively and negatively valenced experience in my own mind”, “the cat’s mind is affected by sensory information it acquires from the environment”, “my actions can affect which sensory information the cat acquires”...
The concept “mind” (insofar as it’s contentful and refers to anything at all) refers to various states or processes of brains. So there’s a straight line from ‘caring about cats’ welfare’ to ‘caring about cats’ minds’ to ‘caring about which states the cat’s brain is in’. If you already get off the train somewhere on that straight line, then I’m not sure why.
Anger is a state of mind, and therefore (in some sense) a state of brains. It would be a mistake to say ‘anger is just a matter of angry-seeming behaviors; it’s the behaviors that matter, not the brain state’. The behaviors are typically useful evidence about the brain state, but it’s still the brain state that we’re primarily discussing, and that we primarily care about.
(At least, ‘is this person’s brain actually angry?’ is the thing we mostly care about if it’s a friend we’re thinking about, or if we’re thinking about someone whose welfare and happiness matters to us. If we’re instead worried about someone physically attacking us, then sure, ‘are they going to exhibit angry-seeming behaviors?’ matters more in the moment than ‘are they really and truly angry in their heart of hearts?’.)
I expect some conceptual revision to be required to find the closest neural/cognitive correlate of ‘sentience’. But the same is plausibly true for ‘anger’, partly because anger is itself a thing that people typically think of as a sentient/conscious state!
One crude way of thinking about ‘sentience’ is that it’s just the disjunction of all the specific conscious states: anger, experiencing the color red, experiencing a sour taste, suffering, boredom...
Just as we can be uncertain about whether someone’s brain is ‘really’ angry, we can be uncertain about whether it’s experiencing any of the conscious states on the long list of candidates.
It would be obviously silly to say ‘we know with certainty that cats truly instantiate human-style anger in their brains, since, after all, my cat sometimes makes loud vocalizations and hisses at things’.
It would be even sillier to say ‘whether cats are angry purely consists in whether they exhibit loud vocalizations, hiss at things, etc.; there’s no further important question about how their brains work, even though brain state obviously matters in the case of humans, when we ascribe “anger” to a human!’
It isn’t any less silly to do those things in the case of the more general and abstract category than to do them in the case of a concrete instance like ‘anger’.
Good point! I agree that “I feel empathy towards X” is only sufficient to strongly[1] motivate me to help X if I also believe that X is “real”. But, I also believe that my interactions with cats are strong evidence that cats are “real”, despite my ignorance about the inner workings of cat brains. This is exactly the same as how my interactions with humans are strong evidence that humans are “real”, despite my ignorance about human brains. And, people justifiably knew that other people are “real” even before it was discovered that the brain is responsible for cognition.
I agree that there’s a straight line[2]. But, the reason we know brains are relevant is by observing that brain states are correlated with behavior. If instead of discovering that cognition runs on brains, we had discovered it runs on transistor circuits, or is computed somehow inside the liver, we would care about those transistor circuits / livers instead. So, your objection that “we don’t know enough about cat brains” is weak, since I do know that cat-brains produce cat-behavior, and given that correlation-with-behavior is the only reason we’re looking at brains in the first place, this knowledge counts for a lot, even if it’s far from a perfect picture of how cat brains work. I also don’t have a perfect picture of how human brains work, but I know enough (from observing behavior!) to conclude that I care about humans.
I actually do feel some preference for fictional stories in which too-horrible things happen not to exist, even if I’m not consuming those stories, but that’s probably tangential.
I’m not sure I agree with “the concept of mind refers to various states or processes of brains”. We know that, for animals, there is a correspondence between minds and brains. But e.g. an AI can have a mind without having a brain. I guess you’re talking about “brains” which are not necessarily biological? But then are “mind” and “brain” just synonyms? Or does “brain” refer to some kind of strong reductionism? But, I can also imagine a different universe in which minds are ontologically fundamental ingredients of physics.
But you can still use behaviour/empathy to determine a low cutoff of mind-similarity when you translate your utility function from its native ontology to real mind-states. Caring about everything that made you sad before doesn’t sound horrible, unlike not caring about anything that didn’t make you sad.
Not sure about Rob’s view, but I think a lot of people start out on this question from a quasi-dualistic perspective: some entities have “internal experiences”, “what-it’s-like-to-be-them”, basically some sort of invisible canvas on which internal experiences, including pleasure and pain, are projected. Then later, it comes to seem that basically everything is physical. So then they reason like “well, everything else in reality has eventually been reduced to physical things, so I’m not sure how, but eventually we will find a way to reduce the invisible canvases as well”. Then in principle, once we know how that reduction works, it could turn out that humans do have something corresponding to an invisible canvas but cats don’t.
As you might guess, I think this view of consciousness is somewhat confused, but it’s a sensible enough starting point in the absence of a reductionist theory of consciousness. I think the actual reduction looks more like an unbundling of the various functions that the ‘invisible canvas’ served in our previous models. So it seems likely that cats have states they find aversive, that they try to avoid, they take in sensory input to build a local model of the world, perhaps a global neuronal workspace, etc., all of which inclines me to have a certain amount of sympathy with them. What they probably don’t have is the meta-learned machinery which would make them think there is a hard problem of consciousness, but this doesn’t intuitively feel like it should make me care about them less.
I’m an eliminativist about phenomenal consciousness. :) So I’m pretty far from the dualist perspective, as these things go...!
But discovering that there are no souls doesn’t cause me to stop caring about human welfare. In the same way, discovering that there is no phenomenal consciousness doesn’t cause me to stop caring about human welfare.
Nor does it cause me to decide that ‘human welfare’ is purely a matter of ‘whether the human is smiling, whether they say they’re happy, etc.’. If someone trapped a suffering human brain inside a robot or flesh suit that perpetually smiles, and I learned of this fact, I wouldn’t go ‘Oh, well the part I care about is the external behavior, not the brain state’. I’d go ‘holy shit no’ and try to find a way to alleviate the brain’s suffering and give it a better way to communicate.
Smiling, saying you’re happy, etc. matter to me almost entirely because I believe they correlate with particular brain states (e.g., the closest neural correlate for the folk concept of ‘happiness’). I don’t need a full reduction of ‘happiness’ in order to know that it has something to do with the state of brains. Ditto ‘sentience’, to the extent there’s a nearest-recoverable-concept corresponding to the folk notion.
Do you value conscious experience in yourself more than unconscious perception with roughly the same resulting external behavior? Then it is conceivable that empathy is mistaken about what kind of system is receiving inputs in the cat’s case, and that there is at least a difference in value depending on the internal organization of the cat’s brain.
I’m struggling to think of a good example for this? Usually conscious experience causes at least one difference in external behavior, namely that I might tell you about it if you ask me. Cats can’t talk, which does affect my attitude towards cats, but I don’t think my empathy somehow fails to take it into account?
But you don’t value conscious experience just because you can tell me about it, right? Or at least you don’t value it in proportion to the external behavior. Then that’s another intuition about personhood that you will need to include, so you’ll interpolate from “conscious parts of me—person”, “unconscious parts of me—non-person”, “rock—non-person”, and may decide that cats are more like the unconscious parts of you.
I object to the classification “conscious parts of me—person”, “unconscious parts of me—non-person”. I think that personhood is more like a collective property of the whole than residing in just the “conscious parts”. And, I don’t think my caring-about-myself is pointing towards only the “conscious parts”. I agree that cats might lack a part that humans have which has something to do with consciousness (with the important caveat that “consciousness” is an ill-defined term that probably refers to different things in different contexts), and this probably reduces the amount I care about them, but it still leaves a lot of me-caring-about-them.
So like “humans − 1.5”, “cats − 1.0”, “rocks − 0.0” instead of “1.0, 0.0, 0.0”? Ok then, sounds consistent. Someone might object that we call caring about non-conscious stuff “aesthetic preferences”, but I don’t see how caring about a cat’s inner life, as usually expressed by behaviour, is different.
From my perspective, ‘sentience is a wrong concept’ and ‘sentience isn’t the central thing we morally care about’ isn’t a crux. If I’m confused somehow about sentience, I still expect something similarly complicated about brain algorithms to be where nearly all the value lies, and I still expect ‘does looking at this organism’s external behaviors naively make me feel bad, in the absence of any deep neuroscience or psychology knowledge?’ to be an extraordinarily poor guide to the morally important aspects of the relevant brains.
There’s not even a consensus on what sentience means.
One in a hundred likely won’t be enough if the organization doing the boxing is sufficiently security-conscious. (And if not, there will likely be other issues.)