My argument in this post is that there do exist mental models of people that are sufficiently detailed to qualify as conscious moral patients;
Sounds reasonable for at least some values of “sufficiently detailed”. At the limit, I expect that if someone had a computer emulation of my nervous system and all sensory information it receives, and all outputs it produces, and that emulation was good enough to write about its own personal experience of qualia for the same reasons I write about it, that emulation would “have” qualia in the sense that I care about.
At the other limit, a Markov model trained on a bunch of my past text output, which can produce writing that kinda sorta looks like it describes what it’s like to have qualia, almost certainly does not “have” qualia in the sense that I care about (though the system-as-a-whole that produced the writing, i.e. “me originally writing the stuff” plus “the Markov model doing its thing”, does have qualia—they live in the “me originally experiencing the stuff I wrote about” bit).
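For concreteness, the kind of system I have in mind at that limit is roughly a word-level Markov chain over my past writing. A minimal sketch (the corpus file name is made up, and this isn’t anything anyone has actually built):

```python
import random
from collections import defaultdict

def train_markov(corpus_text, order=2):
    """Build a word-level Markov chain: map each tuple of `order` words
    to the list of words observed to follow it."""
    words = corpus_text.split()
    transitions = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        transitions[state].append(words[i + order])
    return transitions

def generate(transitions, length=50):
    """Sample text by repeatedly following observed transitions."""
    state = random.choice(list(transitions.keys()))
    output = list(state)
    for _ in range(length):
        choices = transitions.get(state)
        if not choices:
            break
        output.append(random.choice(choices))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Hypothetical usage:
# corpus = open("my_past_writing.txt").read()
# print(generate(train_markov(corpus)))
```

Nothing in that lookup-and-sample loop looks like a plausible place for an experience of qualia to live, which is why I locate the qualia in the original writing process rather than in the model.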
In between the two extremes you’ve got stuff like tulpas, which I suspect are moral patients to the extent that it makes sense to talk about such a thing. That said, a lot of the reasons humans want to continue their thread of experience probably don’t apply to most tulpas (e.g. when a human dies, the substrate they were running on stops functioning, all their memories are lost, and they lose their ability to steer the world towards states they prefer, whereas if a tulpa “dies” its memories are retained and its substrate remains intact, though I think it still loses its ability to steer the world towards its preferred states).
I am hesitant to condemn anything which looks to me like “thoughtcrime”, but to the extent that anything could be a thoughtcrime, “create tulpas and then do things that deeply violate their preferences” seems like one of those things. So if you’re doing that, maybe consider doing not-that?
I also argue that this is common enough that authors good at characterization probably frequently create and destroy such people; finally, I argue that this is a bad thing.
“Any mental model of a person” seems to me like drawing the line quite a bit further than it should be drawn. I don’t think mental models actually “have experiences” in any meaningful sense—I think they’re more analogous to Markov models than they are to brain emulations (with the possible exception of tulpas and things like that, but those aren’t the sort of situations you find yourself in accidentally).
I do not think that literally any mental model of a person is a person, though I do draw the line further than you.
What are your reasons for thinking that mental models are closer to Markov models than tulpas? My reason for leaning more towards the latter is my own experience writing, where I found it easy to create mental models of characters who behaved coherently and with whom I could have long conversations on a level above even GPT-4, let alone Markov models.
Another piece of evidence is this study. I haven’t done any actual digging to see if the methodology is any good; all I did was see the given statistic, but it is a much higher percentage than even I would have predicted before seeing it, and I already believed everything I wrote in this post!
Though I should be clear that whether or not a mental model is a person depends on the level of detail, and surely there are a lot that are not detailed enough to qualify. I just also think that there are a lot that do have enough detail, especially among writers.
That said, a lot of the reasons humans want to continue their thread of experience probably don’t apply to most tulpas (e.g. when a human dies, the substrate they were running on stops functioning, all their memories are lost, and they lose their ability to steer the world towards states they prefer, whereas if a tulpa “dies” its memories are retained and its substrate remains intact, though I think it still loses its ability to steer the world towards its preferred states).
I find it interesting that multiple people have brought up “memories aren’t lost” as part of why it’s less bad for mental models or tulpas to die, since I personally don’t care if my memories live on after I die and would not consider that to be even close to true immortality.
What are your reasons for thinking that mental models are closer to Markov models than tulpas?
I think this may just be a case of the typical mind fallacy: I don’t model people in that level of detail in practice and I’m not even sure I’m capable of doing so. I can make predictions about “the kind of thing a person might say” based on what they’ve said before, but those predictions are more at the level of turns-of-phrase and favored topics of conversation—definitely nothing like “long conversations on a level above GPT-4”.
The “why people value remaining alive” bit might also be a typical mind fallacy thing. I mostly think about personal identity in terms of memories + preferences.
I do agree that my memories alone living on after my body dies would not be close to immortality to me. However, if someone were to train a multimodal ML model that can produce actions in the world indistinguishable from the actions I produce (or even “distinguishable but very very close”), I would consider that to be most of the way to effectively being immortal, assuming that model were actually run and had the ability to steer the world towards states which it prefers. Conversely, I’d consider it effectively death to be locked in a box where I couldn’t affect the state of the outside world and would never be able to exit the box. The scenario “my knowledge persists and can be used by people who share my values” would be worse, to me, than remaining alive, but better than dying without my knowledge being preserved for people who share my values (and by “share my values” I basically just mean “are not actively trying to do things that I disprefer specifically because I disprefer them”).
I wouldn’t quite say it’s a typical mind fallacy, because I am not assuming that everyone is like me. I’m just also not assuming that everyone is different from me, and I’m using heuristics to support my inference that it’s probably not too uncommon, such as reports by authors of their characters surprising them. Another small factor in my inference is that I don’t know how I’d write good fiction without making mental models that qualify as people, though admittedly I have very high standards with respect to characterization in fiction.
(I am aware that I am not consistent about which phrase I use to describe just how common it is for models to qualify as people. This is because I don’t actually know how common it is, I only have inferences based on the evidence I already gave to go on.)
The rest of your post is interesting and I think I agree with it, though we’ve digressed from the original subject on that part.
Thanks for your replies.