I’ve had some similar thoughts recently (spurred by a question seen on reddit) about how the instinctive fear of death is implemented.
It’s clearly quite robustly present. But we aren’t born understanding what death is, there’s a wide variety of situations that might threaten death that didn’t exist in any ancestral environment, and we definitely don’t learn from experience of dying that we don’t want to do it again in future.
We see a lot of people die: in reality, in fiction, and in dreams.
We also see a lot of people having sex or feeling sexual desire in fiction or dreams before we experience it ourselves.
I don't know how strong a counterargument this is to the claim that the alignment in us is powerful. Maybe a biological reward system, plus imitation, fiction, and later dreams, is simply what is at play in humans.
I agree that fictional/cultural evidence is important for how people generalise their innate responses to new stimuli. Specifically, I think something similar to Steven Byrnes’ proxy matching is going on.
The idea is that we have certain hardwired instincts, such as fear of death, that are triggered in specific scenarios. Independently, we learn a general world-model via unsupervised learning, which acquires its own, potentially un-emotive, concept of death. We then associate our instinctive reactions with this concept, so that eventually those reactions generalise to other stimuli that evoke the concept, including ones that were not present in the ancestral environment and for which we have no hardwired reactions.
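To make the mechanism concrete, here is a toy sketch of the proxy-matching story as I understand it. Everything here is illustrative: the trigger sets, the "death" concept, and the linking rule are stand-ins for a hardwired detector, an unsupervised world-model cluster, and the association process, not claims about how the brain actually implements any of them.

```python
# Toy sketch of proxy matching (all names and sets are illustrative):
# 1. A hardwired detector fires on a fixed set of ancestral triggers.
# 2. An unsupervised world-model clusters stimuli into concepts.
# 3. When the hardwired fear and a concept co-activate, the fear
#    response becomes linked to the whole concept, and so generalises
#    to novel stimuli that share the concept but have no hardwired trigger.

HARDWIRED_TRIGGERS = {"snake", "fall", "suffocation"}

# Stand-in for the learned world-model: stimuli clustered under a
# "death" concept, including evolutionarily novel ones.
DEATH_CONCEPT = {"snake", "fall", "suffocation", "poison", "car crash"}

linked_concepts = set()

def observe(stimulus):
    """If the hardwired fear fires while the concept is active,
    link the fear response to the concept itself."""
    if stimulus in HARDWIRED_TRIGGERS and stimulus in DEATH_CONCEPT:
        linked_concepts.add("death")

def fear_response(stimulus):
    """Fear fires on hardwired triggers, or on any stimulus falling
    under a concept that has been linked to the fear response."""
    if stimulus in HARDWIRED_TRIGGERS:
        return True
    return "death" in linked_concepts and stimulus in DEATH_CONCEPT

# Before any association, a novel stimulus triggers no fear.
assert not fear_response("car crash")
observe("snake")  # hardwired fear co-fires with the "death" concept
assert fear_response("car crash")  # the fear has generalised
```

The point of the sketch is just the last three lines: "car crash" never had a hardwired trigger, but once the instinct is bound to the learned concept, anything the world-model files under that concept inherits the reaction.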
The fiction and cultural knowledge etc. are super important for shaping our unsupervised concept of death, since they are the training data! The limits of this generalisation also show up interestingly in cases where people disagree strongly. In the classic case of a teleporter that scrambles your atoms at location X only to recreate an exact copy of you at location Y, people have very different instinctive reactions to whether this counts as 'death'. That judgement ultimately depends on their world-model concept and not on any hardwired reaction, since there are no teleporters in the ancestral environment or now.
I suspect a process like this is also what generates ‘human values’ and will be writing something up on this shortly.
I do want to note that this process can also hijack instrumental convergence in order to achieve alignment.