Thinking about it more, my brain generates the following argument for the perspective I think you’re advocating:
Consciousness and human values are both complicated kludges, but they’re different complicated kludges, and they aren’t correlated (because evolution didn’t understand what ‘consciousness’ was when it built us, so it didn’t try to embed that entire complicated entity into our values, it just embedded various messy correlates that break down pretty easily).
It would therefore be surprising if any highly specific cognitive feature of humans ended up being core to our values. It’s less surprising if a simple (and therefore more widespread) cognitive thingie ends up important to our values, because although the totality of human values is very complex, a lot of the real-world things referred to by specific pieces of human value (e.g., ‘boo loud sudden noises’) are quite simple.
A lot of the complexity of values comes from the fact that it glues together an enormous list of many different relatively-simple things (orgasms, symmetry, lush green plants, the sound of birds chirping, the pleasure of winning a game), and then these need to interact in tons of complicated ways.
In some cases, there probably are much-more-complicated entities in our values. But any given specific complicated thing will be a lot harder to exactly locate in our values, because it’s less likely on priors that evolution will hand-code that thing into our brains, or hand-code a way for humans to reliably learn that value during development.
This argument moves me some, and maybe I’ll change my mind after chewing on it more.
I think the main reasons I don’t currently find it super compelling are:
1 - I think a lot of human values look like pointers to real-world phenomena, rather than encodings of real-world phenomena. Humans care about certain kinds of human-ish minds (which may or may not be limited to human beings). Rather than trying to hand-code a description of ‘mind that’s human-ish in the relevant way’, evolution builds in a long list of clues and correlates that let us locate the ‘human-ish mind’ object in the physical world, and glom on to that object. The full complexity of the consciousness-engine is likely to end up pretty central to our values by that method (even though not everything about that engine as it’s currently implemented in human brains is going to be essential—there are a lot of ways to build a piston steam engine).
I do think there will be a lot of surprises and weird edge cases in ‘the kind of mind we value’. But I think these are much more likely to arise if we build new minds that deliberately push toward the edges of our concept. I think it’s much less likely that we’ll care about chickens, rocks, or electrons because these pre-existing entities just happen to exploit a weird loophole in our empathy-ish values—most natural phenomena don’t have keys that are exactly the right shape to exploit a loophole in human values.
(I do think it’s not at all implausible that chickens could turn out to have ‘human-ish minds’ in the relevant sense. Maybe somewhere between 10% likely and 40% likely? But if chickens are moral patients according to our morality, I think it will be because it empirically turns out to be the case that ‘being conscious in the basic way humans are’ arose way earlier on the evolutionary tree, or arose multiple times on the tree, not because our brain’s moral ‘pointer toward human-ish minds’ is going haywire and triggering (to various degrees) in response to just about everything, in a way that our CEV deeply endorses.)
2 - In cases like this, I also don’t think humans care much about the pointers themselves, or the ‘experience of feeling as though something is human-like’—rather, humans care about whether the thing is actually human-like (in this particular not-yet-fully-understood way).
3 - Moral intuitions like fairness, compassion, respect-for-autonomy, punishment for misdeeds, etc. (unlike values like ‘beauty’ or ‘disgust’) seem to me to all point at this poorly-understood notion of a ‘person’. We can list a ton of things that seem to be true of ‘people’, and we can wonder which of those things will turn out to be more or less central. We can wonder whether chickens will end up being ‘people-like’ in the ways that matter for compassion, even if we’re pretty sure they aren’t ‘people-like’ in the ways that matter for ‘punishment for misdeeds’.
But regardless, I think eventually (if we don’t kill ourselves first) we’re just going to figure out what these values (or reflectively endorsed versions of these values) are. And I don’t think e.g. ‘respect-for-autonomy’ is going to be a thing that smoothly increases from the electron level to the ‘full human brain’ level; I think it’s going to point at a particular (though perhaps large!) class of complicated engines.
Thinking about it more, my brain generates the following argument for the perspective I think you’re advocating:
I’m not actually sure if that’s the exact argument I had in mind while writing the part about kludges, but I do find it fairly compelling, especially the way you had written it. Thanks.
I think a lot of human values look like pointers to real-world phenomena, rather than encodings of real-world phenomena.
I apologize for not giving a complete response here, but if I were to try to summarize a few lingering general disagreements, I would say:
“Human values” don’t seem to be primarily what I care about. I care about “my values” and I’m skeptical that “human values” will converge onto what I care about.
I have intuitions that ethics is a lot more arbitrary than you seem to think it is. Your argument is peppered with statements to the effect of ‘what would our CEV endorse?’ I do agree that some degree of self-reflection is good, but I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
You appear to have interpreted my intuitions about the arbitrariness of concepts as instead being about the complexity and fragility of concepts, which you expressed confusion about. Note that I think this reflects a basic miscommunication on my part, not yours. I do have some intuitions about complexity, less about fragility; but my statements above were (supposed to be) more about arbitrariness (I think).
I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
I think there’s more or less a ‘best way’ to extrapolate a human’s preferences (like, a way or meta-way we would and should endorse the most, after considering tons of different ways to extrapolate), and this will get different answers depending on who you extrapolate from, but for most people (partly because almost everyone cares a lot about everyone else’s preferences), you get the same answer on all the high-stakes easy questions.
Where by ‘easy questions’ I mean the kinds of things we care about today—very simple, close-to-the-joints-of-nature questions like ‘shall we avoid causing serious physical damage to chickens?’ that aren’t about entities that have been pushed into weird extreme states by superintelligent optimization. :)
I think ethics is totally arbitrary in the sense that it’s just ‘what people happened to evolve’, but I don’t think it’s that complex or heterogeneous from the perspective of a superintelligence. There’s a limit to how much load-bearing complexity a human brain can even fit.
And I don’t think e.g. ‘respect-for-autonomy’ is going to be a thing that smoothly increases from the electron level to the ‘full human brain’ level; I think it’s going to point at a particular (though perhaps large!) class of complicated engines.
I actually agree with this, and I suspect that we might not disagree as much as you think if we put “credences” on what we thought was conscious. I’d identify my view as somewhere between Luke’s view and Brian’s view: it takes into account Brian’s cosmopolitan perspective while insisting that consciousness is indeed a higher-level thing that doesn’t seem to be built into the universe.
The way I imagine any successful theory of consciousness going is that even if it has a long parts (processes) list, every feature on that list will apply pretty ubiquitously to at least a tiny degree. Even if the parts need to combine in certain ways, that could also happen to a tiny degree in basically everything, although I’m much less sure of this claim; I’m much more confident that I can find the parts in a lot of places than in the claim that basically everything is like each part, so finding the right combinations could be much harder. The full complexity of consciousness might still be found in basically everything, just to a usually negligible degree.
I’ve written more on this here.