I think this is a valid viewpoint, and I find it to be fairly similar to the one Luke Muehlhauser expressed in this dialogue. I sympathize with it quite a lot, but ultimately I part ways with it.
I hadn’t seen that before! I love it, and I very much share Luke’s intuitions there (maybe no surprise, since I think his intuitions are stunningly good on both moral philosophy and consciousness). Thanks for the link. :)
The difference, I imagine, is that consciousness doesn’t seem to be defined via a set of easily identifiable functional features.
Granted, but this seems true of a great many psychology concepts. Psychological concepts are generally poorly understood and very far from being formally defined, yet I’m confident we shouldn’t expect to find that rocks are a little bit repressing their emotions, or that cucumbers are kind of directing their attention at something, or that the sky’s relationship to the ground is an example of New Relationship Energy. ‘The sky is in NRE with the ground’ is doomed to always be a line of poetry, never a line of cognitive science.
(In some cases we’ve introduced new technical terms, like information-theoretic surprisal, that borrow psychological language. I think this is more common than successful attempts to fully formalize/define how a high-level psychological phenomenon occurs in humans or other brains.)
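(For reference, ‘surprisal’ is an example of how crisp the borrowed technical term ends up being: the surprisal of an outcome x with probability p(x) is just -log₂ p(x), measured in bits. Nothing remotely that formal currently exists for ‘repressing your emotions’ or for NRE.)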
I do expect some concept revision to occur as we improve our understanding of psychology. But I think our state is mostly ‘human psychology is really complicated, so we don’t understand it well yet’, not ‘we have empirically confirmed that human psychological attributes are continuous with the attributes of amoebas, rocks, etc.’.
I don’t think we’ve made substantial progress in answering the other question with simple, crisp models.
[...]
The brain is a kludge
My view is:
Our core, ultimate values are something we know very, very little about.
The true nature of consciousness is something we know almost nothing about.
Which particular computational processes are occurring in animal brains is something we know almost nothing about.
When you combine three blank areas of your map, the blank parts don’t cancel out. Instead, you get a part of your map that you should be even more uncertain about.
I don’t see a valid way to leverage that blankness-of-map to concentrate probability mass on ‘these three huge complicated mysterious brain-things are really similar to rocks, fungi, electrons, etc.’.
Rather, ‘moral value is a kludge’ and ‘consciousness is a kludge’ both make me update toward thinking the set of moral patients is smaller -- these engines don’t become less engine-y via being kludges, they just become more complicated and laden-with-arbitrary-structure.
A blank map of a huge complicated neural thingie enmeshed with verbal reasoning and a dozen other cognitive processes in intricate ways is not the same as a filled-in map of something that’s low in detail and has very few crucial highly contingent or complex components. The lack of detail is in the map, but the territory can be extraordinarily detailed. And any of those details (either in our CEV, or in our consciousness) can turn out to be crucial in a way that’s currently invisible to us.
It sounds to me like you’re updating in the opposite direction—these things are kludges, therefore we should expect them (and their intersection, ‘things we morally value in a consciousness-style way’) to be simpler, more general, more universal, less laden with arbitrary hidden complexity. Why update in that direction?
Thinking about it more, my brain generates the following argument for the perspective I think you’re advocating:
Consciousness and human values are both complicated kludges, but they’re different complicated kludges, and they aren’t correlated (because evolution didn’t understand what ‘consciousness’ was when it built us, so it didn’t try to embed that entire complicated entity into our values, it just embedded various messy correlates that break down pretty easily).
It would therefore be surprising if any highly specific cognitive feature of humans ended up being core to our values. It’s less surprising if a simple (and therefore more widespread) cognitive thingie ends up important to our values, because although the totality of human values is very complex, a lot of the real-world things referred to by specific pieces of human value (e.g., ‘boo loud sudden noises’) are quite simple.
A lot of the complexity of values comes from the fact that they glue together an enormous list of many different relatively-simple things (orgasms, symmetry, lush green plants, the sound of birds chirping, the pleasure of winning a game), and then these need to interact in tons of complicated ways.
In some cases, there probably are much-more-complicated entities in our values. But any given specific complicated thing will be a lot harder to exactly locate in our values, because it’s less likely on priors that evolution will hand-code that thing into our brains, or hand-code a way for humans to reliably learn that value during development.
This argument moves me some, and maybe I’ll change my mind after chewing on it more.
I think the main reasons I don’t currently find it super compelling are:
1 - I think a lot of human values look like pointers to real-world phenomena, rather than encodings of real-world phenomena. Humans care about certain kinds of human-ish minds (which may or may not be limited to human beings). Rather than trying to hand-code a description of ‘mind that’s human-ish in the relevant way’, evolution builds in a long list of clues and correlates that let us locate the ‘human-ish mind’ object in the physical world, and glom on to that object. The full complexity of the consciousness-engine is likely to end up pretty central to our values by that method (even though not everything about that engine as it’s currently implemented in human brains is going to be essential—there are a lot of ways to build a piston steam engine).
I do think there will be a lot of surprises and weird edge cases in ‘the kind of mind we value’. But I think these are much more likely to arise if we build new minds that deliberately push toward the edges of our concept. I think it’s much less likely that we’ll care about chickens, rocks, or electrons because these pre-existing entities just happen to exploit a weird loophole in our empathy-ish values—most natural phenomena don’t have keys that are exactly the right shape to exploit a loophole in human values.
(I do think it’s not at all implausible that chickens could turn out to have ‘human-ish minds’ in the relevant sense. Maybe somewhere between 10% likely and 40% likely? But if chickens are moral patients according to our morality, I think it will be because it empirically turns out to be the case that ‘being conscious in the basic way humans are’ arose way earlier on the evolutionary tree, or arose multiple times on the tree, not because our brain’s moral ‘pointer toward human-ish minds’ is going haywire and triggering (to various degrees) in response to just about everything, in a way that our CEV deeply endorses.)
2 - In cases like this, I also don’t think humans care much about the pointers themselves, or the ‘experience of feeling as though something is human-like’—rather, humans care about whether the thing is actually human-like (in this particular not-yet-fully-understood way).
3 - Moral intuitions like fairness, compassion, respect-for-autonomy, punishment for misdeeds, etc.—unlike values like ‘beauty’ or ‘disgust’—seem to me to all point at this poorly-understood notion of a ‘person’. We can list a ton of things that seem to be true of ‘people’, and we can wonder which of those things will turn out to be more or less central. We can wonder whether chickens will end up being ‘people-like’ in the ways that matter for compassion, even if we’re pretty sure they aren’t ‘people-like’ in the ways that matter for ‘punishment for misdeeds’.
But regardless, I think eventually (if we don’t kill ourselves first) we’re just going to figure out what these values (or reflectively endorsed versions of these values) are. And I don’t think eg ‘respect-for-autonomy’ is going to be a thing that smoothly increases from the electron level to the ‘full human brain’ level; I think it’s going to point at a particular (though perhaps large!) class of complicated engines.
Thinking about it more, my brain generates the following argument for the perspective I think you’re advocating:
I’m not actually sure if that’s the exact argument I had in mind while writing the part about kludges, but I do find it fairly compelling, especially as you’ve written it. Thanks.
I think a lot of human values look like pointers to real-world phenomena, rather than encodings of real-world phenomena.
I apologize that this isn’t a complete response, but if I were to try to summarize a few lingering general disagreements, I would say:
“Human values” don’t seem to be primarily what I care about. I care about “my values” and I’m skeptical that “human values” will converge onto what I care about.
I have intuitions that ethics is a lot more arbitrary than you seem to think it is. Your argument is peppered with statements to the effect of ‘what would our CEV endorse?’. I do agree that some degree of self-reflection is good, but I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
You appear to have interpreted my intuitions about the arbitrariness of concepts as instead being about the complexity and fragility of concepts, which you expressed confusion about. Note that I think this reflects a basic miscommunication on my part, not yours. I do have some intuitions about complexity, less about fragility; but my statements above were (supposed to be) more about arbitrariness (I think).
I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
I think there’s more or less a ‘best way’ to extrapolate a human’s preferences (like, a way or meta-way we would and should endorse the most, after considering tons of different ways to extrapolate). This will get different answers depending on who you extrapolate from, but for most people (partly because almost everyone cares a lot about everyone else’s preferences), you get the same answer on all the high-stakes easy questions.
Where by ‘easy questions’ I mean the kinds of things we care about today—very simple, close-to-the-joints-of-nature questions like ‘shall we avoid causing serious physical damage to chickens?’ that aren’t about entities that have been pushed into weird extreme states by superintelligent optimization. :)
I think ethics is totally arbitrary in the sense that it’s just ‘what people happened to evolve’, but I don’t think it’s that complex or heterogeneous from the perspective of a superintelligence. There’s a limit to how much load-bearing complexity a human brain can even fit.
And I don’t think eg ‘respect-for-autonomy’ is going to be a thing that smoothly increases from the electron level to the ‘full human brain’ level; I think it’s going to point at a particular (though perhaps large!) class of complicated engines.
I actually agree with this, and I suspect that we might not disagree as much as you think if we put “credences” on which things we thought were conscious. I’d identify my view as somewhere between Luke’s view and Brian’s view: one that takes into account Brian’s cosmopolitan perspective while insisting that consciousness is indeed a higher-level thing that doesn’t seem to be built into the universe.
The way I imagine any successful theory of consciousness going is that even if it has a long parts (processes) list, every feature on that list will apply pretty ubiquitously, to at least a tiny degree. Even if the parts need to combine in certain ways, that could also happen to a tiny degree in basically everything, although I’m much less sure of this claim; I’m much more confident that I can find the parts in a lot of places than that basically everything is like each part, so finding the right combinations could be much harder. The full complexity of consciousness might still be found in basically everything, just to a usually negligible degree. I’ve written more on this here.
When you combine three blank areas of your map, the blank parts don’t cancel out. Instead, you get a part of your map that you should be even more uncertain about.
I think this makes sense. However (and I don’t know whether I obscured this point somewhere), I don’t think I was arguing that we should be more certain about a particular theory. Indeed, from my perspective, I was arguing against reifying a single concept (self-reflectivity) as the thing that defines whether something is conscious, before we know anything about humans, much less whether humans are even capable of self-reflection in some way that’s discontinuous from other animals.
Rather, ‘moral value is a kludge’ and ‘consciousness is a kludge’ both make me update toward thinking the set of moral patients is smaller -- these engines don’t become less engine-y via being kludges, they just become more complicated and laden-with-arbitrary-structure.
I guess that when I said that brains are kludges, I was trying to say that their boundaries are fuzzy, rather than that they have well-defined boundaries but the concept is extremely fragile, such that if you take away a single property from them they cease to be human. (I probably shouldn’t have used the term, and should have just described it this way instead.)
Complex structures like “tables” tend to be the type of thing where, if you modify them across one or two dimensions, they still belong to the same category. By contrast, a hydrogen atom is simple, and is the type of thing where, if you take a property away from it, it ceases to be a hydrogen atom.
When I imagined a “consciousness engine” I visualized a simple system with clear moving parts, like a hydrogen atom. And conceptually, one of those moving parts could be a highly modular self-reflectivity component. Under this view, it might make a lot of sense that self-reflectivity is the defining component of a human, but I don’t suspect these things are actually that cleanly separable from the rest of the system.
In other words, it seems like the best model of a “table” or some other highly fuzzy concept is not some extremely precise description of the exact properties that define a table, but rather some additive model in which each feature contributes some “tableness”, such that no feature alone can either make something a table or prevent something from being a table. My intuitions about consciousness feel this way, but I’m not too certain about any of this.
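To make that contrast concrete, here’s a toy sketch (the features and weights are made up, purely for illustration) of the difference between an additive “tableness” score, where no single feature is necessary or sufficient, and a crisp hydrogen-atom-style definition, where every feature is required:

```python
# Toy sketch (made-up features and weights, purely illustrative): an additive
# "tableness" score, where nothing is individually make-or-break, versus a
# crisp definition where every feature is required.

FEATURES = {"flat_top": 0.4, "raised_off_ground": 0.3, "has_legs": 0.2, "rigid": 0.1}

def additive_tableness(obj):
    """Each present feature adds a bit of 'tableness'; none is decisive on its own."""
    return sum(w for feat, w in FEATURES.items() if feat in obj)

def crisp_is_table(obj):
    """Hydrogen-atom-style definition: remove any one feature and it's not a table."""
    return all(feat in obj for feat in FEATURES)

legless_slab_on_a_rock = {"flat_top", "raised_off_ground", "rigid"}
print(round(additive_tableness(legless_slab_on_a_rock), 2))  # 0.8 -- still quite table-ish
print(crisp_is_table(legless_slab_on_a_rock))                # False -- fails the crisp test
```

The point isn’t the specific numbers, just the structural difference: on the additive picture nothing is individually decisive, which fits my intuitions about consciousness better than the all-or-nothing picture does.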
I’d say my visualization of consciousness is less like a typical steam engine or table, and more like a Rube Goldberg machine designed by a very confused committee of terrible engineers. You can remove some parts of the machine without breaking anything, but a lot of other parts are necessary for the thing to work.
It should also be possible to design an AI that has ‘human-like consciousness’ via a much less kludge-ish process—I don’t think that much complexity is morally essential.
But chickens were built by a confused committee just like humans were, so they’ll have their own enormous intricate kludges (which may or may not be the same kind of machine as the Consciousness Machine in our heads), rather than having the really efficient small version of the consciousness-machine.