But one insight from illusionism is that all computations in the world, including ours, are more similar than we might otherwise have thought: there’s no clear-cut line between computations that are conscious and computations that aren’t.
Note that I lean toward disagreeing with this, even though I agree with a bunch of similar-sounding claims you’ve made here.
Also, illusionism caused me, like you, to update toward thinking we’re “less special, and more like the world around us”. But I think I’m modeling the situation pretty differently, in a way that is making me update a lot less in that direction than you or Brian.
I think consciousness will end up looking something like ‘piston steam engine’, if we’d evolved to have a lot of terminal values related to the state of piston-steam-engine-ish things.
Piston steam engines aren’t a 100% crisp natural kind; there are other machines that are pretty similar to them; there are many different ways to build a piston steam engine; and, sure, in a world where our core evolved values were tied up with piston steam engines, it could shake out that we care at least a little about certain states of thermostats, rocks, hang gliders, trombones, and any number of other random things as a result of very distant analogical resemblances to piston steam engines.
But it’s still the case that a piston steam engine is a relatively specific (albeit not atomically or logically precise) machine; and it requires a bunch of parts to work in specific ways; and there isn’t an unbroken continuum from ‘rock’ to ‘piston steam engine’, rather there are sharp (though not atomically sharp) jumps when you get to thresholds that make the machine work at all.
Suppose you had absolutely no idea how the internals of a piston steam engine worked mechanically. And further suppose that you’ve been crazily obsessed with piston steam engines your whole life, all your dreams are about piston steam engines, nothing else makes you want to get up in the morning, etc. -- basically the state of humanity with respect to consciousness. It might indeed then be tempting to come up with a story about how everything in the universe is a “piston steam engine lite” at heart; or, failing that, how all steam engines, or all complex machines, are piston-steam-engines lite, to varying degrees.
The wise person who’s obsessed with piston steam engines, on the other hand, would recognize that she doesn’t know how the damned thing works; and when you don’t understand an engine, it often looks far more continuous with the rest of reality, far more fuzzy and simple and basic. “They’re all just engines, after all; why sweat the details?”
Recognizing this bias, the wise obsessive should be cautious about the impulse to treat this poorly-understood machine as though it were a very basic or very universal sort of thing; because as we learn more, we should expect a series of large directional updates about just how specific and contingent and parts-containing the thing we value is, compared to the endless variety of possible physical structures out there in the universe.
When your map is blank, it feels more plausible that there will be a “smooth drop-off”, because we aren’t picturing a large number of gears that will break when we tweak their location slightly. And because it feels as though almost anything could go in the blank spot, it’s harder to viscerally feel like it’s a huge matter of life-or-death which specific thing goes there.
I think consciousness will end up looking something like ‘piston steam engine’, if we’d evolved to have a lot of terminal values related to the state of piston-steam-engine-ish things.
I think that’s kind of the key question. Is what I care about as precise as “piston steam engine” or is it more like “mechanical devices in general, with a huge increase in caring as the thing becomes more and more like a piston steam engine”? This relates to the passage of mine that Matthew quoted above. If we say we care about (or that consciousness is) this thing going on in our heads, are we pointing at a very specific machine, or are we pointing at machines in general with a focus on the ones that are more similar to the exact one in our heads? In the extreme, a person who says “I care about what’s in my head” is an egoist who doesn’t care about other humans. Perhaps he would even be a short-term egoist who doesn’t care about his long-term future (since his brain will be more different by then). That’s one stance that some people take. But most of us try to generalize what we care about beyond our immediate selves. And then the question is how much to generalize.
It’s analogous to someone saying they love “that thing” and pointing at a piston steam engine. How much generality should we apply when saying what they value? Is it that particular piston steam engine? Piston steam engines in general? Engines in general? Mechanical devices in general with a focus on ones most like the particular piston steam engine being pointed to? It’s not clear, and people take widely divergent views here.
I think a similar fuzziness will apply when trying to decide for which entities “there’s something it’s like” to be those entities. There’s a wide range in possible views on how narrowly or broadly to interpret “something it’s like”.
yet I’m confident we shouldn’t expect to find that rocks are a little bit repressing their emotions, or that cucumbers are kind of directing their attention at something, or that the sky’s relationship to the ground is an example of New Relationship Energy.
I think those statements can apply to vanishing degrees. It’s usually not helpful to talk that way in ordinary life, but if we’re trying to have a full theory of repressing one’s emotions in general, I expect that one could draw some strained (or poetic, as you said) ways in which rocks are doing that. (Simple example: the chemical bonds in rocks are holding their atoms together, and without that the atoms of the rocks would move around more freely the way the atoms of a liquid or gas do.) IMO, the degree of applicability of the concept seems very low but not zero. This very low applicability is probably only going to matter in extreme situations, like if there are astronomical numbers of rocks compared with human-like minds.
I think consciousness will end up looking something like ‘piston steam engine’, if we’d evolved to have a lot of terminal values related to the state of piston-steam-engine-ish things.
I think this is a valid viewpoint, and I find it to be fairly similar to the one Luke Muehlhauser expressed in this dialogue. I sympathize with it quite a lot, but ultimately I part ways with it.
I suppose my main disagreements would probably boil down to a few things, including:
My intuition that consciousness is not easily classifiable in the same way a piston steam engine would be, even if you knew relatively little about how piston steam engines worked. I note that your viewpoint here seems similar to Eliezer’s analogy in Fake Causality. The difference, I imagine, is that consciousness doesn’t seem to be defined via a set of easily identifiable functional features. There is an extremely wide range of viewpoints about what constitutes a conscious experience, what properties consciousness has, and what people are even talking about when they use the word (even though it is sometimes said to be one of the “most basic” or “most elementary” concepts to us).
The question I care most about is not “how does consciousness work” but “what should I care about?” Progress on questions about “how X works” has historically yielded extremely crisp answers, explainable by models that use simple moving parts. I don’t think we’ve made substantial progress in answering the other question with simple, crisp models. One way of putting this is that if you came up to me with a well-validated, fundamental theory of consciousness (and somehow this was well defined), I might just respond, “That’s cool, but I care about things other than consciousness (as defined in that model).” It seems like the more you’re able to answer the question precisely and thoroughly, the more I’m probably going to disagree that the answer maps perfectly onto my intuitions about what I ought to care about.
The brain is a kludge, and doesn’t seem like the type of thing we should describe as a simple, coherent, unified engine. There are certainly many aspects of cognition that are very general, but most don’t seem like the type of thing I’d expect to be exclusively present in humans but not other animals. This touches on some disagreements I (perceive I) have with the foom perspective, but I think that even people from that camp would mostly agree with the weak version of this thesis.
I think this is a valid viewpoint, and I find it to be fairly similar to the one Luke Muehlhauser expressed in this dialogue. I sympathize with it quite a lot, but ultimately I part ways with it.
I hadn’t seen that before! I love it, and I very much share Luke’s intuitions there (maybe no surprise, since I think his intuitions are stunningly good on both moral philosophy and consciousness). Thanks for the link. :)
The difference, I imagine, is that consciousness doesn’t seem to be defined via a set of easily identifiable functional features.
Granted, but this seems true of a great many psychology concepts. Psychological concepts are generally poorly understood and very far from being formally defined, yet I’m confident we shouldn’t expect to find that rocks are a little bit repressing their emotions, or that cucumbers are kind of directing their attention at something, or that the sky’s relationship to the ground is an example of New Relationship Energy. ‘The sky is in NRE with the ground’ is doomed to always be a line of poetry, never a line of cognitive science.
(In some cases we’ve introduced new technical terms, like information-theoretic surprisal, that borrow psychological language. I think this is more common than successful attempts to fully formalize/define how a high-level psychological phenomenon occurs in humans or other brains.)
I do expect some concept revision to occur as we improve our understanding of psychology. But I think our state is mostly ‘human psychology is really complicated, so we don’t understand it well yet’, not ‘we have empirically confirmed that human psychological attributes are continuous with the attributes of amoebas, rocks, etc.’.
I don’t think we’ve made substantial progress in answering the other question with simple, crisp models.
[...]
The brain is a kludge
My view is:
Our core, ultimate values are something we know very, very little about.
The true nature of consciousness is something we know almost nothing about.
Which particular computational processes are occurring in animal brains is something we know almost nothing about.
When you combine three blank areas of your map, the blank parts don’t cancel out. Instead, you get a part of your map that you should be even more uncertain about.
I don’t see a valid way to leverage that blankness-of-map to concentrate probability mass on ‘these three huge complicated mysterious brain-things are really similar to rocks, fungi, electrons, etc.’.
Rather, ‘moral value is a kludge’ and ‘consciousness is a kludge’ both make me update toward thinking the set of moral patients is smaller -- these engines don’t become less engine-y via being kludges, they just become more complicated and laden-with-arbitrary-structure.
A blank map of a huge complicated neural thingie enmeshed with verbal reasoning and a dozen other cognitive processes in intricate ways, is not the same as a filled-in map of something that’s low in detail and has very few crucial highly contingent or complex components. The lack of detail is in the map, but the territory can be extraordinarily detailed. And any of those details (either in our CEV, or in our consciousness) can turn out to be crucial in a way that’s currently invisible to us.
It sounds to me like you’re updating in the opposite direction—these things are kludges, therefore we should expect them (and their intersection, ‘things we morally value in a consciousness-style way’) to be simpler, more general, more universal, less laden with arbitrary hidden complexity. Why update in that direction?
Thinking about it more, my brain generates the following argument for the perspective I think you’re advocating:
Consciousness and human values are both complicated kludges, but they’re different complicated kludges, and they aren’t correlated (because evolution didn’t understand what ‘consciousness’ was when it built us, so it didn’t try to embed that entire complicated entity into our values, it just embedded various messy correlates that break down pretty easily).
It would therefore be surprising if any highly specific cognitive feature of humans ended up being core to our values. It’s less surprising if a simple (and therefore more widespread) cognitive thingie ends up important to our values, because although the totality of human values is very complex, a lot of the real-world things referred to by specific pieces of human value (e.g., ‘boo loud sudden noises’) are quite simple.
A lot of the complexity of values comes from the fact that they glue together an enormous list of many different relatively-simple things (orgasms, symmetry, lush green plants, the sound of birds chirping, the pleasure of winning a game), and then these need to interact in tons of complicated ways.
In some cases, there probably are much-more-complicated entities in our values. But any given specific complicated thing will be a lot harder to exactly locate in our values, because it’s less likely on priors that evolution will hand-code that thing into our brains, or hand-code a way for humans to reliably learn that value during development.
This argument moves me some, and maybe I’ll change my mind after chewing on it more.
I think the main reasons I don’t currently find it super compelling are:
1 - I think a lot of human values look like pointers to real-world phenomena, rather than encodings of real-world phenomena. Humans care about certain kinds of human-ish minds (which may or may not be limited to human beings). Rather than trying to hand-code a description of ‘mind that’s human-ish in the relevant way’, evolution builds in a long list of clues and correlates that let us locate the ‘human-ish mind’ object in the physical world, and glom on to that object. The full complexity of the consciousness-engine is likely to end up pretty central to our values by that method (even though not everything about that engine as it’s currently implemented in human brains is going to be essential—there are a lot of ways to build a piston steam engine).
I do think there will be a lot of surprises and weird edge cases in ‘the kind of mind we value’. But I think these are much more likely to arise if we build new minds that deliberately push toward the edges of our concept. I think it’s much less likely that we’ll care about chickens, rocks, or electrons because these pre-existing entities just happen to exploit a weird loophole in our empathy-ish values—most natural phenomena don’t have keys that are exactly the right shape to exploit a loophole in human values.
(I do think it’s not at all implausible that chickens could turn out to have ‘human-ish minds’ in the relevant sense. Maybe somewhere between 10% likely and 40% likely? But if chickens are moral patients according to our morality, I think it will be because it empirically turns out to be the case that ‘being conscious in the basic way humans are’ arose way earlier on the evolutionary tree, or arose multiple times on the tree, not because our brain’s moral ‘pointer toward human-ish minds’ is going haywire and triggering (to various degrees) in response to just about everything, in a way that our CEV deeply endorses.)
2 - In cases like this, I also don’t think humans care much about the pointers themselves, or the ‘experience of feeling as though something is human-like’—rather, humans care about whether the thing is actually human-like (in this particular not-yet-fully-understood way).
3 - Moral intuitions like fairness, compassion, respect-for-autonomy, punishment for misdeeds, etc. -- unlike values like ‘beauty’ or ‘disgust’—seem to me to all point at this poorly-understood notion of a ‘person’. We can list a ton of things that seem to be true of ‘people’, and we can wonder which of those things will turn out to be more or less central. We can wonder whether chickens will end up being ‘people-like’ in the ways that matter for compassion, even if we’re pretty sure they aren’t ‘people-like’ in the ways that matter for ‘punishment for misdeeds’.
But regardless, I think eventually (if we don’t kill ourselves first) we’re just going to figure out what these values (or reflectively endorsed versions of these values) are. And I don’t think eg ‘respect-for-autonomy’ is going to be a thing that smoothly increases from the electron level to the ‘full human brain’ level; I think it’s going to point at a particular (though perhaps large!) class of complicated engines.
Thinking about it more, my brain generates the following argument for the perspective I think you’re advocating:
I’m not actually sure if that’s the exact argument I had in mind while writing the part about kludges, but I do find it fairly compelling, especially the way you had written it. Thanks.
I think a lot of human values look like pointers to real-world phenomena, rather than encodings of real-world phenomena.
I apologize that this isn’t a complete response, but if I were to try to summarize a few lingering general disagreements, I would say:
“Human values” don’t seem to be primarily what I care about. I care about “my values” and I’m skeptical that “human values” will converge onto what I care about.
I have intuitions that ethics is a lot more arbitrary than you seem to think it is. Your argument is peppered with statements to the effect of “what would our CEV endorse?”. I do agree that some degree of self-reflection is good, but I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
You appear to have interpreted my intuitions about the arbitrariness of concepts as instead being about the complexity and fragility of concepts, and expressed confusion about that. Note that I think this reflects a basic miscommunication on my part, not yours. I do have some intuitions about complexity, less about fragility; but my statements above were (supposed to be) more about arbitrariness (I think).
I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
I think there’s more or less a ‘best way’ to extrapolate a human’s preferences (like, a way or meta-way we would and should endorse the most, after considering tons of different ways to extrapolate), and this will get different answers depending on who you extrapolate from, but for most people (partly because almost everyone cares a lot about everyone else’s preferences), you get the same answer on all the high-stakes easy questions.
Where by ‘easy questions’ I mean the kinds of things we care about today—very simple, close-to-the-joints-of-nature questions like ‘shall we avoid causing serious physical damage to chickens?’ that aren’t about entities that have been pushed into weird extreme states by superintelligent optimization. :)
I think ethics is totally arbitrary in the sense that it’s just ‘what people happened to evolve’, but I don’t think it’s that complex or heterogeneous from the perspective of a superintelligence. There’s a limit to how much load-bearing complexity a human brain can even fit.
And I don’t think eg ‘respect-for-autonomy’ is going to be a thing that smoothly increases from the electron level to the ‘full human brain’ level; I think it’s going to point at a particular (though perhaps large!) class of complicated engines.
I actually agree with this, and I suspect that we might not disagree as much as you think if we put “credences” on what we thought were conscious. I’d identify my view as somewhere between Luke’s view and Brian’s view, which takes into account Brian’s cosmopolitan perspective while insisting that consciousness is indeed a higher-level thing that doesn’t seem to be built into the universe.
The way I imagine any successful theory of consciousness going is that even if it has a long parts (processes) list, every feature on that list will apply pretty ubiquitously to at least a tiny degree. Even if the parts need to combine in certain ways, that could also happen to a tiny degree in basically everything, although I’m much less sure of this claim; I’m much more confident that I can find the parts in a lot of places than in the claim that basically everything is like each part, so finding the right combinations could be much harder. The full complexity of consciousness might still be found in basically everything, just to a usually negligible degree.
When you combine three blank areas of your map, the blank parts don’t cancel out. Instead, you get a part of your map that you should be even more uncertain about.
I think this makes sense. However, and I don’t know whether I obfuscated this point somewhere, I don’t think I was arguing that we should be more certain about a particular theory. Indeed, from my perspective, I was arguing against reifying a single concept (self-reflectivity) as the thing that defines whether something is conscious, before we know anything about humans, much less whether humans are even capable of self-reflection in a way that’s discontinuous with other animals.
Rather, ‘moral value is a kludge’ and ‘consciousness is a kludge’ both make me update toward thinking the set of moral patients is smaller -- these engines don’t become less engine-y via being kludges, they just become more complicated and laden-with-arbitrary-structure.
I guess that when I said that brains are kludges, I was trying to say that their boundaries were fuzzy, rather than saying that they have well-defined boundaries but that the concept is extremely fragile, such that if you take away a single property from them they cease to be human. (I probably shouldn’t have used that term, and should have just described it this way instead.)
Complex structures like “tables” tend to be the type of thing that if you modify them across one or two dimensions, they still belong to the same category. By contrast, a hydrogen atom is simple, and is the type of thing that if you take a property away from it, it ceases to be a hydrogen atom.
When I imagined a “consciousness engine” I visualized a simple system with clear moving parts, like a hydrogen atom. And conceptually, one of those moving parts could be a highly modular self-reflectivity component. Under this view, it might make a lot of sense that self-reflectivity is the defining component to a human, but I don’t suspect these things are actually that cleanly separable from the rest of the system.
In other words, it seems like the best model of a “table” or some other highly fuzzy concept, is not some extremely precise description of the exact properties that define a table, but rather some additive model in which each feature contributes some “tableness”, and such that no feature alone can either make something a table or prevent something from being a table. My intuitions about consciousness feel this way, but I’m not too certain about any of this.
I’d say my visualization of consciousness is less like a typical steam engine or table, and more like a Rube Goldberg machine designed by a very confused committee of terrible engineers. You can remove some parts of the machine without breaking anything, but a lot of other parts are necessary for the thing to work.
It should also be possible to design an AI that has ‘human-like consciousness’ via a much less kludge-ish process—I don’t think that much complexity is morally essential.
But chickens were built by a confused committee just like humans were, so they’ll have their own enormous intricate kludges (which may or may not be the same kind of machine as the Consciousness Machine in our heads), rather than having the really efficient small version of the consciousness-machine.
Note: I think there’s also a specific philosophical reason to think consciousness is pretty ubiquitous and fundamental—the hard problem of consciousness. The ‘we’re investing too much metaphysical importance into our pet obsession’ thing isn’t the only reason anyone thinks consciousness (or very-consciousness-ish things) might be ubiquitous.
But per illusionism, I think this philosophical reason turns out to be wrong in the end, leaving us without a principled reason to anthropomorphize / piston-steam-engine-omorphize the universe like that.
It’s true (on your view and mine) that there’s a pervasive introspective, quasi-perceptual illusion humans suffer about consciousness.
But the functional properties of consciousness (or of ‘the consciousness-like thing we actually have’) are all still there, behind the illusion.
Swapping from the illusory view to the almost-functionally-identical non-illusory view, I strongly expect, will not cause us to stop caring about the underlying real things (thoughts, and feelings, and memories, and love, and friendship).
And if we still care about those real things, then our utility function is still (I claim) pretty obsessed with some very specific and complicated engines/computations. (Indeed, a lot more specific and complicated than real-world piston steam engines.)
I’d expect it to mostly look more like how our orientation to water and oars changes when we realize that the oar half-submerged in water isn’t really broken.
I don’t expect the revelation to cause humanity to replace its values with such vague values that we reshape our lives around slightly adjusting the spatial configurations of rocks or electrons, because our new ‘generalized friendship’ concept treats some common pebble configurations as more or less ‘friend-like’, more or less ‘asleep’, more or less ‘annoyed’, etc.
(Maybe we’ll do a little of that, for fun, as a sort of aesthetic project / a way of making the world feel more beautiful. But that gets us closer to my version of ‘generalizing human values to apply to unconscious stuff’, not Brian’s version.)
Swapping from the illusory view to the almost-functionally-identical non-illusory view, I strongly expect, will not cause us to stop caring about the underlying real things (thoughts, and feelings, and memories, and love, and friendship).
Putting aside my other disagreements for now (and I appreciate the other things you said), I’d like to note that I see my own view as “rescuing the utility function” far more than a view which asserts that non-human animals are largely unconscious automatons.
To the extent that learning to be a reductionist shouldn’t radically reshape what we care about, it seems clear to me that we shouldn’t stop caring about non-human animals, especially larger ones like pigs. I think most people, including the majority of people who eat meat regularly, think that animals are conscious. And I wouldn’t expect that personally having a dog or a cat would substantially negatively correlate with believing that animals are conscious (which we would weakly expect if our naive impressions track truth and non-human animals aren’t conscious).
There have been quite a few surveys about this, though I’m not quickly coming up with any good ones right now (besides perhaps this survey which found that 47% of people supported a ban on slaughterhouses, a result which was replicated, though it’s perhaps only about one third when you subtract those who don’t know what a slaughterhouse is).
To the extent that learning to be a reductionist shouldn’t radically reshape what we care about, it seems clear to me that we shouldn’t stop caring about non-human animals, especially larger ones like pigs. I think most people, including the majority of people who eat meat regularly, think that animals are conscious.
This seems totally wrong to me.
I’m an illusionist, but that doesn’t mean I think that humans’ values are indifferent between the ‘entity with a point of view’ cluster in thingspace (e.g., typical adult humans), and the ‘entity with no point of view’ cluster in thingspace (e.g., braindead humans).
Just the opposite: I think there’s an overwhelmingly large and absolutely morally crucial difference between ‘automaton that acts sort of like it has morally relevant cognitive processes’ (say, a crude robot or a cartoon hand-designed to inspire people to anthropomorphize it), and ‘thing that actually has the morally relevant cognitive processes’.
It’s a wide-open empirical question whether, e.g., dogs are basically ‘automata that lack the morally relevant cognitive processes altogether’, versus ‘things with the morally relevant cognitive processes’. And I think ‘is there something it’s like to be that dog?’ is actually a totally fine intuition pump for imperfectly getting at the kind of difference that morally matters here, even though this concept starts to break when you put philosophical weight on it (because of the ‘hard problem’ illusion) and needs to be replaced with a probably-highly-similar functional equivalent.
Like, the ‘is there something it’s like to be X?’ question is subject to an illusion in humans, and it’s a real messy folk concept that will surely need to be massively revised as we figure out what’s really going on. But it’s surely closer to asking the morally important question about dogs, compared to terrible, overwhelmingly morally unimportant questions like ‘can the external physical behaviors of this entity trick humans into anthropomorphizing the entity and feeling like it has a human-ish inner life’.
Tricking humans into anthropomorphizing things is so easy! What matters is what’s in the dog’s head!
Like, yes, when I say ‘the moral evaluation function takes the dog’s brain as an input, not the cuteness of its overt behaviors’, I am talking about a moral evaluation function that we have to extract from the human’s brain.
But the human moral evaluation function is a totally different function from the ‘does-this-thing-make-noises-and-facial-expressions-that-naturally-make-me-feel-sympathy-for-it-before-I-learn-any-neuroscience?’ function, even though both are located in the human brain.
Thinking (with very low confidence) about an idealized, heavily self-modified, reflectively consistent, CEV-ish version of me:
If it turns out that squirrels are totally unconscious automata, then I think Ideal Me would probably at least weakly prefer to not go around stepping on squirrels for fun. I think this would be for two reasons:
The kind of reverence-for-beauty that makes me not want to randomly shred flowers to pieces. Squirrels can be beautiful even if they have no moral value. Gorgeous sunsets plausibly deserve a similar kind of reverence.
The kind of disgust that makes me not want to draw pictures of mutilated humans. There may be nothing morally important about the cognitive algorithms in squirrels’ brains; but squirrels still have a lot of anatomical similarities to humans, and the visual resemblance between the two is reason enough to be grossed out by roadkill.
In both cases, these don’t seem like obviously bad values to me. (And I’m pretty conservative about getting rid of my values! Though a lot can and should change eventually, as humanity figures out all the risks and implications of various self-modifications. Indeed, I think the above descriptions would probably look totally wrong, quaint, and confused to a real CEV of mine; but it’s my best guess for now.)
In contrast, conflating the moral worth of genuinely-totally-conscious things (insofar as anything is genuinely conscious) with genuinely-totally-unconscious things seems… actively bad, to me? Not a value worth endorsing or protecting?
Like, maybe you think it’s implausible that squirrels, with all their behavioral complexity, could have ‘the lights be off’ in the way that a roomba with a cute face glued to it has ‘the lights off’. I disagree somewhat, but I find that view vastly less objectionable than ‘it doesn’t even matter what the squirrel’s mind is like, it just matters how uneducated humans naively emotionally respond to the squirrel’s overt behaviors’.
Maybe a way of gesturing at the thing is: Phenomenal consciousness is an illusion, but the illusion adds up to normality. It doesn’t add up to ‘therefore the difference between automata / cartoon characters and things-that-actually-have-the-relevant-mental-machinery-in-their-brains suddenly becomes unimportant (or even less important)’.
I feel like people here are just picking apart each other’s arguments without really engaging with the main arguments being made. A lot of the time it’s not very clear what the focus is anyway. I think he’s just referring to a different perspective he’s read about one way to look at things. Your example of obsession is only used to discredit the legitimacy of that perspective instead of actually adding value to the conversation.