This is pretty interesting. The first correction I’d offer is that I wouldn’t present this as purely springing from Quintin and Alex—“Shard Theory” is mostly a sexy name for Steve Byrnes’ picture of drives in the brain, together with some suggestive language identifying “shards” as agents, and some claims about how it applies to AI alignment.
I think you have a bit of a misunderstanding of the claims, mostly caused by mixing up Steve’s picture of drives with typical blank-slatism, which can be nicely illustrated by talking about this section:
This assumption is motivated by an argument explored here that ‘human values and biases are inaccessible to the genome’. For example, Quintin Trout argues “it seems intractable for the genome to scan a human brain and back out the “death” abstraction, which probably will not form at a predictable neural address. Therefore, we infer that the genome can’t directly make us afraid of death by e.g. specifying circuitry which detects when we think about death and then makes us afraid. In turn, this implies that there are a lot of values and biases which the genome cannot hardcode.”
This Shard Theory argument seems to reflect a fundamental misunderstanding of how evolution shapes genomes to produce phenotypic traits and complex adaptations. The genome never needs to ‘scan’ an adaptation and figure out how to reverse-engineer it back into genes. The genetic variants simply build a slightly new phenotypic variant of an adaptation, and if it works better than existing variants, then the genes that built it will tend to propagate through the population. The flow of design information is always from genes to phenotypes, even if the flow of selection pressures is back from phenotypes to genes. This one-way flow of information from DNA to RNA to proteins to adaptations has been called the ‘Central Dogma of molecular biology’, and it still holds largely true (the recent hype about epigenetics notwithstanding).
Shard Theory implies that biology has no mechanism to ‘scan’ the design of fully-mature, complex adaptations back into the genome, and therefore there’s no way for the genome to code for fully-mature, complex adaptations. If we take that argument at face value, then there’s no mechanism for the genome to ‘scan’ the design of a human spine, heart, hormone, antibody, cochlea, or retina, and there would be no way for evolution or genes to influence the design of the human body, physiology, or sensory organs. Evolution would grind to a halt – not just at the level of human values, but at the level of all complex adaptations in all species that have ever evolved.
I think this section takes a Quintin Trout [sic because it’s funny] quote about development and then provides a counterargument as if it were a claim about evolution. Of course evolution can select on whether you avoid dying in the abstract, if that’s important to your fitness. Quintin Trout would agree with that just fine.
The point they’re trying to make, the question of interest here, is: what is evolution actually doing to your development when it selects for avoiding death in the abstract? For certain features of your brain like the blink reflex, it seems like evolution “memorizes” a developmental recipe for a certain neural circuit in the brainstem, and the reflex is heritable because the recipe works just as well on your children. But for a motivation to avoid dying in the abstract, the development of this feature seems to be a much more complicated recipe that involves lots of interaction with the environment—babies are born with a blink reflex despite not needing it in the womb, but they’re not born with a fear of dying in the abstract (desire for dollar bills might be an even better example).
So the question is, what is the developmental recipe that leads to the heritability of certain abstract drives?
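One crude way to picture the contrast (a hypothetical Python sketch, not a claim about how neurons actually implement any of this): the genome can ship a fixed input-to-output circuit directly, or it can ship only a simple sensory-defined reward signal plus a learning rule and let the “drive” be whatever the learning process converges to in a given environment.

```python
import numpy as np

# Hardwired "reflex": the genome ships the mapping itself; it works from birth.
def blink_reflex(corneal_contact: bool) -> bool:
    return corneal_contact

# Learned "drive": the genome ships only a crude, sensory-defined reward signal
# and a learning rule; the eventual behavior depends on lived experience.
rng = np.random.default_rng(0)
n_situations, n_actions = 20, 4
value = np.zeros((n_situations, n_actions))  # learned during life, not inherited

def innate_reward(pain: float, food: float) -> float:
    # The kind of thing a genome plausibly *can* specify: a function of simple
    # sensory signals, not of abstract concepts like "death".
    return food - pain

for _ in range(5000):
    s = rng.integers(n_situations)
    a = rng.integers(n_actions)
    pain = rng.random() * (a == 0)   # hypothetical: action 0 tends to hurt
    food = rng.random() * (a == 3)   # hypothetical: action 3 tends to pay off
    value[s, a] += 0.1 * (innate_reward(pain, food) - value[s, a])

# The acquired aversion to action 0 is "heritable" only in the sense that the
# reward circuit and the learning rule are heritable.
print(value.mean(axis=0).round(2))
```

The question above is then which parts of something like fear-of-death sit in the fixed-circuit column, and which only emerge from the reward-plus-learning column.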
The first correction I’d offer is that I wouldn’t present this as purely springing from Quintin and Alex—“Shard Theory” is mostly a sexy name for Steve Byrnes’ picture of drives in the brain, together with some suggestive language identifying “shards” as agents, and some claims about how it applies to AI alignment.
Definitely agree it’s not purely springing from us. Shard theory inherits an enormous amount from Steve’s picture of the brain, and from a generous number of private communications with Steve this spring. I don’t know of any neuroscience I’m in disagreement with Steve about (and he knows a lot more than I do, as well). I perceive us to have similar neuroscientific assumptions. But I’d add two clarifications:
1. Shard theory has substantially different emphases (multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation, terminalization of instrumental “activation-level” values into “weight-level” shards of their own). I also consider myself to be taking speculation to different places (it seems like shard theory should apply more generally to AI systems as well, and anything which satisfies its assumptions, mod second-order effects of inductive biases).
2. I don’t view shards as agents. As I wrote in the main essay:
we think that shards are not discrete subagents with their own world models and mental workspaces. We currently estimate that most shards are “optimizers” to the extent that a bacterium or a thermostat is an optimizer.
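To gesture at what “optimizer to the extent that a thermostat is an optimizer” means, here is a deliberately crude, hypothetical sketch (not a model of the brain): a shard as a context-gated bid on actions, with no world model or planning of its own. All the names and numbers below are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Shard:
    name: str
    active_in: Callable[[dict], bool]   # crude context check
    bids: Dict[str, float]              # action -> strength of bid

# Hypothetical shards, purely illustrative.
juice_shard = Shard("juice", lambda ctx: ctx.get("sees_juice", False),
                    {"grab_juice": 2.0, "walk_away": -0.5})
safety_shard = Shard("safety", lambda ctx: ctx.get("near_ledge", False),
                     {"step_back": 3.0, "grab_juice": -1.0})

def choose(ctx: dict, shards: List[Shard], actions: List[str]) -> str:
    # The decision is whichever action collects the largest total bid from the
    # currently active shards; no shard simulates outcomes or runs a search.
    totals = {a: sum(s.bids.get(a, 0.0) for s in shards if s.active_in(ctx))
              for a in actions}
    return max(totals, key=totals.get)

print(choose({"sees_juice": True, "near_ledge": True},
             [juice_shard, safety_shard],
             ["grab_juice", "step_back", "walk_away"]))  # -> "step_back"
```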
Quintin—yes, indeed, one of the reasons I was excited about Shard Theory was that it has these different emphases you mention (e.g. ‘multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation’), which I thought might actually be useful to develop and integrate with in evolutionary psychology and other branches of psychology, not just in AI alignment.
So I wanted to see if Shard Theory could be made a little more consistent with behavior genetics and ev psych theories and findings, so it could have more impact in those fields. (Both fields can get a little prickly about people ignoring their theories and findings, since they’ve been demonized for ideological reasons since the 1970s and 1990s, respectively).
Indeed, you might find quite a few similarities and analogies between certain elements of Shard Theory and certain traditional notions in evolutionary psychology, such as domain-specificity, adaptive hypocrisy and adaptive self-deception, internal conflicts between different adaptive strategies, satisficing of fitness proxies as instrumentally convergent goals rather than attempting to maximize fitness itself as a terminal value, etc. Shard Theory can potentially offer some new perspectives on those traditional concepts, in the light of modern reinforcement learning theory in machine learning.
Charlie—thanks for offering a little more ‘origin story’ insight into Shard Theory, and for trying to explain what Quintin Trout was trying to express in that passage.
Honestly, I still don’t get it. The ‘developmental recipe’ that maps from genotype to phenotype, for any complex adaptation, is usually opaque, complicated, uninterpretable, and full of complex feedback loops, regulatory systems, and quality control systems. These are typically beyond all human comprehension, because there were never any evolutionary selection pressures for that developmental recipe to be interpretable to human scientists. Thousands of genes and genomic regulatory elements interact through hundreds or thousands of developmental pathways to construct even the simplest morphological adaptations, such as a finger.
The fact that we find it hard to imagine a genome coding for an abstract fear of death is no argument at all against a genome being able to code for that—any more than our failure to understand how genomes could code for human hands, or adaptive immune systems, or mate preferences, would be compelling arguments against genomes being able to code for those things.
This all just seems like what Richard Dawkins called an ‘argument from failure of imagination’.
But, I might still be misunderstanding what Shard Theory is driving at here.
This all just seems like what Richard Dawkins called an ‘argument from failure of imagination’.
I’m saying “Either the genome can hardcode death fear, which would have huge alignment implications, or it can’t, which would have huge alignment implications, or it can hardcode death fear but only via advantages evolution had that we won’t have, which doesn’t have huge implications.” Of the three, I think the second is most likely—that you can’t just read off an a priori unknown data structure and figure out where death is computed inside of it. If there were easier, less complex ways to get an organism to fear death, I expect evolution to have found those instead.
See also clues from ethology, where even juicy candidates-for-hardcoding ended up not being hardcoded.
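To make the “a priori unknown data structure” point a bit more concrete, here’s a toy, hypothetical sketch: train the same tiny network twice with different random seeds, and the hidden unit that ends up most correlated with the label typically sits at a different index each run. A genome-like designer that had to wire something to “the unit that detects X” would need run-specific information it doesn’t have. Everything below is illustrative, not a model of the brain.

```python
import numpy as np

def most_label_correlated_unit(seed, n_hidden=16, steps=3000, lr=0.5):
    rng = np.random.default_rng(seed)
    # Toy task: the label depends only on the sign of the first input feature.
    X = rng.normal(size=(512, 4))
    y = (X[:, 0] > 0).astype(float)
    W1 = rng.normal(scale=0.5, size=(4, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden);      b2 = 0.0
    for _ in range(steps):
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        dlogit = (p - y) / len(y)              # grad of mean cross-entropy wrt logits
        dW2 = h.T @ dlogit;  db2 = dlogit.sum()
        dh = np.outer(dlogit, W2) * (1.0 - h ** 2)
        dW1 = X.T @ dh;      db1 = dh.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    h = np.tanh(X @ W1 + b1)
    corrs = np.nan_to_num([abs(np.corrcoef(h[:, i], y)[0, 1]) for i in range(n_hidden)])
    return int(np.argmax(corrs))

for seed in range(3):
    # The "address" of the label-detecting unit is not fixed across runs.
    print(f"seed {seed}: most label-correlated hidden unit = #{most_label_correlated_unit(seed)}")
```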
TurnTrout—I think the ‘either/or’ framing here is misleading about the way that genomes can adapt to maximize survival and minimize death.
For example, jumping spiders have evolved special secondary eyes pointing backwards that specifically detect predators approaching from behind. At the functional level of minimizing death, these eyes ‘hardcode death-fear’ in a very real and morphological way. Similarly, many animals vulnerable to predators evolve eye locations on the sides of their heads, to maximize the degree of visual coverage. Prey animals also evolve pupils adapted to scanning the horizon for predators, i.e. for death-risks; the morphology of their visual systems itself ‘encodes’ fear of death from predators.
More generally, any complex adaptations that humans have evolved to avoid starvation, infection, predation, aggression, etc. can be analyzed as ‘encoding a fear of death’, and can be analyzed functionally in terms of risk sensitivity, loss aversion, Bayesian priors about the most dangerous organisms and events in the environment, etc. There are thousands of papers in animal behavior that do this kind of functional analysis—including in anti-predator strategies, anti-pathogen defenses, evolutionary immunology, optimal foraging theory, food choice, intrasexual aggression, etc. This stuff is the bread and butter of behavioral biology.
So, if this strategy of evolutionary-functional analysis of death-avoidance adaptations has worked so well in thousands of other species, I don’t see why it should be considered ‘impossible in principle’ for humans, based on some theoretical arguments about how genomes can’t read off neural locations for ‘death-detecting cells’ from the adult brain.
The key point, again, is that genomes never need to ‘read off’ details of adult neural circuitry; they just need to orchestrate brain development—in conjunction with ancestrally typical, cross-generationally recurring features of their environments—that will reliably result in psychological adaptations that represent important life values and solve important life problems.
I don’t see why it should be considered ‘impossible in principle’ for humans, based on some theoretical arguments about how genomes can’t read off neural locations for ‘death-detecting cells’ from the adult brain.
People are indeed effectively optimized by evolution to do behavior X in situation Y (e.g. be afraid when death seems probable). I think evolution did that a lot. I think people are quite optimized by evolution in the usual behavioral biological ways you described.
I’m rather saying that the genome can’t e.g. specify a neural circuit which fires if and only if a person is thinking about death. I’m saying that most biases are probably not explicit adaptations, and that evolution cannot directly select for certain high-level cognitive properties (and only those properties), e.g. “level of risk aversion” or “behavior follows discounting scheme X” or “vulnerability to the framing effect.” But evolution absolutely can and did select genotypes to unfold into minds which tend to be shaped in the form “cares more about ingroup.”
Hopefully this comment clarifies my views some?
That’s somewhat helpful. I think we’re coming at this issue from different angles—I’m taking a very evolutionary-functional view focused on what selection pressures shape psychological adaptations, what environmental information those adaptations need to track (e.g. snake! or pathogen!), what they need to represent about the world (e.g. imminent danger of death from threat X!), and what behaviors they need to trigger (e.g. run away!).
From that evolutionary-functional view, the ‘high-level cognitive properties’ of ‘fitness affordances’ are the main things that matter to evolved agents, and the lower-level details of what genes are involved, what specific neural circuits are needed, or what specific sensory inputs are relevant, just don’t matter very much—as long as there’s some way for evolution to shape the relevant psychological adaptations.
And the fact that animals do reliably evolve to track the key fitness affordances in their environments (e.g. predators, prey, mates, offspring, kin, herds, dangers) suggests that the specifics of neurogenetic development don’t in fact impose much of a constraint on psychological evolution.
It seems like you’re coming at the issue from more of a mechanistic, bottom-up perspective that focuses on the mapping from genes to neural circuits. Which is fine, and can be helpful. But I would just be very wary about using neurogenetic arguments to make overly strong claims about what evolution can or can’t do in terms of crafting complex psychological adaptations.
Seems like we broadly agree on most points here, AFAICT. Thanks again for your engagement. :)
the fact that animals do reliably evolve to track the key fitness affordances in their environments (e.g. predators, prey, mates, offspring, kin, herds, dangers) suggests that the specifics of neurogenetic development don’t in fact impose much of a constraint on psychological evolution.
This evidence shows that evolution is somehow able to adapt to relevant affordances, but doesn’t (to my eye) discriminate strongly between worlds where that influence was or wasn’t mediated by direct selection on high-level cognitive properties.
For example, how strongly do these observations discriminate between worlds where evolution was or wasn’t constrained by having or not having the ability to directly select adaptations over high-level cognitive properties (like “afraid of death in the abstract”)? Would we notice the difference between those worlds? What amount of affordance-tailoring would we expect in worlds where evolution was able to perform such selection, compared to worlds where it wasn’t?
It seems to me that we wouldn’t notice the difference. There are many dimensions of affordance-tailoring, and it’s harder to see affordances that weren’t successfully selected for.
For a totally made up and naive but illustrative example: if adult frogs reliably generalize to model that a certain kind of undercurrent is dangerous (i.e. leads to predicted death), but that undercurrent doesn’t leave sensory-definable signs, evolution might not have been able to select frogs to avoid that particular kind of undercurrent, even though the frogs represent the undercurrent in their world model. If the undercurrent decreases fitness by enough, perhaps frogs are selected to be averse to necessary conditions for waters having those undercurrents—maybe those are sensory-definable (or otherwise definable in terms of e.g. cortisol predictions).
But we might just see a frog which is selected for a huge range of other affordances, and not consider that evolution failed with the undercurrent-affordance. (The important point here doesn’t have to do with frogs, and I expect it to stand even if the example is biologically naive.)
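A made-up numerical illustration of the same evidential point (the probabilities below are invented purely to show the shape of the inference, not estimates of anything):

```python
# P(we observe lots of affordance-tailoring | evolution CAN select directly on
# high-level cognitive properties) vs. P(same observation | it can't, but
# sensory-proxy selection gets most of the way there). Hypothetical numbers.
p_obs_given_direct_selection = 0.9
p_obs_given_proxies_only = 0.8

likelihood_ratio = p_obs_given_direct_selection / p_obs_given_proxies_only
print(round(likelihood_ratio, 2))  # ~1.12: the observation barely separates the two worlds
```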
The fact that we find it hard to imagine a genome coding for an abstract fear of death
The genome doesn’t code a fear of death for humans, but it doesn’t need to. Humans learn the concept of death through cultural transmission, and it is immediately terrifying because our primary drive (the instrumentally convergent drive) is empowerment, and death is the minimally empowered state.
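For readers unfamiliar with the term: empowerment is usually formalized in the RL literature (following Klyubin, Polani & Nehaniv) as the channel capacity from an agent’s next n actions to the state it ends up in; on that formalization, “death is the minimally empowered state” is close to true by definition rather than a metaphor. Whether this is exactly the notion intended above is my assumption.

$$\mathfrak{E}_n(s) \;=\; \max_{p(a_{1:n})} \; I\big(A_{1:n};\, S_{t+n} \mid S_t = s\big)$$

In an absorbing “dead” state the future state no longer depends on the actions taken, so the mutual information, and hence the empowerment, is zero, its minimum possible value.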
Jacob, I’m having trouble reconciling your view of brains as ‘Universal Learning Machines’ (and almost everything being culturally transmitted), with the fact that millions of other animal species show exactly the kinds of domain-specific adaptive responses studied in evolutionary biology, animal behavior research, and evolutionary psychology.
Why would ‘fear of death’ be ‘culturally transmitted’ in humans, when thousands of other vertebrate species show many complex psychological and physiological adaptations to avoid accidents, starvation, parasitism, and predation that tends to result in death, including intense cortisol and adrenalin responses that are associated with fear of death?
When we talk about adaptations that embody a ‘fear of death’, we’re not talking about some conscious, culturally transmitted, conceptual understanding of death; we’re talking about the brain and body systems that actually help animals avoid death.
My essay on embodied values might be relevant on this point.
When we talk about adaptations that embody a ‘fear of death’, we’re not talking about some conscious, culturally transmitted, conceptual understanding of death;
That is in fact what I was talking about, because the abstract conscious culturally transmitted fear of death is vastly more general and effective once learned. Humans do seem to have innate fears of some leading causes of early death, such as heights, and indirect fear of many sources of contamination through disgust; there are probably a few other examples.
But in general humans have lost many innate skills and responses (which typically come from brainstem CPGs, central pattern generators) in favor of the more complex learned variants (cortical); we even must learn to walk. Human babies are notoriously lacking in fear of all the various ways the world can kill them and require extensive supervision.