I wonder what you think of my post “Learning from scratch” in the Brain? I feel like the shard theory discussions you cite were significantly based off that post of mine (I hope I’m not tooting my own horn—someone can correct me if I’m mis-describing the intellectual history here). If so, I think there’s a game of telephone, and things are maybe getting lost in translation.
For what it’s worth, I find this an odd post because I’m quite familiar with twin studies, I find them compelling, and I frequently bring them up in conversation. (They didn’t come up in that particular post but I did briefly mention them indirectly here in a kinda weird context.)
See in particular:
Section 2.3.1: Learning-from-scratch is NOT “blank slate”;
Section 2.3.2: Learning from scratch is NOT “nurture-over-nature”.
Onto more specific things:
Heritability for behavioral traits tends to increase, not decrease, during lifespan development
If we think about the human brain (loosely) as doing model-based reinforcement learning, and if different people have different genetically-determined reward functions, then one might expect that people with very similar reward functions tend to find their way into similar ways of being / relating / living / thinking / etc.—namely, the ways of being that best tickle their innate reward function. But that process might take time.
For example, if Alice has an innate reward function that predisposes her to be sympathetic to the idea of authoritarianism [important open question that I’m working on: exactly wtf kind of reward function might do that??], but Alice has spent her sheltered childhood having never been exposed to pro-authoritarian arguments, well then she’s not going to be a pro-authoritarian child! But by adulthood, she will have met lots of people, read lots of things, lived in different places, etc., so she’s much more likely to have come across pro-authoritarian arguments, and those arguments would have really resonated with her, thanks to her genetically-determined reward function.
So I find the increase in heritability with age to be unsurprising.
Recall that Assumption 1 of Shard Theory was that “The cortex is basically (locally) randomly initialized.” Recent studies in neurogenetics show that this is not accurate. Genetically informative studies in the Human Connectome Project show pervasive heritability in neural structure and function across all brain areas, not just limbic areas.
The word “locally” in the first sentence is doing a lot of work here. Again see Section 2.3.1. AFAICT, the large-scale wiring diagram of the cortex is mostly or entirely innate, as are the various cytoarchitectural differences across the cortex (agranularity etc.). I think of this fact as roughly “a person’s genome sets them up with a bias to learn particular types of patterns, in particular parts of their cortex”. But they still have to learn those patterns, with a learning algorithm (I claim).
As an ML analogy, there’s a lot of large-scale structure in a randomly-initialized convolutional neural net. Layer N is connected to Layer N+1 but not Layer N+17, and nearby pixels are related by convolutions in a way that distant pixels are not, etc. But a randomly-initialized convolutional neural net is still “learning from scratch” by my definition.
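To make that concrete, here’s a minimal sketch (assuming PyTorch; the layer sizes are arbitrary illustrations): even before any training, the network’s large-scale structure (which layer feeds which, and the local receptive fields of the convolutions) is fixed by the architecture, while the weights themselves start out random.

```python
# Minimal sketch (PyTorch assumed; sizes are arbitrary): a freshly built CNN
# already has fixed large-scale structure, even though its weights are random.
import torch
import torch.nn as nn

torch.manual_seed(0)

# The "wiring diagram" is specified up front: layer N feeds layer N+1,
# and each convolution only mixes nearby pixels (3x3 neighborhoods).
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # layer 1: local, 3-channel input
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # layer 2: only sees layer 1's output
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),                            # readout layer
)

# The weights, by contrast, are random at initialization -- no learned content yet.
first_conv = net[0]
print(first_conv.weight.shape)          # torch.Size([16, 3, 3, 3]) -- structure is fixed
print(first_conv.weight.flatten()[:5])  # values differ run to run without the seed

# So the architecture biases *what kinds of patterns* each layer can learn
# (local visual features early, global combinations late), but the patterns
# themselves still have to be learned from data -- "learning from scratch."
```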
Shard Theory implies that genes shape human brains mostly before birth
I don’t speak for Shard Theory but I for one strongly believe that the innate reward function is different during different stages of life, as a result of development (not learning), e.g. sex drive goes up in puberty.
FWIW, if memory serves, I have no complaints about that book outside of chapter 5. I have lots of complaints about chapter 5, as you might expect.
Pope and Turner include this bold statement: “Human values are not e.g. an incredibly complicated, genetically hard-coded set of drives, but rather sets of contextually activated heuristics which were shaped by and bootstrapped from crude, genetically hard-coded reward circuitry.”
My understanding (which I have repeatedly complained about) is that they are using “values” where I would use the word “desires”.
In that context, consider the statement: “I like Ghirardelli chocolate, and I like the Charles River, and I like my neighbor Dani, and I like [insert 5000 more things like that].” I think it’s perfectly obvious to everyone that there are not 5000 specific genes for my liking these 5000 particular things. There are probably (IMO) specific genes that contribute to why I like chocolate (i.e. genes related to taste), and genes that contribute to why I like the Charles River (i.e. genes related to my sense of aesthetics), etc. And there are also life experiences involved here; if I had had my first kiss at the Charles River, I would probably like it a bit more. Right?
I’m not exactly sure what the word “crude” is doing here. I don’t think I would have used that word. I think the hard-coded reward circuitry is rather complex and intricate in its own way. But it’s not as complex and intricate as our learned desires! I think describing our genetically hard-coded reward circuitry would take like maybe thousands of lines of pseudocode, whereas describing everything that an adult human likes / desires would take maybe millions or billions of lines. After all, we only have 25,000ish genes, but the brain has 100 trillion(ish) synapses.
“it seems intractable for the genome to scan a human brain and back out the “death” abstraction, which probably will not form at a predictable neural address. Therefore, we infer that the genome can’t directly make us afraid of death by e.g. specifying circuitry which detects when we think about death and then makes us afraid. In turn, this implies that there are a lot of values and biases which the genome cannot hardcode.”
I’m not sure I would have said it that way—see here if you want to see me trying to articulate (what I think is) this same point that Quintin was trying to get across there.
Let’s consider Death as an abstract concept in your conscious awareness. When this abstract concept is invoked, it probably centrally involves particular neurons in your temporal lobe, and various other neurons in various other places. Which temporal-lobe neurons? It’s probably different neurons for different people. Sure, most people will have this Death concept in the same part of their temporal lobes, maybe even the same at a millimeter scale or something. But down to the individual neurons? Seems unlikely, right? After all, different people have pretty different conceptions of death. Some cultures might have two death concepts instead of one, for all I know. Some people (especially kids) don’t know that death is a thing in the first place, and therefore don’t have a concept for it at all. When the kid finally learns it, it’s going to get stored somewhere (really, multiple places), but I don’t think the destination in the temporal lobe is predetermined down to the level of individual neurons.
So, consider the possible story: “The abstract concept of Death is going to deterministically involve temporal lobe neurons 89642, and 976387, and (etc.). The genome will have a program that wires those particular neurons to the ventral-anterior & medial hypothalamus and PAG. And therefore, humans will be hardwired to be afraid of death.”
That’s an implausible story, right? As it happens, I don’t think humans are genetically hardwired to be afraid of death in the first place. But even if they were, I don’t think the mechanism could look like that.
That doesn’t necessarily mean there’s no possible mechanism by which humans could have a genetic disposition to be specifically afraid of death. It would just have to work in a more indirect way, presumably (IMO) involving learning algorithms in some way.
Shard Theory incorporates a relatively Blank Slate view about the origins of human values
I’m trying to think about how you wound up with this belief in the first place. Here’s a guess. I could be wrong.
One thing is, insofar as human learning is describable as model-based RL (yes it’s an oversimplification but I do think it’s a good starting point), the reward function is playing a huge role.
And in the context of AGI alignment, we the programmers get to design the reward function however we want.
We can even give ourselves a reward button, and press whenever we feel like it.
If we are magically perfectly skillful with the reward function / button, e.g. we have magical perfect interpretability and give reward whenever the AGI’s innermost thoughts and plans line up with what we want it to be thinking and planning, then I think we would eventually get an aligned AGI.
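As a cartoon of that last point (a toy sketch, not an actual alignment proposal; all the names below are made up), here’s a tabular “agent” learning values over plans, where the overseer, standing in for magical perfect interpretability, reads the agent’s candidate plan directly and presses the reward button only for plans we endorse:

```python
# Toy sketch (hypothetical names throughout; a cartoon, not an alignment proposal):
# the overseer "reads the agent's plan" directly -- standing in for magical
# perfect interpretability -- and presses the reward button only for plans we like.
import random

random.seed(0)

PLANS = ["help_humans", "acquire_resources", "deceive_overseer", "do_nothing"]
APPROVED = {"help_humans", "do_nothing"}  # what we want it to be planning

def overseer_reward(plan: str) -> float:
    """Reward button: fires iff the (fully transparent) plan is one we endorse."""
    return 1.0 if plan in APPROVED else 0.0

values = {p: 0.0 for p in PLANS}  # the agent's learned value estimate per plan
LEARNING_RATE = 0.1
EPSILON = 0.1  # small amount of exploration

for step in range(2000):
    # Agent picks the plan it currently values most (with occasional exploration).
    if random.random() < EPSILON:
        plan = random.choice(PLANS)
    else:
        plan = max(values, key=values.get)
    # Overseer inspects the plan and delivers reward; agent updates its values.
    r = overseer_reward(plan)
    values[plan] += LEARNING_RATE * (r - values[plan])

print(values)  # approved plans drift toward value ~1, the others stay near 0
```

The only point of the cartoon is that, under these magical assumptions, whatever the reward button rewards is what the agent ends up “valuing”, which is why the reward function carries so much weight in this framing.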
A point that shard theory posts sometimes bring up is that, once we get to this point, and the AGI is also super smart and capable, it’s at least plausible that we can just give the reward button to the AGI, or give the AGI write access to its reward function. Thanks to the instrumentally convergent goal-preservation drive, the AGI would try to use that newfound power for good, to make sure that it stayed aligned (and by assumption of super-competence, it would succeed).
Whether we buy that argument or not, I think maybe it can be misinterpreted as a blank slate-ish argument! After all, it involves saying that “early in life” we brainwash the AGI with our reward button, and “late in life” (after we’ve completely aligned it and then granted it access to its own reward function), the AGI will continue to adhere to the desires with which it was brainwashed as a child.
But you can see the enormous disanalogies with humans, right? Human parents are hampered in their ability to brainwash their children by not having direct access to their kids’ reward centers, and not having interpretability of their kids’ deepest thoughts which would be necessary to make good use of that anyway. Likewise, human adults are hampered in their ability to prevent their own values from drifting by not having write access to their own brainstems etc., and generally they aren’t even trying to prevent their own values from drifting anyway, and they wouldn’t know how to even if they could in principle.
(Again, maybe this is all totally unrelated to how you got the impression that Shard Theory is blank slate-ist, in which case you can ignore it!)
Steven—thanks very much for your long, thoughtful, and constructive comment. I really appreciate it, and it does help to clear up a few of my puzzlements about Shard Theory (but not all of them!).
Let me ruminate on your comment, and read your linked essays.
I have been thinking about how evolution can implement different kinds of neural architectures, with different degrees of specificity versus generality, ever since my first paper in 1989 on using genetic algorithms to evolve neural networks. Our 1994 paper on using genetic algorithms to evolve sensorimotor control systems for autonomous robots used a much more complex mapping from genotype to neural phenotype.
So, I think there are lots of open questions about exactly how much of our neural complexity is really ‘hard wired’ (a term I loathe). But my hunch is that a lot of our reward circuitry that tracks key ‘fitness affordances’ in the environment is relatively resistant to manipulation by environmental information—not least, because other individuals would take advantage of any ways that they could rewire what we really want.
If we think about the human brain (loosely) as doing model-based reinforcement learning, and if different people have different genetically-determined reward functions,
To agree and expand on this: successful DL systems have a bunch of important hyperparameters, and many of these control balances between different learning sub-objectives and priors/regularizers. DL systems that use intrinsic motivation, especially those that combine it with other paradigms like extrinsic-reward reinforcement learning, tend to have a bunch of these hyperparams. The brain is driven by both complex intrinsic learning mechanisms (empowerment/curiosity, predictive learning, etc.) and extrinsic-reward reinforcement learning (pleasure, pain, hunger, thirst, sleep, etc.), and so likely has many such hyperparams. The brain also seems to control learning schedules somewhat adaptively (which is also important for SOTA DL systems), and perhaps even per module to some extent (as brain regions tend to crystallize/myelinate in hierarchical processing order, starting with lower sensory/motor cortex and ending in upper cortex and PFC), which introduces even more hyperparams.
So absent other explanations, it seems pretty likely that humans vary across these hyperparams, which can have enormous effects on later development. A high curiosity drive combined with delayed puberty/neoteny (with correspondingly adapted learning-rate schedules) is already a simple sufficient explanation for much of the variation in STEM-type abstract intelligence, and more specifically explains the ‘jock vs nerd’ phenomenon as different stable early-vs-late mating-strategy niches.
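To give the flavor of what hyperparams like these might look like, here’s a made-up configuration sketch; every name and number is an illustrative assumption, not a claim about the brain or any particular DL system:

```python
# Made-up configuration sketch: the kind of knobs that balance intrinsic vs.
# extrinsic learning signals and set per-module learning schedules. Every name
# and number here is an illustrative assumption, not a claim about the brain.
from dataclasses import dataclass, field

@dataclass
class MotivationWeights:
    curiosity: float = 0.5         # intrinsic: novelty / prediction-error seeking
    empowerment: float = 0.3       # intrinsic: preference for high-optionality states
    extrinsic_reward: float = 1.0  # extrinsic: pleasure/pain/hunger-style signals

@dataclass
class ModuleSchedule:
    # Per-module learning-rate schedule: earlier (sensory/motor) modules
    # "crystallize" sooner than later (PFC-like) modules.
    initial_lr: float
    decay_per_year: float

@dataclass
class LearnerConfig:
    motivation: MotivationWeights = field(default_factory=MotivationWeights)
    schedules: dict = field(default_factory=lambda: {
        "early_sensory": ModuleSchedule(initial_lr=1e-2, decay_per_year=0.30),
        "association":   ModuleSchedule(initial_lr=1e-2, decay_per_year=0.10),
        "pfc_like":      ModuleSchedule(initial_lr=1e-2, decay_per_year=0.03),
    })

def total_reward(cfg: LearnerConfig, curiosity, empowerment, extrinsic) -> float:
    """Scalar training signal as a weighted mix of intrinsic and extrinsic terms."""
    w = cfg.motivation
    return (w.curiosity * curiosity
            + w.empowerment * empowerment
            + w.extrinsic_reward * extrinsic)

# Two "individuals" differing only in these hyperparameters can end up weighting
# the same experiences very differently, even with identical learning algorithms.
high_curiosity = LearnerConfig(MotivationWeights(curiosity=0.9, empowerment=0.3, extrinsic_reward=0.8))
print(total_reward(high_curiosity, curiosity=1.0, empowerment=0.2, extrinsic=0.1))
```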
As it happens, I don’t think humans are genetically hardwired to be afraid of death in the first place
Yeah, pretty sure they aren’t (or at least I wasn’t; I had to learn it). But since death is a minimally empowered state, it’s immediately and obviously evaluated as very low utility.
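One crude way to see the “minimally empowered state” point (a toy sketch; real empowerment is a channel capacity between action sequences and future states, which this only gestures at): count how many distinct states remain reachable within a few steps. An absorbing “dead” state scores the minimum possible.

```python
# Toy sketch: a crude empowerment proxy = number of distinct states reachable
# within k steps. (Real empowerment is a channel capacity between action
# sequences and resulting states; this is just a rough stand-in.)

# Hypothetical tiny world: each state maps to the states its actions can lead to.
TRANSITIONS = {
    "home":  ["home", "work", "park"],
    "work":  ["home", "work"],
    "park":  ["home", "park", "river"],
    "river": ["park", "river"],
    "dead":  ["dead"],  # absorbing state: no action changes anything
}

def reachable_within(state: str, k: int) -> set:
    """All states reachable from `state` in at most k steps."""
    frontier, seen = {state}, {state}
    for _ in range(k):
        frontier = {nxt for s in frontier for nxt in TRANSITIONS[s]} - seen
        seen |= frontier
    return seen

for s in TRANSITIONS:
    print(s, len(reachable_within(s, k=3)))
# "dead" reaches only itself (proxy empowerment = 1, the minimum possible),
# so any planner that values future optionality scores it as very low utility.
```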
Related, I think: I just posted the post Heritability, Behaviorism, and Within-Lifetime RL