Hi Martin, thanks a lot for reading and for your comment! I think what I was trying to express is actually quite similar to what you write here.
‘If we did they would still have different experiences, notably the experience of having a brain architecture ill-suited to operating their body.’ - I agree. If I understand shard theory right, it claims that underlying brain architecture doesn’t make much difference, and e.g. the experience of trying to walk in different ways, and failing at some but succeeding at others, would be enough to lead to success. However I’m pointing out that a dog’s brain would still be ill-suited to learning things such as walking in a human body (at least compared to a human’s brain), showing the importance of architecture.
My goal was to illustrate the importance of brain structure through an absurd thought experiment, not to create a coherent scenario. I’m sorry if that led to confusion. The argument does not rest on the dog; the dog is only meant to illustrate it.
At the end of the day, I think the authors of shard theory also concede that architecture is important in some cases—the difference seems to be more of a matter of scale. I’m merely suggesting that architecture may be a little more important than they consider it, and pointing to the variety of brain architectures and resulting values in different animals as an example.
Pointing to the variety of brains and values in animals doesn’t persuade me because they also have a wide variety of environments and experiences. Shard theory predicts a wide variety of values as a result (tempered by convergent evolution).
One distinctive prediction is in cases where the brain is the same but the experiences are different. You agree that “between humans, one person may value very different things from the next”. I agree and would point to humans throughout history who had very different experiences and values. I think the example of football vs reading understates the differences, which include slavery vs cannibalism.
The other distinctive prediction is in cases where the brain is different but the experiences are the same. So for example, consider humans who grow up unable to walk, either due to a different brain or due to a different body. Shard theory predicts similar values despite these different causes.
The shard theory claim here is as quoted, “value formation is … relatively architecture independent”. This is not a claim about skill formation, e.g. learning to walk. It’s also not a claim that architecture can never be causally upstream of values.
I see shard theory as a correction to Godshatter theory and its “thousand shards of desire”. Yudkowsky writes:
And so organisms evolve rewards for eating, and building nests, and scaring off competitors, and helping siblings, and discovering important truths, and forming strong alliances, and arguing persuasively, and of course having sex...
Arguing persuasively is a common human value, but shard theory claims that it’s not encoded into brain architecture. Instead it’s dependent on the experience of arguing persuasively and having that experience reinforced. This can be the common experience of a child persuading their parent to give them another cookie.
There is a basic dependence on having a reinforcement learning system that is triggered by fat and sugar. But it’s a long way from that reward signal to valuing persuasive argument.
Thanks for the response, Martin. I’d like to try to get to the heart of what we disagree on. Do you agree that an agent with a sufficiently different architecture (e.g. a human who had a dog’s brain implanted somehow) would grow to have different values in some respect? For example, you mention arguing persuasively. Argument is a pretty specific ability, but we can widen our field to language in general, for which the human brain has pretty specific circuitry. A dog’s brain that lacks the appropriate language centers would likely never learn to speak, let alone argue persuasively.
I want to point out again that the disagreement is just a matter of scale. I do think relatively similar values can be learnt through similar experiences for basic RL agents; I just want to caution that for most human and animal examples, architecture may matter more than you might think.
Do you agree that an agent with a sufficiently different architecture (e.g. a human who had a dog’s brain implanted somehow) would grow to have different values in some respect?
Yes, according to shard theory. And thank you for switching to a more coherent hypothetical. Two reasons are different reinforcement signals and different capabilities.
Reinforcement signals
Brain → Reinforcement Signals → Reinforcement Events → Value Formation
The main way that brain architecture influences values, according to Quintin Pope and Alex Turner, is via reinforcement signals:
Assumption 3: The brain does reinforcement learning. According to this assumption, the brain has a genetically hard-coded reward system (implemented via certain hard-coded circuits in the brainstem and midbrain). In some fashion, the brain reinforces thoughts and mental subroutines which have led to reward, so that they will be more likely to fire in similar contexts in the future. We suspect that the “base” reinforcement learning algorithm is relatively crude, but that people reliably bootstrap up to smarter credit assignment.
Dogs and humans have different ideal diets. Dogs have a better sense of smell than humans. Therefore it is likely that dogs and humans have different hard-coded reinforcement signals around the taste and smell of food. This is part of why dog treats smell and taste different to human treats. Human children often develop strong values around certain foods. Our hypothetical dog-human child would likely also develop strong values around food, but predictably different foods.
As human children grow up, they slowly develop values around healthy eating. Our hypothetical dog-human child would do the same. Because they have a human body, these healthy eating values would be more human-like.
Capabilities
Brain → Capabilities → Reinforcement Events → Value Formation
Let’s extend the hypothetical further. Suppose that we have a hybrid dog-human with a human body, human reinforcement learning signals, but otherwise a dog brain. Now does shard theory claim they will have the same values as a human?
No. While shard theory is based on the theory of “learning from scratch” in the brain, “Learning-from-scratch is NOT blank-slate”. So it’s reasonable to suppose, as you do, that the hybrid dog-human will have many cognitive weaknesses compared to humans, especially given its smaller cortex.
These cognitive weaknesses will naturally lead to different experiences and so to different reinforcement events. Shard theory claims that “reinforcement events shape human value shards”. Accordingly, shard theory predicts different values. Perhaps the dog-human never learns to read, and so it does not value reading.
On the other hand, once you learn an agent’s capabilities, this largely screens off its brain architecture. For example, to the extent that deafness influences values, the influence is the same for deafness caused by a brain defect and deafness caused by an ear defect. The dog-human would likely have similar values to a disabled human with similar cognitive abilities.
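The screening-off claim can be made concrete with a toy sketch. Everything in it is a hypothetical illustration: the `Agent` class, the `can_hear` capability, and the rule that reinforcement events depend only on capabilities are assumptions built in by construction, not anything taken from the shard theory posts. It shows the shape of the Brain → Capabilities → Reinforcement Events → Value Formation chain, not evidence that the chain holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    # Hypothetical stand-in for "brain architecture": the cause of the capability.
    architecture: str
    # The capability itself, whatever produced it.
    can_hear: bool


def reinforcement_events(agent: Agent) -> list[str]:
    # Assumption: which experiences get reinforced depends only on
    # capabilities, never directly on agent.architecture.
    events = ["food"]
    if agent.can_hear:
        events.append("music")
    return events


def form_values(events: list[str]) -> frozenset[str]:
    # Assumption: every reinforced experience becomes something valued.
    return frozenset(events)


def values_of(agent: Agent) -> frozenset[str]:
    return form_values(reinforcement_events(agent))


deaf_by_brain = Agent(architecture="brain defect", can_hear=False)
deaf_by_ear = Agent(architecture="ear defect", can_hear=False)
hearing = Agent(architecture="typical", can_hear=True)

# Same capability, different architecture: the values coincide.
print(values_of(deaf_by_brain) == values_of(deaf_by_ear))
# Different capability: the values differ.
print(values_of(hearing) == values_of(deaf_by_ear))
```

Because `architecture` is never read after capabilities are fixed, knowing the capability tells you everything the toy model can say about values. The interesting empirical question is how closely real brains approximate that structure.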
Thanks for trying to get to the heart of it.
Where is the disagreement?
My current guess is that you agree with most of the above, but disagree with the distilled claim that “value formation is very path dependent and architecture independent”.
I think the problem is that this is very “distilled”. Pope and Turner claim that some human values are convergent: “We think that many biases are convergently produced artifacts of the human learning process & environment”. In other words, these values end up being path independent.
I discuss above that “architecture independent” is only true if you hold capabilities and reinforcement signals constant. The example given in the distillation is “a sufficiently large transformer and a sufficiently large conv net, given the same training data presented in the same order”.
I’m realizing that this distillation is potentially misleading when moved from the AI context to the context of natural intelligence.