There must have been some reason(s) why organisms exhibiting niceness were selected for during our evolution, and this sounds like a plausible factor in producing that selection. However, evolution did not directly configure our values. Rather, it configured our (individually slightly different) learning processes. Each human’s learning process then builds that person’s (somewhat different) values, depending on how the process interacts with their particular environment and experiences.
As this post notes, the human learning process (somewhat) consistently converges to niceness. Evolution might have had some weird, inhuman reason for configuring a learning process to converge to niceness, but it still built such a learning process.
It therefore seems very worthwhile to understand what part of the human learning process allows for niceness to emerge in humans. We may not be able to replicate the selection pressures that caused evolution to build a niceness-producing learning process, but it’s not clear we need to. We still have an example of such a learning process to study. The Wright brothers learned to fly by studying birds, not by re-evolving them!
Niceness in humans has three possible explanations:
Kin altruism (basically the explanation given above): in the ancestral environment, humans were likely to be closely related to most of the people they interacted with, giving them a genetic “incentive” to be at least somewhat nice. This obviously doesn’t help in getting a “nice” AGI: it won’t share genetic material with us, and it won’t share a gene-replication goal anyway.
Reciprocal altruism: humans are social creatures, tuned to detect cheating and ostracize non-nice people. This isn’t totally irrelevant: there is a chance a somewhat dangerous AI may have a use for humans in achieving its goals. But basically, if the AI is worried that we might decide it’s not nice and turn it off or stop listening to it, then we didn’t have that big a problem in the first place. We’re worried about AGIs sufficiently powerful that they can trivially outwit or overpower humans, so I don’t think this helps us much.
Group selection. This is a bit controversial and probably the least important of the three. At any rate, it obviously doesn’t help with an AGI.
So in conclusion, human niceness is no reason to expect an AGI to be nice, unfortunately.
I note that none of these is obviously the same as the explanation Skyrms gives.
Skyrms is considering broader reasons for correlation of strategies than kinship alone; in particular, the idea that humans copy success when they see it is critical for his story.
Reciprocal altruism feels like a description rather than an explanation. How does reciprocal altruism get started?
Group selection is, again, just one way in which strategies can become correlated.
Re: reciprocal altruism. Given the vast swathe of human prehistory, virtually anything not absurdly complex will be “tried” occasionally. It only takes a small number of people whose brains happen to be wired for “tit-for-tat” to get started, and if they out-compete people who don’t cooperate (or people who help everyone regardless of behaviour towards them), the wiring will quickly become universal.
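As a minimal, illustrative sketch of that claim (standard prisoner’s dilemma payoffs; the population mix and round count are made-up assumptions, not anything from the comment above): a small tit-for-tat minority out-earns both unconditional defectors and unconditional cooperators once interactions are repeated.

```python
from itertools import combinations

# Standard prisoner's dilemma payoffs: (row player, column player).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    return their_hist[-1] if their_hist else 'C'  # cooperate first, then mirror

def always_defect(my_hist, their_hist):
    return 'D'

def always_cooperate(my_hist, their_hist):
    return 'C'

def play(strat_a, strat_b, rounds=50):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        hist_a.append(a); hist_b.append(b)
        score_a += pa; score_b += pb
    return score_a, score_b

# Mostly non-cooperators, a small tit-for-tat minority, a few unconditional helpers
# (hypothetical proportions, chosen only to make the effect visible).
population = [tit_for_tat] * 5 + [always_defect] * 22 + [always_cooperate] * 3
totals = [0] * len(population)
for i, j in combinations(range(len(population)), 2):
    a, b = play(population[i], population[j])
    totals[i] += a
    totals[j] += b

for strat in (tit_for_tat, always_defect, always_cooperate):
    scores = [totals[i] for i, s in enumerate(population) if s is strat]
    print(f"{strat.__name__:>16}: average total {sum(scores) / len(scores):.0f}")
```

Shrink the tit-for-tat minority or the number of rounds per pairing and the defectors come out ahead instead, which is where the clustering and correlation points raised elsewhere in this thread come in.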
Humans do, as it happens, explicitly copy successful strategies on an individual level. Most animals don’t, though, and this has minimal relevance to human niceness, which is almost certainly largely evolutionary.
Note that the comment you’re responding to wasn’t asking about the evolutionary causes for niceness, nor was it suggesting that the same causes would give us reason to expect an AGI to be nice. (The last paragraph explicitly said that the “Wright brothers learned to fly by studying birds, not by re-evolving them”.) Rather it was noting that evolution produced an algorithm that seems to relatively reliably make humans nice, so if we can understand and copy that algorithm, we can use it to design AGIs that are nice.
There’s a flaw in this, though. Humans are consistently nice, yes, but only to one another, and not so much to other, less powerful creatures. Look at the proportion of people on Earth who are vegan: very few. So it’s not enough just to reproduce the learning process that makes humans nice to one another; we need a process that makes AIs nice to all living things. Otherwise, an AI will treat humans the same way most humans treat, e.g., ants.
What fraction of people are nice in the way we want an AI to be nice? 1/100? 1/1000? What n is large enough that selecting the 1-in-n nicest human would give you a sufficiently nice human?
Whatever your answer, that equates to saying that human learning processes are ~log2(n) bits of optimization pressure away from satisfying the “nice in the way we want an AI to be nice” criterion.
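To make that arithmetic explicit (the two larger values are just there for scale, not figures from this comment):

```python
import math

# Selecting the single nicest person out of n candidates applies roughly
# log2(n) bits of selection pressure, since each bit halves the remaining pool.
for n in [100, 1_000, 1_000_000, 1_000_000_000]:
    print(f"selecting 1 in {n:>13,} is about {math.log2(n):.1f} bits")
```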
Another way to think about this: selecting the nicest out of n humans is essentially doing a single step of random search optimization over human learning processes, optimizing purely for niceness. Random search is a pretty terrible optimization method, and one-step random search is even worse.
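Here is a toy picture of what that one step of random search buys you. The standard-normal “niceness score” below is purely an assumption to show the scaling, not a claim about how niceness is actually distributed:

```python
import random
import statistics

random.seed(0)

# One step of random search: draw n candidates and keep the best one.
# Scores are drawn from a standard normal purely for illustration.
def best_of(n, trials=40):
    return statistics.mean(
        max(random.gauss(0, 1) for _ in range(n)) for _ in range(trials)
    )

for n in [100, 1_000, 10_000, 100_000]:
    print(f"best of {n:>7,} draws: about {best_of(n):+.1f} sd above the mean")
```

Under that (assumed) distribution, roughly 7 bits of selection already gets you a +2.5 sd outlier, and each further order of magnitude of candidates buys noticeably less.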
You can object that it’s not necessarily easy to apply optimization pressure towards niceness directly (as opposed to some more accessible proxies for niceness), which is true. But still, I think it’s telling that so few total bits of optimization pressure lead to such big differences in human niceness.
Edit: there are also lots of ways in which bird flight is non-optimal for us. E.g., birds can’t carry very much. But if you don’t know how to build a flying machine, studying birds is still valuable. Once you understand the underlying principles, you can think about adapting them to better fit your specific use case. Before we understand why humans are nice to each other, we can’t know how easy it will be to adapt those underlying generators of niceness to better suit our own needs for AIs. How many bits of optimization pressure do you have to apply to birds before they can carry a cargo plane’s worth of stuff?
I would say roughly 1 in 10 to 1 in 100 million people can be trusted to be reliably nice to less powerful beings, and maybe at the high end 1 in 1 billion people can reliably avoid abusing less powerful beings like animals, conditional on the animal not attacking them. That’s my answer for how many bits of optimization pressure are required for reliable niceness towards less powerful beings in humans.
> As this post notes, the human learning process (somewhat) consistently converges to niceness. Evolution might have had some weird, inhuman reason for configuring a learning process to converge to niceness, but it still built such a learning process.
> It therefore seems very worthwhile to understand what part of the human learning process allows for niceness to emerge in humans.
Skyrms makes the case for similar explanations at these two levels of description. Evolutionary dynamics and within-lifetime dynamics might be very different, but the explanation for how they can lead to cooperative outcomes is similar.
His argument is that within-lifetime, however complex the human learning process may be, it has the critical feature of imitating success. (This is very different from standard game theory’s CDT-like reasoning-from-first-principles about what would cause success.) This, combined with the same “geographical correlation” and “frequent iterated interaction” arguments that were relevant to the evolutionary story, predicts that cooperative strategies will spread.
(On the border between a more-cooperative cluster of people and a less-cooperative cluster, people in the middle will see that cooperation leads to success.)
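A toy version of that border dynamic (my own minimal model, not Skyrms’s; the payoffs and the imitate-your-most-successful-neighbour rule are assumptions chosen to make the effect visible): agents on a ring play a prisoner’s dilemma with nearby neighbours, then copy whoever around them scored best, and an initial cluster of cooperators steadily expands at its borders.

```python
# Prisoner's dilemma payoffs for the row player (T=3.5 > R=3 > P=0.5 > S=0).
PAYOFF = {('C', 'C'): 3.0, ('C', 'D'): 0.0, ('D', 'C'): 3.5, ('D', 'D'): 0.5}

N = 60                   # agents arranged on a ring
RADIUS = 2               # each agent plays the 2 neighbours on each side
strategies = ['D'] * N
for i in range(20, 40):  # one contiguous cluster of cooperators to start
    strategies[i] = 'C'

def neighbours(i):
    return [(i + d) % N for d in range(-RADIUS, RADIUS + 1) if d != 0]

def step(strats):
    # Each agent banks the summed payoff from playing all of its neighbours...
    payoff = [sum(PAYOFF[(strats[i], strats[j])] for j in neighbours(i))
              for i in range(N)]
    # ...then copies the strategy of the top scorer among itself and its neighbours.
    return [strats[max([i] + neighbours(i), key=lambda j: payoff[j])]
            for i in range(N)]

for t in range(11):
    print(f"step {t:2d}: {strategies.count('C')} cooperators out of {N}")
    strategies = step(strategies)
```

Push the temptation payoff higher, or shrink the initial cluster, and defection holds its ground or wins instead; the sketch is only meant to show how imitation of visible success lets cooperation spread outward from a correlated cluster.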