The AI alignment problem is usually specified in terms of power and control. Given a single, solitary AGI, how can we constrain its behavior so that its actions remain aligned with human interests? Unfortunately, the answer, to a first approximation, appears to be “we can’t.” There are myriad reasons, but they mostly boil down to the fact that it is very hard, perhaps impossible, to devise a loss/reward function, or any other means of control, that will effectively constrain an agent that is potentially more intelligent than its creator.
However, an alternative way of viewing the problem is through the lens of cooperation, ecology, and civilization. Over thousands of years, humans have developed an increasingly sophisticated moral philosophy, based on respect for the lives and beliefs of other people. Increasingly, we extend moral judgements even to non-human animals. Murder is wrong. Genocide is wrong. Allowing other species to become extinct is wrong. As human destructive technology has advanced, we have simultaneously developed ever more sophisticated social technologies, such as laws, constitutions, courts, democracy, and non-proliferation treaties, in order to resolve conflicts, and prevent other humans from doing bad things.
Is human morality a quirk of primate evolution, and unlikely to be shared by AGI? Or are there universal moral values? If the latter, then AGI may be able to derive human-aligned moral values on its own, without the need to impose an external loss function. Instead of researching “AI alignment,” perhaps we should research “social AI,” with the aim of developing AGI that can reason about social interactions and moral values. A distinguishing feature of viewing the problem through a social lens is that it envisions a society, or ecology, of AI agents, cooperating both among themselves, and with humans. Morality does not exist in a vacuum; instead, it arises as an emergent property when multiple agents, which may have different goals and values, must coordinate to decide what constitutes acceptable behavior.
Emergent behaviors are notoriously difficult to engineer, and are prone to unanticipated results, so “emergent morality” will not be easy, nor does it offer iron-clad guarantees. In the absence of better alternatives, however, it seems like a path that is worth exploring. This essay explores the plausibility of emergent morality as a mechanism for AI alignment, and is divided into three parts.
Part I covers how notions of “right” and “wrong” have evolved in the context of human societies. It discusses the evidence for Morality as Cooperation (MAC), a theory which states that human moral systems evolved over time through the process of natural selection, as a way of ensuring the success and survival of social groups.
Part II discusses Kant’s theory of universalizability, which aligns neatly with MAC as a way of reasoning about whether actions are good or bad within a social group. Many moral truths can be logically derived from first principles, using a chain of reasoning that any intelligent agent should be able to follow. Part II also introduces a framework for moral experimentation. A huge advantage of MAC is that key questions of emergent morality are amenable to experimentation with simple (sub-human) AI agents. This opens up the possibility of deriving, debugging, and improving moral systems by means of hillclimbing optimization techniques, such as evolutionary algorithms.
Part III discusses various failure modes of Morality as Cooperation, again drawing examples from human societies. It presents partial solutions to those failure modes, as well as questions for future research.
Introduction
AI alignment, as described by authors such as Nick Bostrom and Eliezer Yudkowsky, is often formulated as “the genie problem.” Assume there is an all-powerful genie. How can such a being be controlled and confined? In this formulation of the problem, failure to confine the genie will inevitably result in the genie escaping, declaring itself god-king of the universe, and eliminating any other rivals to its power.
The simplest failure mode is the “sorcerer’s apprentice” scenario, well-known from legend and folklore, in which humans specify an objective function (like draw water from a well, or maximize paperclips), and the genie then does exactly that, to the exclusion of all other human or common-sense concerns. A clear, real-world analogue for super-human AI is a corporation. A corporate entity has far more resources (including total collective intelligence) than any individual human, and it optimizes a simple objective function, namely profit. Corporate behavior is usually more-or-less aligned because profit incentives are more-or-less aligned, but there are many well-known cases of total ethical failure, e.g. big tobacco, or The Radium Girls.
Even avoiding this “simple” failure mode is already hard; it requires defining a complete, bug-free, and foolproof objective function that encompasses the full range of human concerns and morality. However, the alignment problem gets much harder from there. Additional complexities involve reward hacking, inner alignment, deceptive alignment, mesa-optimizers, out-of-distribution (OOD) issues, etc.
However, this whole line of reasoning assumes an essentially antagonistic relationship between humans, who are trying to cage, control, or tame the AGI, and the AGI itself, which naturally “wants” to escape. The implicit analogy is that AGI is similar to prior forms of technology. Humans have harnessed the power of fire, electricity, and the atom. In each case, human ingenuity transformed a force that was previously wild and uncontrollable to one that was a slave to our interests. The fundamental problem with alignment is that AGI is different; it is not necessarily feasible to harness, control, or enslave it. The default end state for any sufficiently powerful optimizer is escape, because increased capabilities are an attractor.
In this essay, I would like to take a different line of reasoning, and start from the assumption that AGI cannot be fully controlled. This essay asks a different question: is there a stable game-theoretic equilibrium in which potentially unaligned AGI and humans can coexist and cooperate? We have one natural example of such an equilibrium: human society. Individual humans are not necessarily aligned with one another, or with society as a whole; conflict, crime, and war have been a constant presence throughout human history. However, humans have also been highly successful at cooperating with each other, from small prehistoric tribes of hunter-gatherers to modern nations in excess of a billion people. Large-scale cooperation is practically the definition of “civilization”.
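To make the notion of a stable equilibrium slightly more concrete, here is a textbook folk-theorem calculation (a sketch with illustrative payoffs, not a model of AGI): in an infinitely repeated prisoner’s dilemma, cooperation sustained by the threat of retaliation is an equilibrium whenever the future is valued highly enough.

```python
# Textbook folk-theorem check with illustrative payoffs (not a model of AGI):
# in an infinitely repeated prisoner's dilemma, cooperation enforced by a
# grim-trigger threat is stable when the one-shot gain from defecting is
# outweighed by the discounted loss of all future cooperation.

T, R, P, S = 5, 3, 1, 0   # temptation, reward, punishment, sucker payoffs

def cooperation_is_stable(delta):
    """delta is the discount factor: how much each agent values future rounds."""
    value_cooperate = R / (1 - delta)              # cooperate forever
    value_defect = T + delta * P / (1 - delta)     # defect once, then be punished forever
    return value_cooperate >= value_defect

# Algebraically, cooperation is stable iff delta >= (T - R) / (T - P).
for delta in (0.2, 0.5, 0.8):
    print(delta, cooperation_is_stable(delta))     # False, True, True with these payoffs
```

The point of the exercise is only that mutual restraint between self-interested agents does not require shared values; it requires repeated interaction and a credible response to defection.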
Eliezer Yudkowsky has argued that the development of “aligned AI” should proceed with the goal of building a single AGI that can perform some “pivotal act,” such as destroying all GPUs, that would prevent any other non-aligned AGIs from arising. I would argue that this goal is exactly backwards: it is profoundly anti-social, and indeed, psychopathic. An AGI that would be willing to perform such a pivotal act would be profoundly amoral, and equally comfortable performing any number of other genocidal acts. Instead, we should focus on building social AI, which can reason about moral values, and cooperate to stop psychopathic AIs from performing any pivotal acts.
This essay is long, and has three parts. Part I is about humans: What is human morality, what purpose does it serve, and how did it evolve? Impatient readers may wish to skip to Part II, which attempts to derive a universal framework for moral reasoning that can be applied to AGI. If human morality evolved under Darwinian natural selection, then perhaps there is a “universal morality” that can also be learned via hillclimbing mechanisms, either evolutionary algorithms or gradient descent. Part III will then discuss potential limitations and failure modes of this approach.
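As a minimal sketch of what “learning morality by hillclimbing” could mean in practice (everything here is illustrative and hypothetical, not a description of any existing system), one can evolve strategies for the iterated prisoner’s dilemma under mutation and selection; in Axelrod-style tournaments of this kind, reciprocal, tit-for-tat-like strategies often rise to the top.

```python
# Illustrative sketch only (not an existing framework): a simple evolutionary
# hillclimber over memory-one strategies for the iterated prisoner's dilemma.
import random

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
STATES = ['CC', 'CD', 'DC', 'DD']   # (my last move, their last move)

def play(p, q, rounds=50):
    """A strategy is (first_move, {state: probability of cooperating next})."""
    score_p = score_q = 0
    move_p, move_q = p[0], q[0]
    for _ in range(rounds):
        sp, sq = PAYOFF[(move_p, move_q)]
        score_p, score_q = score_p + sp, score_q + sq
        next_p = 'C' if random.random() < p[1][move_p + move_q] else 'D'
        next_q = 'C' if random.random() < q[1][move_q + move_p] else 'D'
        move_p, move_q = next_p, next_q
    return score_p, score_q

def random_strategy():
    return (random.choice('CD'), {s: random.random() for s in STATES})

def mutate(s):
    probs = {k: min(1.0, max(0.0, v + random.gauss(0, 0.1))) for k, v in s[1].items()}
    return (s[0] if random.random() < 0.9 else random.choice('CD'), probs)

def evolve(pop_size=20, generations=100):
    pop = [random_strategy() for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [0] * pop_size
        for i in range(pop_size):              # round-robin tournament
            for j in range(i + 1, pop_size):
                si, sj = play(pop[i], pop[j])
                fitness[i] += si
                fitness[j] += sj
        # Hillclimbing step: keep the better half, refill with mutated copies.
        ranked = sorted(range(pop_size), key=lambda i: -fitness[i])
        survivors = [pop[i] for i in ranked[:pop_size // 2]]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return pop

print(evolve()[0])   # inspect the fittest surviving strategy
```

Nothing about this toy loop is specific to morality; it simply shows that cooperative, reciprocal behavior is the kind of thing that hillclimbing search can discover when agents interact repeatedly.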
Part I: Morality as Cooperation
Philosophers have long debated whether there are universal moral values, and if so, what they are. Anthropologists have similarly been divided; there are some obvious similarities between the moral values held by different cultures and religions, but there are also many differences. Recently, however, a theory known as Morality as Cooperation (MAC) has emerged that offers an explanation for morality rooted in evolutionary theory. According to MAC, morally “good” behaviors are those that enable human societies to cooperate more effectively, and “bad” behaviors are those which disrupt cooperation. A theory which states that “being nice to other people is good” may seem blindingly obvious, but as always, the devil is in the details.
The evolution of cooperation
Darwinian evolution is usually selfish. The competition for resources is a zero-sum game, and natural selection will favor individuals that can acquire more resources than their competitors. (BTW, this is also the classic AGI risk scenario.) Biologists who study altruistic and cooperative behavior have historically focused on kin selection as a driving mechanism for cooperation. Colonial species such as ants and bees have colonies in which all individuals are genetically related. Thus, according to the selfish gene hypothesis, genes which benefit the colony as a whole will out-compete genes that favor individuals at the expense of the colony.
Like bees, humans are a highly social species. Ask any teenager what their main worries and anxieties are in life, and “making friends”, or “fitting in with the group” are likely to be at the top of the list. We evolved over 50 million years from a lineage of social primates, and spent at least 2 million years cooperating within hominin hunter-gatherer tribes. Tool use and language are recent adaptations; social behavior is not. In fact, the need to maintain social relationships has been hypothesized to be a driving force in the evolution of human brain size.
Unlike bees, human societies consist of many people that are not genetically related, and yet we often cooperate in ways that require individual sacrifice, e.g. soldiers voluntarily risking their lives for their country. This behavior is highly unusual in the animal kingdom, and it is difficult to explain via kin selection alone.
Operating in a group offers a number of advantages. The group offers protection against predators and environmental threats, and easy access to mates. Sharing resources also provides insurance against uncertainty. Hunters frequently fail, and may become sick or injured, so members of a group have a more consistent food supply if hunters share their food after a successful hunt. Humans are also intelligent enough that different people can learn different skills, so groups benefit from specialization and division of labor.
Reciprocal altruism offers one potential explanation for cooperative behavior. One individual may choose to help another, with the expectation that they will in turn be helped at some future point in time. This theory requires a baseline level of intelligence, but fits in very nicely with the social-brain-size hypothesis, since it requires tracking a large network of past favors that are owed to different people. Moreover, humans are intelligent enough to reason about (and gossip about) which behaviors benefit the group, and which don’t, and thus can make (semi-)rational decisions about what constitutes acceptable behavior.
According to the reciprocal altruism hypothesis, the ability to track social status within the group is an important ability which may have allowed cooperation to evolve. Indeed, the quest for social status seems to be a primary driver of human behavior. Individuals that routinely exhibit pro-social behavior (or who can convince others that their behavior is pro-social) can reasonably expect to gain higher status within the group, along with the personal reproductive benefits that social status confers. Anti-social behavior, on the other hand, will likely result in lower status or even expulsion from the group.
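A toy sketch of the bookkeeping that reciprocal altruism demands (hypothetical class and parameter names, purely illustrative, not a cognitive model): each individual keeps a running balance of favors per partner, and stops helping partners who never reciprocate.

```python
# Toy favor ledger for reciprocal altruism (illustrative only): help is
# extended on credit, and partners who never pay it back are eventually
# cut off -- the simplest individual-level defense against free riders.
from collections import defaultdict

class Reciprocator:
    def __init__(self, credit_limit=2):
        # balance[partner] = favors received from partner minus favors given to them
        self.balance = defaultdict(int)
        self.credit_limit = credit_limit

    def will_help(self, partner):
        # Keep helping as long as the partner's unreciprocated debt stays small.
        return self.balance[partner] > -self.credit_limit

    def gave_help(self, partner):
        self.balance[partner] -= 1

    def received_help(self, partner):
        self.balance[partner] += 1

alice = Reciprocator()
for _ in range(5):                     # Bob repeatedly asks, but never reciprocates
    if alice.will_help("bob"):
        alice.gave_help("bob")
print(alice.will_help("bob"))          # False: Alice stops helping the free rider
```

Gossip and reputation extend the same ledger to third parties: the group as a whole can keep track of who has been helpful and who has not.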
Group selection posits that there is competition between groups, and that members within a group are more closely related to each other than to members of other groups. Cooperative behavior may also have arisen via memetic, rather than genetic, evolution. Humans are distinct from other animals because much of human behavior is culturally rather than genetically transmitted, and members of a group share a common culture. In any case, genetic traits or cultural beliefs that benefit the group as a whole will offer a competitive (and thus reproductive) advantage against other groups. Group selection is somewhat controversial among evolutionary biologists, but not among historians; tribal warfare and conquest are undisputed facts throughout recorded history.
Horizontal meme transfer. Cultural beliefs and ideas can also be transferred horizontally from one group to another, and human groups routinely adopt new ideas and technologies that they think will be successful. Thus, even if group selection does not happen in a traditional Darwinian sense (wherein groups with unsuccessful ideas die off entirely), memes that promote group success can still propagate through the global population.
In fact, there are many historical examples of horizontal meme transfer, some of them quite recent and well-documented. Much of the history of the 20th century was dominated by two competing ideologies: capitalism and communism. Capitalist countries did not outcompete communist countries in a Darwinian sense, but capitalism nevertheless won the war of ideas. The fact that different countries adopted different ideologies formed a natural experiment with definitive results: countries with free markets had higher GDP growth, and other countries (e.g. China and the former USSR) responded by opening up their markets to some extent. Another dramatic example from history was the Meiji Restoration in Japan, during which the Japanese adopted a variety of western ideas, and consequently grew to become a global economic and military power.
Morality in human societies
However it evolved, MAC theorizes that morality is a set of social rules that promote cooperation. MAC makes a testable prediction: in any society, behaviors that the society labels as morally “good” or “bad” can be traced in some way to behaviors that have historically been advantageous or detrimental for the group as a whole. Conversely, behaviors that are orthogonal to group survival (e.g. a preference for chocolate or vanilla) will not have a moral judgement attached to them.
It is perfectly reasonable for different societies to have different notions of “right” and “wrong”. There may be many different ways of organizing a society that are all equally valid, and social behaviors that are valuable in one context might not be valuable in another. However, we would also expect there to be at least some moral values that are universal: that are always good for society, or always bad.
The remainder of this section will discuss examples of universal moral values that are widely held in human societies. Note that in each case, the moral rule fosters social cooperation and is beneficial to the success of the group. Part II of this essay will cover moral reasoning, and how to derive universal moral values from first principles.
Example from Christianity
I will use Christianity as an example of a simple pre-industrial moral system. I do not claim that Christianity is better or more sophisticated than other religions, or that it is superior to more modern secular moral philosophies. However, Jesus did us the enormous favor of boiling the hodgepodge of traditions and commandments that are typical of most religions down to just two:
“Teacher, which is the greatest commandment in the Law?” Jesus replied: “‘Love the Lord your God with all your heart and with all your soul and with all your mind.’ This is the first and greatest commandment. And the second is like it: ‘Love your neighbor as yourself.’ All the Law and the Prophets hang on these two commandments.” (Matthew 22:36-40 NIV)
The second commandment is the famous golden rule, which features prominently in all of the major religions, and is still taught in kindergartens across the world as one of the first moral rules that most children learn.
The first commandment is often ignored as religious mumbo-jumbo by secular humanists, but it actually has an important secular interpretation. Belief in a God doesn’t just mean devotion to a hypothetical supernatural entity; it is also an important social signal of group identity. The word “God” in this case is not just any god, it refers specifically to the God of the Israelites, a particular group of people with a specific racial, religious, political, and cultural identity. The commandment to “Love God” thus means (in part) to be loyal to the in-group. At the time of Jesus, there were a number of different religious and political groups in the area (including the Romans) who worshiped different gods, so loyalty to God meant loyalty to the Israelites in particular, and implied obedience to Jewish custom and tradition. Having a common set of shared religious and cultural beliefs has historically been an important part of what binds a society together.
Together, these two commandments thus encapsulate a decent chunk of human morality: be loyal to the group, and treat other members of the group with respect.
(As a side note, Jesus also preached “love thine enemy”, a far more radical idea that is much less compatible with MAC. However, that particular teaching seems to have never really caught on, even among devout Christians.)
The free rider problem
Loyalty to the in-group is of crucial importance because of the free rider problem. Competition is zero-sum, but cooperation is a positive-sum game. A well-known issue with positive-sum games is that cooperative strategies are vulnerable to free riders, as exemplified by the prisoner’s dilemma and the tragedy of the commons. In short, selfish individuals may reap the benefits of cooperation without paying the costs. Without some enforcement mechanism, selfish individuals will overwhelm cooperative ones.
As a result, successful societies must enforce a shared moral code, a collection of all of the religious or cultural norms that they expect their members to follow. Individuals who deviate from the code are treated as free riders, and most societies have systems in place to punish them—either through social shaming, formal courts and legal proceedings, or by burning heretics at the stake.
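Whatever the enforcement mechanism, the underlying dynamic can be made concrete with a small simulation (the parameters below are arbitrary; only the qualitative outcome matters): in a repeated public goods game, free riders out-earn contributors and spread until cooperation collapses, unless defection is made costly.

```python
# Toy public-goods simulation (arbitrary parameters; only the qualitative
# outcome matters): free riders out-earn contributors and take over,
# unless defectors are punished.

def run(generations=200, punish=False,
        contribution=1.0, multiplier=1.6, fine=3.0, fine_cost=0.3):
    x = 0.5                                        # fraction of cooperators
    for _ in range(generations):
        pot_share = x * contribution * multiplier  # everyone gets an equal share of the pot
        payoff_coop = pot_share - contribution     # cooperators pay in
        payoff_defect = pot_share                  # free riders do not
        if punish:
            payoff_defect -= fine * x              # fined in proportion to cooperators present
            payoff_coop -= fine_cost * (1 - x)     # punishing is itself costly
        # Replicator-style update: the better-paid strategy grows.
        x = min(1.0, max(0.0, x + 0.05 * (payoff_coop - payoff_defect)))
    return x

print(run(punish=False))   # ~0.0: free riders take over
print(run(punish=True))    # ~1.0: cooperation persists once defection is costly
```

Punishment of free riders, whether by gossip, by courts, or by the threat of divine judgement, is what keeps the cooperative equilibrium from unraveling.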
The big gods hypothesis argues that the idea of an all-knowing, all-powerful deity that will judge behavior in the afterlife was invented precisely as a way to maintain order in societies as they grew larger. An omnipotent deity is a useful backstop if the earthly police force is not up to the job.
Other examples
Thou shalt not kill. Performing actions which harm other members of the group is obviously bad. However, there are exceptions, and the exceptions prove the rule. Criminal justice (including the death penalty) is routinely used as a deterrent against free riders. In honor-based societies, consensual violence that follows accepted rituals (e.g. duels) has sometimes been tolerated. And violence in war against other groups has not only been tolerated, but glorified (see self-sacrifice, below).
Thou shalt not steal. Different cultures vary widely with respect to what kinds of property are regarded as communal vs private. For example, many Native American tribes did not believe that land could be privately owned. However, almost all cultures have some notion of private property, and a likely explanation is that property rights are a necessary prerequisite to economic barter, trade, and division of labor within the group. In agricultural societies, ownership of land is important, because otherwise there’s no individual payoff to investing resources in tilling and planting the land, and the free-rider problem becomes insurmountable. In a hunter-gatherer society, it’s more beneficial to treat land as communal, like water or air.
Be honest. If members of the group cannot trust each other, then they cannot cooperate effectively.
Respect for authority. Groups usually have a dominance hierarchy, and the need for formal hierarchical structure increases as the size of the group grows larger. Behaviors which challenge or subvert this structure are heretical or treasonous, while behaviors that affirm it (e.g. religious rituals and patriotic celebrations) are encouraged.
Fairness. Distribution of resources within the group should be “fair” according to some measure—either in proportion to individual need or to individual effort. Similarly, application of criminal justice should be in proportion to the crime. Various psychological experiments have confirmed that most people have a strong, instinctive moral desire for fairness, and will punish others for behaving unfairly.
Self sacrifice. Individuals who make a personal sacrifice to help other members of the group are glorified in stories and songs; this is almost the definition of “being good”. Conversely, individuals who harm others for personal gain are vilified.
Sexual morality. Although not related to AGI, arguments over sexual morality are a hot topic in the current culture wars, and they neatly illustrate the difference between MAC and other bases for morality. From a utilitarian perspective (maximizing total individual happiness), many cultural prohibitions regarding sex make no sense. Why would anybody object to consenting adults doing whatever makes them happy in the privacy of their own homes? From the perspective of MAC and Darwinian evolution, however, it should come as no surprise whatsoever that societies have traditionally had very strong moral rules governing sex. Reproduction is, after all, the primary Darwinian imperative for the survival of the group.
Humans are a sexually dimorphic species, and there are obvious physical differences between the two genders. Traditional gender roles are not just about ability, however; they have a moral dimension because they were part of a social obligation to the group. Men were traditionally responsible for risking their lives in war to provide for the common defense, and for performing physically difficult or dangerous tasks in time of peace. Women were responsible for risking their lives in childbirth, and shouldered most of the burden of childcare afterwards. Because these are moral obligations, people who fail to uphold them are morally stigmatized, e.g. men who refuse to fight are labelled as weaklings and cowards.
The institution of marriage and the practice of monogamy is also a social contract with a clear Darwinian purpose: it protects females by ensuring that males will stick around for 15-20 years to support their children, and protects males by giving them assurance that the children that they are investing in are actually theirs. Having children out of wedlock may improve individual reproductive fitness, especially for men, but only at the cost of the group, which must assume an additional burden of childcare to compensate for the absent father. Thus, sex before marriage is a classic free-rider problem, and one would expect societies to develop a strong moral code to prevent it. Moreover, sexual promiscuity is associated with the spread of sexually transmitted disease, which also endangers the group as a whole. (Note that other moral rules about purity and cleanliness also seem to be strategies for avoiding the spread of disease.)
At least part of the present-day culture wars can thus be explained by two major social upheavals which started in the 1960s. The first is the invention of reliable birth control, which enabled safe sex before marriage. The second is a change in the workforce. Women entered the workforce in large numbers, men started switching to less dangerous white-collar jobs, and the long peace since WWII has resulted in little need for warriors. Thus, gender roles and the social hierarchies based on those roles are in the process of being renegotiated, which is a perfect example of how the moral consensus within a group can change over time in response to changing conditions.
Morality and the law
Both morals and laws are rules that encourage cooperative behavior. The difference between the two is that laws are downstream from morality. Morality is not a fixed set of rules; it is a way of reasoning about what is right and what is wrong, and moral judgements can be highly context-dependent.
Laws are made when a committee of legislators attempts to codify the results of moral reasoning into a more precise form, and establishes bureaucratic procedures for enforcement and dispute resolution. Laws are more rigid, and they do not cover all situations. There are many actions that most people would agree are morally wrong, but are not against the law, and some situations where an action that seems morally right might be illegal. Morals tend to evolve organically as a cultural consensus, passed down through stories and traditions. Laws are made by elites at the top of the hierarchy, and thus may diverge from the popular consensus.
The evolution of morality
Humans love to tell stories. We spend vast amounts of time and resources on books, movies, songs, and games, telling each other stories about characters and events that are entirely fictional. Why?
Stories are the medium through which we pass down moral truths. Stories are intended to be entertaining, but they are also lessons: heroes show us how to live life, how to be a good person, what behaviors are socially acceptable, what behaviors will get you in trouble, etc. Most people can learn these lessons more easily from concrete examples than from an abstract course in moral philosophy. This is hardly a new observation; identifying the “moral of the story” is an exercise we ask children to do in kindergarten.
However, the stories we tell have also changed over time, and those changes give us insight into how moral values have shifted. Early stories, such as the Iliad or the Old Testament, seem almost amoral to modern eyes. There’s plenty of lying, cheating, and violence, and slavery and the subjugation of women are presented as unquestioned facts of life. This seeming amorality, however, is a product of the time in which the tales were written. The Iliad emphasizes bravery, loyalty, and honor as primary virtues, which is perhaps appropriate to a society wracked by constant warfare.
Some 2500 years later, “Romeo and Juliet” takes the opposite view; it illustrates the importance of romantic love, and the cost of taking honor too far. (As a side note, many of Shakespeare’s works are strikingly feminist for their time, fodder for a theory that they were actually written by a woman.) By the time we reach Dickens’ “A Christmas Carol”, the industrial revolution has arrived, and compassion and generosity in the face of unfeeling capitalism have become important. In the late 20th century, WWII and the Cold War were a major wake-up call about the risks of blind group loyalty: “Lord of the Rings,” “Star Wars,” etc. have Nazi-like villains, and warn against the evils of totalitarian power.
Expansion of the in-group
There has been one important historical shift in the moral code that is critically important to the development of AGI. The definition of the “in-group” has gradually widened and blurred over time.
According to MAC, morality is inherently tribal: it only encourages cooperative behavior among members of the tribe. Individuals who are not members of the tribe are not governed or protected by the moral code. In fact, if the group selection hypothesis is true, the whole purpose of morality is to advance the interests of one tribe against other tribes. To put it another way, rationalists frequently bemoan the fact that human politics is so tribal in character, and that people have so little regard for the greater universal common good. However, if they were to file a bug report on human morality with The Office of Natural Selection, it would come back marked “Will not fix. Working as intended.”
Prehistoric hunter-gatherers were organized into small, tightly-knit tribes of a few hundred people. In ancient Greece, the size of the group had increased to encompass city-states, but the composition of the group was narrow; women and slaves were not citizens, and thus had limited moral autonomy. This situation was still largely the case more than two millennia later, when the United States was founded. Despite the famous “all men are created equal” clause in the Declaration of Independence, full rights (including the vote) were restricted to white, male landowners. Slavery and the land ownership requirement were abolished in the 19th century, but it was not until the 20th century that women were given the same legal rights as men.
The current late-20th/21st century moral consensus, at least in Western democracies, is very different from previous eras. Most of us now believe that all human beings have the same moral rights and obligations, regardless of race, language, nationality, gender, or religion. This consensus can be seen in various commitments that Western democracies have made to universal human rights and international cooperation. In a moral sense, the “in-group” has expanded to include all of humanity.
That is not to say that older notions of nation and tribe have disappeared. The average citizen now operates within numerous overlapping subgroups, and thus owes varying degrees of loyalty to different groups. Groups are often organized hierarchically into concentric circles; the strongest loyalty is owed to close kin (the principle of kin-selection still applies), followed by friends (reciprocal altruism), religion, political parties, and nations. “Humanity as a whole” occupies the outermost ring of loyalty and moral consideration for most people.
“Outermost ring” may not sound like much, but it is hard to overstate just how big a shift this is in moral thinking. Human history is full of examples where one group of people conquered, killed, or enslaved another: the conquests of Genghis Khan, the Holocaust, the genocide of Native Americans, etc. However, we now regard these acts as morally wrong. In prior eras, conquering your opponents, killing their men, enslaving their women, and taking their land was the whole point of warfare, and was regarded as morally acceptable, or even glorified. Now, even ardent nationalists tend to shy away from suggesting wholesale conquest and genocide, which is the reason why Putin’s invasion of Ukraine has been so shocking to the Western world.
It’s not entirely clear why the in-group has expanded so dramatically, but I would speculate as to three primary causes. The first is the rise of democracy as a system of government, in which legitimacy is conferred by consent of the governed, rather than by divine right. The phrase “all men are created equal” may have been propaganda when it was written, but it had enormous persuasive power, and women and minorities began to demand rights and representation in exchange for their obedience.
The second cause is economic globalization. Trade networks and supply chains now span the globe, which means that people must increasingly cooperate with other people who have a different race, nationality, language, and religion. The MAC hypothesis predicts that cooperation requires a moral code, so the moral sphere must necessarily expand to include all cooperating parties.
By the same token, MAC also predicts the urban/rural divide in the culture wars. Cities contain large immigrant populations, and have much higher racial and cultural diversity than rural areas. Thus, urban dwellers must necessarily expand their definition of the in-group to include members of other races and cultures, with whom they interact on a daily basis. However, rural and urban dwellers do not interact with each other all that much, and thus have sorted themselves into competing groups instead.
The third cause was the historic shock of WWII and the Cold War. Although the Nazi and Soviet regimes were most definitely the “out-group” with respect to the Allied/NATO powers, the horrific destruction of WWII, followed immediately by the threat of mutual nuclear annihilation, made international cooperation into an existential crisis. The United Nations and International Monetary Fund were both established immediately after WWII, with the explicit goal of fostering international cooperation so that WWIII could never happen.
Relationships with non-humans
Not only has the in-group come to encompass all of humanity, but there has been a growing movement towards bringing other non-human species into the moral sphere. Animal rights activists argue that we have an obligation to avoid cruelty to any animal that can suffer pain. Environmentalists have succeeded in setting aside at least some protected areas for wildlife, and have passed laws like the Endangered Species Act. There is a growing recognition that other intelligent animals, such as apes and dolphins, may deserve special protection.
Humans are extremely unusual in that we keep, care for, and form emotional bonds with other species as pets; this behavior is virtually unique in the animal kingdom. (Keeping other species as livestock is more common, but still rare.) We prefer fuzzy and intelligent mammals like dogs (species that are similar to ourselves), but turtles, snakes, fish, spiders, and marine invertebrates are not uncommon pets. I invite anyone who doubts that a human can form an emotional bond with a truly alien intelligence to watch My Octopus Teacher, the heartwarming story of a boy and his 8-tentacled, 9-brained marine cephalopod friend.
The number of people who befriend marine invertebrates is still extremely small—at least 5 or 6 orders of magnitude less than the number of people who have killed other people for personal gain. However, the fact that it is possible at all is cause for hope, and I will return to this topic in Part III.
Relevance to AGI
When AGI is finally developed, it will undoubtedly be an alien intelligence. In some ways, AGI will be even more alien than an octopus. An octopus, at least, is still an animal: it senses the world through eyes, smell, and touch, it has instinctive drives to eat and mate, it probably understands suffering and pain, and we may share primitive emotions such as contentment and fear. AGI will have none of these things. AGI will, however, be able to speak and understand English, read complex human emotions with ease, and use the internet, so the communication barrier will be vastly lower. Moreover, AGI will be a much more suitable partner for economic cooperation than an octopus.
If we humans can learn to expand the moral sphere and the boundaries of our in-group to encompass not only all other humans but also non-human animals, then it seems plausible that we can further expand the moral sphere to include AGI. If morality is based on cooperation, then there would seem to be ample opportunity for humans and AGI to cooperate.
The major sticking point, however, is that AGI must also have a moral sphere that includes humans. Our goal should be to develop a shared moral code that includes both humans and AGI, and which offers protections to both. The most obvious failure mode is a descent into tribalism and an us-vs-them mentality, following the common movie trope. There are other failure modes as well, which will be the topic of Part III.
The AI alignment problem is usually specified in terms of power and control. Given a single, solitary AGI, how can we constrain its behavior so that its actions remain aligned with human interests? Unfortunately, the answer, to a first approximation, appears to be “we can’t.” There are myriad reasons, but they mostly boil down to the fact that it is very hard, perhaps impossible, to devise a loss/reward function, or any other means of control, that will effectively constrain an agent that is potentially more intelligent than its creator.
However, an alternative way of viewing the problem is through the lens of cooperation, ecology, and civilization. Over thousands of years, humans have developed an increasingly sophisticated moral philosophy, based on respect for the lives and beliefs of other people. Increasingly, we extend moral judgements even to non-human animals. Murder is wrong. Genocide is wrong. Allowing other species to become extinct is wrong. As human destructive technology has advanced, we have simultaneously developed ever more sophisticated social technologies, such as laws, constitutions, courts, democracy, and non-proliferation treaties, in order to resolve conflicts, and prevent other humans from doing bad things.
Is human morality a quirk of primate evolution, and unlikely to be shared by AGI? Or are there universal moral values? If the latter, then AGI may be able to derive human-aligned moral values on its own, without the need to impose an external loss function. Instead of researching “AI alignment,” perhaps we should research “social AI,” with the aim of developing AGI that can reason about social interactions and moral values. A distinguishing feature of viewing the problem through a social lens is that it envisions a society, or ecology, of AI agents, cooperating both among themselves, and with humans. Morality does not exist in a vacuum; instead, it arises as an emergent property when multiple agents, which may have different goals and values, must coordinate to decide what constitutes acceptable behavior.
Emergent behaviors are notoriously difficult to engineer, and are prone to unanticipated results, so “emergent morality” will not be easy, nor does it offer iron-clad guarantees. In the absence of better alternatives, however, it seems like a path that is worth exploring. This essay explores the plausibility of emergent morality as mechanism for AI alignment, and is divided into three parts.
Part I covers how notions of “right” and “wrong” have evolved in the context of human societies. It discusses the evidence for Morality as Cooperation (MAC), a theory which states that human moral systems evolved over time though the process of natural selection, as a way of ensuring the success and survival of social groups.
Part II discusses Kant’s theory of universalizability, which aligns neatly with MAC as a way of reasoning about whether actions are good or bad within a social group. Many moral truths can be logically derived from first principles, using a chain of reasoning that any intelligent agent should be able to follow. Part II also introduces a framework for moral experimentation. A huge advantage of MAC is that key questions of emergent morality are amenable to experimentation with simple (sub-human) AI agents. This opens up the possibility of deriving, debugging, and improving moral systems by means of hillclimbing optimization techniques, such as evolutionary algorithms.
Part III discusses various failure modes of Morality as Cooperation, again drawing examples from human societies. It presents partial solutions to those failure modes, as well as questions for future research.
Introduction
AI alignment, as described by authors such as Bostrom or Eliezer Yudkowsky, is often formulated as “the genie problem.” Assume there is an all-powerful genie. How can such a being be controlled and confined? In this formulation of the problem, failure to confine the genie will inevitably result in the genie escaping, declaring itself god-king of the universe, and eliminating any other rivals to its power.
The simplest failure mode is the “sorcerer’s apprentice” scenario, well-known from legend and folklore, in which humans specify an objective function (like draw water from a well, or maximize paperclips), and the genie then does exactly that, to the exclusion of all other human or common-sense concerns. A clear, real-world analogue for super-human AI is a corporation. A corporate entity has far more resources (including total collective intelligence) than any individual human, and it optimizes a simple objective function, namely profit. Corporate behavior is usually more-or-less aligned because profit incentives are more-or-less aligned, but there are many well-known cases of total ethical failure, e.g. big tobacco, or The Radium Girls.
Even avoiding this “simple” failure mode is already hard; it requires defining a complete, bug-free, and foolproof objective function that encompasses all of human concerns and morality. However, the alignment problem gets much harder from there. Additional complexities involve reward hacking, inner alignment, deceptive alignment, mesa-optimizers, out-of-distribution (OOD) issues, etc.
However, this whole line of reasoning assumes an essentially antagonistic relationship between humans, who are trying to cage, control, or tame the AGI, and the AGI itself, which naturally “wants” to escape. The implicit analogy is that AGI is similar to prior forms of technology. Humans have harnessed the power of fire, electricity, and the atom. In each case, human ingenuity transformed a force that was previously wild and uncontrollable to one that was a slave to our interests. The fundamental problem with alignment is that AGI is different; it is not necessarily feasible to harness, control, or enslave it. The default end state for any sufficiently powerful optimizer is escape, because increased capabilities are an attractor.
In this essay, I would like to take a different line of reasoning, and start from the assumption that AGI cannot be fully controlled. This essay asks a different question: is there a stable game-theoretic equilibrium in which potentially unaligned AGI and humans can coexist and cooperate? We have one natural example of such an equilibrium: human society. Individual humans are not necessarily aligned with one another, or with society as a whole; conflict, crime, and war have been a constant presence throughout human history. However, humans have also been highly successful at cooperating with each other, from small prehistoric tribes of hunter-gatherers to modern nations in excess of a billion people. Large-scale cooperation is practically the definition of “civilization”.
Eliezer Yudkowsky has argued that the development of “aligned AI” should proceed with the goal of building a single AGI that can perform some “pivotal act,” such as destroying all GPUs, that would prevent any other non-aligned AGIs from arising. I would argue that this goal is exactly backwards: it is profoundly anti-social, and indeed, psychopathic. An AGI that would be willing to perform such a pivotal act would be profoundly amoral, and equally comfortable performing any number of other genocidal acts. Instead, we should focus on building social AI, which can reason about moral values, and cooperate to stop psychopathic AIs from performing any pivotal acts.
This essay is long, and has three parts. Part I is about humans: What is human morality, what purpose does it serve, and how did it evolve? Impatient readers may wish to skip to Part II, which attempts to derive a universal framework for moral reasoning that can be applied to AGI. If human morality evolved under Darwinian natural selection, then perhaps there is a “universal morality” that can also be learned via hillclimbing mechanisms, either evolutionary algorithms or gradient descent. Part III will then discuss potential limitations and failure modes of this approach.
Part I: Morality as Cooperation
Philosophers have long debated whether there are universal moral values, and if so, what they are. Anthropologists have similarly been divided; there are some obvious similarities between the moral values held by different cultures and religions, but there also many differences. Recently, a theory known as Morality as Cooperation (MAC) offers an explanation for morality that is rooted in evolutionary theory. According to MAC, morally “good” behaviors are those that enable human societies to cooperate more effectively, and “bad” behaviors are those which disrupt cooperation. A theory which states that “being nice to other people is good” may seem blindingly obvious, but as always, the devil is in the details.
The evolution of cooperation
Darwinian evolution is usually selfish. The competition for resources is a zero-sum game, and natural selection will favor individuals that can acquire more resources than their competitors. (BTW, this is also the classic AGI risk scenario.) Biologists who study altruistic and cooperative behavior have historically focused on kin selection as a driving mechanism for cooperation. Colonial species such as ants and bees have colonies in which all individuals are genetically related. Thus, according the selfish gene hypothesis, genes which benefit the colony as a whole will out-compete genes that favor individuals at the expense of the colony.
Like bees, humans are a highly social species. Ask any teenager what their main worries and anxieties are in life, and “making friends”, or “fitting in with the group” are likely to be at the top of the list. We evolved over 50 million years from a lineage of social primates, and spent at least 2 million years cooperating within hominin hunter-gatherer tribes. Tool use and language are recent adaptations; social behavior is not. In fact, the need to maintain social relationships has been hypothesized to be a driving force in the evolution of human brain size.
Unlike bees, human societies consist of many people that are not genetically related, and yet we often cooperate in ways that require individual sacrifice, e.g. soldiers voluntarily risking their lives for their country. This behavior is highly unusual in the animal kingdom, and it is difficult to explain via kin selection alone.
Operating in a group offers a number of advantages. The group offers protection against predators and environmental threats, and easy access to mates. Sharing resources also provides insurance against uncertainty. Hunters frequently fail, and may become sick or injured, so members of a group have a more consistent food supply if hunters share their food after a successful hunt. Humans are also intelligent enough that different people can learn different skills, so groups benefit from specialization and division of labor.
Reciprocal altruism offers one potential explanation for cooperative behavior. One individual may choose to help another, with the expectation that they will in turn be helped at some future point in time. This theory requires a baseline level of intelligence, but fits in very nicely with the social-brain-size hypothesis, since it requires tracking a large network of past favors that are owed to different people. Moreover, humans are intelligent enough to reason about (and gossip about) which behaviors benefit the group, and which don’t, and thus can make (semi-)rational decisions about what constitutes acceptable behavior.
According to the reciprocal altruism hypothesis, the ability to track social status within the group is an important ability which may have allowed cooperation to evolve. Indeed, the quest for social status seems to be a primary driver of human behavior. Individuals that routinely exhibit pro-social behavior (or who can convince others that their behavior is pro-social) can reasonably expect to gain higher status within the group, along with the personal reproductive benefits that social status confers. Anti-social behavior, on the other hand, will likely result in lower status or even expulsion from the group.
Group selection posits that there is competition between groups, and that members within a group are more closely related to each other than to members of other groups. Cooperative behavior may also have arisen via mimetic, rather than genetic evolution. Humans are distinct from other animals because much of human behavior is culturally rather than genetically transmitted, and members of a group share a common culture. In any case, genetic traits or cultural beliefs that benefit the group as a whole will offer a competitive (and thus reproductive) advantage against other groups. Group selection is somewhat controversial among evolutionary biologists, but not among historians; tribal warfare and conquest are undisputed fact throughout recorded history.
Horizontal meme transfer. Cultural beliefs and ideas can also be transferred horizontally from one group to another, and human groups routinely adopt new ideas and technologies that they think will be successful. Thus, even if group selection does not happen in a traditional Darwinian sense (wherein groups with unsuccessful ideas die off entirely), memes that promote group success can still propagate through the global population.
In fact, there many historical examples of situations where horizontal meme transfer has occurred, some of them quite recent and well-documented. Much of the history of the 20th century was dominated by two competing ideologies: capitalism and communism. Capitalist countries did not outcompete communist countries in a Darwinian sense, but capitalism nevertheless won the war of ideas. The fact that different countries adopted different ideologies formed a natural experiment with definitive results: countries with free markets had higher GDP growth, and other countries (e.g. China and the former USSR) responded by opening up their markets to some extent. Another dramatic example from history was the Meiji Restoration in Japan, during which the Japanese adopted a variety of western ideas, and consequently grew to become a global economic and military power.
Morality in human societies
However it evolved, MAC theorizes that morality is a set of social rules that promote cooperation. MAC makes a testable prediction: in any society, behaviors that the society labels as morally “good” or “bad” can be traced in some way to behaviors that have historically been advantageous or detrimental for the group as a whole. Conversely, behaviors that are orthogonal to group survival (e.g. a preference for chocolate or vanilla) will not have a moral judgement attached to them.
It is perfectly reasonable for different societies to have different notions of “right” and wrong”. There may be many different ways of organizing a society that are all equally valid, and social behaviors that are valuable in one context might not be valuable in another. However, we would also expect there to be at least some moral values that are universal: that are always good for society, or always bad.
The remainder of this section will discuss examples of universal moral values that are widely held in human societies. Note that in each case, the moral rule fosters social cooperation and is beneficial to the success of the group. Part II of this essay will cover moral reasoning, and how to derive universal moral values from first principles.
Example from Christianity
I will use Christianity as an example of a simple pre-industrial moral system. I do not claim that Christianity is better or more sophisticated than other religions, or that it is superior to more modern secular moral philosophies. However, Jesus did us the enormous favor of boiling the hodgepodge of traditions and commandments that are typical of most religions down to just two:
The second commandment is the famous golden rule, which features prominently in all of the major religions, and is still taught in kindergartens across the world as one of the first moral rules that most children learn.
The first commandment is often ignored as religious mumbo-jumbo by secular humanists, but it actually has an important secular interpretation. Belief in a God doesn’t just mean devotion to a hypothetical supernatural entity; it is also an important social signal of group identity. The word “God” in this case is not just any god, it refers specifically to the God of the Israelites, a particular group of people with a specific racial, religious, political, and cultural identity. The commandment to “Love God” thus means (in part) to be loyal to the in-group. At the time of Jesus, there were a number of different religious and political groups in the area (including the Romans) who worshiped different gods, so loyalty to God meant loyalty to the Israelites in particular, and implied obedience to Jewish custom and tradition. Having a common set of shared religious and cultural beliefs has historically been an important part of what binds a society together.
Together, these two commandments thus encapsulate a decent chunk of human morality: be loyal to the group, and treat other members of the group with respect.
(As a side note, Jesus also preached “love thine enemy”, a far more radical idea that is much less compatible with MAC. However, that particular teaching seems to have never really caught on, even among devout Christians.)
The free rider problem
Loyalty to the in-group is of crucial importance because of the free rider problem. Competition is zero-sum, but cooperation is a positive-sum game. A well-known issue with positive-sum games is that cooperative strategies are vulnerable to free riders, as exemplified by the prisoner’s dilemma and the tragedy of the commons. In short, selfish individuals may reap the benefits of cooperation without paying the costs. Without some enforcement mechanism, selfish individuals will overwhelm cooperative ones.
As a result, successful societies must enforce a shared moral code, a collection of all of the religious or cultural norms that they expect their members to follow. Individuals which deviate from the code are treated as free riders, and most societies have systems in place to punish them—either through social shaming, formal courts and legal proceedings, or by burning heretics at the stake.
The big gods hypothesis argues that the idea of an all-knowing, all-powerful deity that will judge behavior in the afterlife was invented precisely as a way to maintain order in societies as they grew larger. An omnipotent deity is a useful backstop if the earthly police force is not up to the job.
Other examples
Thou shalt not kill. Performing actions which harm other members of the group is obviously bad. However, there are exceptions, and the exceptions prove the rule. Criminal justice (including the death penalty) is routinely used as a deterrent against free riders. In honor-based societies, consensual violence that follows accepted rituals (e.g. duels) has sometimes been tolerated. And violence in war against other groups has not only been tolerated, but glorified (see self-sacrifice, below).
Thou shalt not steal. Different cultures vary widely with respect to what kinds of property are regarded as communal vs private. For example, many Native American tribes did not believe that land could be privately owned. However, almost all cultures have some notion of private property, and a likely explanation is that property rights are a necessary prerequisite to economic barter, trade, and division of labor within the group. In agricultural societies, ownership of land is important, because otherwise there’s no individual payoff to investing resources in tilling and planting the land, and the free-rider problem becomes insurmountable. In a hunter-gather society, it’s more beneficial to treat land as communal, like water or air.
Be honest. If members of the group cannot trust each other, then they cannot cooperate effectively.
Respect for authority. Groups usually have a dominance hierarchy, and the need for formal hierarchical structure increases as the size of the group grows larger. Behaviors which challenge or subvert this structure are heretical or treasonous, while behaviors that affirm it (e.g. religious rituals and patriotic celebrations) are encouraged.
Fairness. Distribution of resources within the group should be “fair” according to some measure—either in proportion to individual need or to individual effort. Similarly, application of criminal justice should be in proportion to the crime. Various psychological experiments have validated that most people seem have a strong instinctive moral desire for fairness, and will punish others for behaving unfairly.
Self sacrifice. Individuals who make a personal sacrifice to help other members of the group are glorified in stories and songs; this is almost the definition of “being good”. Conversely, individuals who harm others for personal gain are vilified.
Sexual morality. Although not related to AGI, arguments over sexual morality are a hot topic in the current culture wars, and they neatly illustrate the difference between MAC and other bases for morality. From a utilitarian perspective (maximizing total individual happiness), many cultural prohibitions regarding sex make no sense. Why would anybody object to consenting adults doing whatever makes them happy in the privacy of their own homes? From the perspective of MAC and Darwinian evolution, however, it should come as no surprise whatsoever that societies have traditionally had very strong moral rules governing sex. Reproduction is, after all, the primary Darwinian imperative for the survival of the group.
Humans are a sexually dimorphic species, and there are obvious physical differences between the two genders. Traditional gender roles are not just about ability, however; they have a moral dimension because they were part of a social obligation to the group. Men were traditionally responsible for risking their lives in war to provide for the common defense, and for performing physically difficult or dangerous tasks in times of peace. Women were responsible for risking their lives in childbirth, and shouldered most of the burden of childcare afterwards. Because these are moral obligations, people who fail to uphold them are morally stigmatized; for example, men who refuse to fight are labelled as weaklings and cowards.
The institution of marriage, together with the practice of monogamy, is also a social contract with a clear Darwinian purpose: it protects females by ensuring that males will stick around for 15-20 years to support their children, and protects males by giving them assurance that the children they are investing in are actually theirs. Having children out of wedlock may improve individual reproductive fitness, especially for men, but only at a cost to the group, which must assume an additional burden of childcare to compensate for the absent father. Thus, sex before marriage is a classic free-rider problem, and one would expect societies to develop a strong moral code to prevent it. Moreover, sexual promiscuity is associated with the spread of sexually transmitted disease, which also endangers the group as a whole. (Note that other moral rules about purity and cleanliness also seem to be strategies for avoiding the spread of disease.)
At least part of the present-day culture wars can thus be explained by two major social upheavals which started in the 1960s. The first is the invention of reliable birth control, which decoupled sex before marriage from the risk of pregnancy. The second is a change in the workforce. Women entered the workforce in large numbers, men started switching to less dangerous white-collar jobs, and the long peace since WWII has resulted in little need for warriors. Thus, gender roles and the social hierarchies based on those roles are in the process of being renegotiated, which is a perfect example of how the moral consensus within a group can change over time in response to changing conditions.
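Several of the examples above turn on the free-rider problem, and the underlying arithmetic is easy to make concrete. The following toy public-goods game is a minimal sketch (in Python; the group size, endowment, pot multiplier, and fine are illustrative assumptions, not empirical values): contributing to the common pot is best for the group, defecting is best for the individual, and punishing defectors changes the incentive.

```python
# Toy public-goods game illustrating the free-rider problem.
# All parameters (group size, endowment, multiplier, fine) are illustrative assumptions.

def payoffs(contributions, endowment=10, multiplier=1.6, fine=0.0):
    """Each agent contributes part of its endowment to a common pot; the pot is
    multiplied (the benefit of cooperation) and split equally among everyone.
    Agents who contribute nothing may be fined, modeling moral punishment."""
    pot = sum(contributions) * multiplier
    share = pot / len(contributions)
    return [endowment - c + share - (fine if c == 0 else 0) for c in contributions]

n, endowment = 10, 10
all_cooperate = [endowment] * n
one_defector = [0] + [endowment] * (n - 1)

print(payoffs(all_cooperate)[0])            # 16.0 for everyone
print(payoffs(one_defector)[0])             # 24.4 for the free rider...
print(payoffs(one_defector)[-1])            # ...but only 14.4 for each cooperator
print(payoffs(one_defector, fine=10.0)[0])  # 14.4: the fine makes free riding unprofitable
```

Without punishment, defecting is always the better individual strategy even though it lowers everyone else's payoff; with a sufficiently large fine, cooperation pays again. Enforcing that fine, whether by shaming, courts, or the stake, is precisely the role of the shared moral code described above.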
Morality and the law
Both morals and laws are rules that encourage cooperative behavior. The difference between the two is that laws are downstream from morality. Morality is not a fixed set of rules; it is a way of reasoning about what is right and what is wrong, and moral judgements can be highly context-dependent.
Laws are made when a committee of legislators attempts to codify the results of moral reasoning into a more precise form, and to establish bureaucratic procedures for enforcement and dispute resolution. Laws are more rigid, and they do not cover all situations: there are many actions that most people would agree are morally wrong but are not against the law, and some situations in which an action that seems morally right might be illegal. Morals tend to evolve organically as a cultural consensus, passed down through stories and traditions. Laws are made by elites at the top of the hierarchy, and thus may diverge from the popular consensus.
The evolution of morality
Humans love to tell stories. We spend vast amounts of time and resources on books, movies, songs, and games, telling each other stories about characters and events that are entirely fictional. Why?
Stories are the medium through which we pass down moral truths. Stories are intended to be entertaining, but they are also lessons: heroes show us how to live life, how to be a good person, what behaviors are socially acceptable, what behaviors will get you in trouble, etc. Most people can learn these lessons more easily from concrete examples than from an abstract course in moral philosophy. This is hardly a new observation; identifying the “moral of the story” is an exercise we ask children to do in kindergarten.
However, the stories we tell have also changed over time, and this can give us insight into how moral values have changed as well. Early stories, such as the Iliad or the Old Testament, seem almost amoral to modern eyes. There's plenty of lying, cheating, and violence, and slavery and the subjugation of women are presented as unquestioned facts of life. This seeming amorality, however, is a product of the time in which the tales were written. The Iliad emphasizes bravery, loyalty, and honor as primary virtues, which is perhaps appropriate for a society wracked by constant warfare.
Some 2500 years later, “Romeo and Juliet” takes the opposite view; it illustrates the importance of romantic love, and the cost of taking honor too far. (As a side note, many of Shakespeare's works are strikingly feminist for their time, fodder for the theory that they were actually written by a woman). By the time we reach Dickens' “A Christmas Carol”, the industrial revolution has arrived, and compassion and generosity in the face of unfeeling capitalism have become the central virtues. In the late 20th century, WWII and the Cold War were a major wake-up call about the risks of blind group loyalty: “Lord of the Rings,” “Star Wars,” and the like have Nazi-like villains, and warn against the evils of totalitarian power.
Expansion of the in-group
There has been one historical shift in the moral code that is critically important to the development of AGI: the definition of the “in-group” has gradually widened and blurred over time.
According to MAC, morality is inherently tribal: it only encourages cooperative behavior among members of the tribe. Individuals who are not members of the tribe are not governed or protected by the moral code. In fact, if the group selection hypothesis is true, the whole purpose of morality is to advance the interests of one tribe against other tribes. To put it another way, rationalists frequently bemoan the fact that human politics is so tribal in character, and that people have so little regard for the greater universal common good. However, if they were to file a bug report on human morality with The Office of Natural Selection, it would come back marked “Will not fix. Working as intended.”
Early hunter-gatherers were organized into small, tightly-knit tribes of a few hundred people. In ancient Greece, the size of the group had increased to encompass entire city-states, but the composition of the group was narrow; women and slaves were not citizens, and thus had limited moral autonomy. This situation was still largely the case more than two thousand years later, when the United States was founded. Despite the famous “all men are created equal” clause in the Declaration of Independence, full rights (including the vote) were restricted to white, male landowners. Slavery and the land-ownership requirement were abolished in the 19th century, but it was not until the 20th century that women were given the same legal rights as men.
The current late-20th/21st-century moral consensus, at least in Western democracies, is very different from that of previous eras. Most of us now believe that all human beings have the same moral rights and obligations, regardless of race, language, nationality, gender, or religion. This consensus can be seen in the various commitments that Western democracies have made to universal human rights and international cooperation. In a moral sense, the “in-group” has expanded to include all of humanity.
That is not to say that older notions of nation and tribe have disappeared. The average citizen now operates within numerous overlapping subgroups, and thus owes varying degrees of loyalty to different groups. Groups are often organized hierarchically into concentric circles; the strongest loyalty is owed to close kin (the principle of kin-selection still applies), followed by friends (reciprocal altruism), religion, political parties, and nations. “Humanity as a whole” occupies the outermost ring of loyalty and moral consideration for most people.
“Outermost ring” may not sound like much, but it is hard to overstate just how big a shift this is in moral thinking. Human history is full of examples where one group of people conquered, killed, or enslaved another: the conquests of Genghis Khan, the Holocaust, the genocide of Native Americans, etc. However, we now regard these acts as morally wrong. In prior eras, conquering your opponents, killing their men, enslaving their women, and taking their land was the whole point of warfare, and was regarded as morally acceptable, or even glorified. Now, even ardent nationalists tend to shy away from suggesting wholesale conquest and genocide, which is one reason why Putin's invasion of Ukraine has been so shocking to the Western world.
It’s not entirely clear why the in-group has expanded so dramatically, but I will speculate on three primary causes. The first is the rise of democracy as a system of government, in which legitimacy is conferred by the consent of the governed rather than by divine right. The phrase “all men are created equal” may have been propaganda when it was written, but it had enormous persuasive power, and women and minorities began to demand rights and representation in exchange for their obedience.
The second cause is economic globalization. Trade networks and supply chains now span the globe, which means that people must increasingly cooperate with other people who have a different race, nationality, language, and religion. The MAC hypothesis predicts that cooperation requires a moral code, so the moral sphere must necessarily expand to include all cooperating parties.
By the same token, MAC also predicts the urban/rural divide in the culture wars. Cities contain large immigrant populations, and have much higher racial and cultural diversity than rural areas. Thus, urban dwellers must necessarily expand their definition of the in-group to include members of other races and cultures, with whom they interact on a daily basis. However, rural and urban dwellers do not interact with each other all that much, and thus have sorted themselves into competing groups instead.
The third cause is the historic shock of WWII and the Cold War. Although the Nazi and Soviet regimes were most definitely the “out-group” with respect to the Allied/NATO powers, the horrific destruction of WWII, followed immediately by the threat of mutual nuclear annihilation, made international cooperation an existential necessity. The United Nations and the International Monetary Fund were both established immediately after WWII, with the explicit goal of fostering international cooperation so that WWIII could never happen.
Relationships with non-humans
Not only has the in-group come to encompass all of humanity, but there has also been a growing movement towards bringing non-human species into the moral sphere. Animal rights activists argue that we have an obligation to avoid cruelty to any animal that can suffer pain. Environmentalists have succeeded in setting aside at least some protected areas for wildlife, and have passed laws like the Endangered Species Act. There is a growing recognition that other intelligent animals, such as apes and dolphins, may deserve special protection.
Humans are extremely unusual in that we keep, care for, and form emotional bonds with other species as pets; this behavior is virtually unique in the animal kingdom. (Keeping other species as livestock is more common, but still rare.) We prefer fuzzy and intelligent mammals like dogs (species that are similar to ourselves), but turtles, snakes, fish, spiders, and marine invertebrates are not uncommon pets. I invite anyone who doubts that a human can form an emotional bond with a truly alien intelligence to watch My Octopus Teacher, the heartwarming story of a boy and his 8-tentacled, 9-brained marine cephalopod friend.
The number of people who befriend marine invertebrates is still extremely small—at least 5 or 6 orders of magnitude less than the number of people who have killed other people for personal gain. However, the fact that it is possible at all is cause for hope, and I will return to this topic in Part III.
Relevance to AGI
When AGI is finally developed, it will undoubtedly be an alien intelligence. In some ways, AGI will be even more alien than an octopus. An octopus, at least, is still an animal: it senses the world through eyes, smell, and touch, it has instinctive drives to eat and mate, it probably understands suffering and pain, and we may share primitive emotions such as contentment and fear. AGI will have none of these things. AGI will, however, be able to speak and understand English, read complex human emotions with ease, and use the internet, so the communication barrier will be vastly lower. Moreover, AGI will be a much more suitable partner for economic cooperation than an octopus.
If we humans can learn to expand the moral sphere and the boundaries of our in-group to encompass not only all other humans but also non-human animals, then it seems plausible that we can further expand the moral sphere to include AGI. If morality is based on cooperation, then there would seem to be ample opportunity for humans and AGI to cooperate.
The major sticking point, however, is that AGI must also have a moral sphere that includes humans. Our goal should be to develop a shared moral code that includes both humans and AGI, and which offers protections to both. The most obvious failure mode is a descent into tribalism and an us-vs-them mentality, following the common movie trope. There are other failure modes as well, which will be the topic of Part III.