Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics
Epistemic status: I describe an argument that seems plausible, but could also be entirely off. I.e., low confidence.
Summary of the Argument
Certain concepts (like “cooperation” or “human values”) might be fundamentally fuzzy. This would have two implications: (1) We should not expect to find crisp mathematical definitions of these concepts. (2) If a crisp mathematical definition seems appropriate in one setting, we should not be surprised when it stops being appropriate in other settings.
Full Version
One of the questions discussed in connection with Cooperative AI is “what is cooperation (and collusion)?”. This is important because, if we had clear mathematical definitions of these concepts, we could do things like:
(1) Incorporate them into loss functions when training AI. (A toy sketch of what this could look like is given below.)
(2) Use them to design environments and competitions to evaluate AI.
(3) Write regulations that promote cooperation or forbid collusion.
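To make point (1) a bit more concrete, here is a minimal, purely illustrative sketch of what “incorporating a definition of cooperation into a loss function” could look like. It assumes one particular stand-in definition (cooperation as joint welfare), plus hypothetical names such as task_loss, joint_welfare, and lam; none of this comes from any existing Cooperative AI proposal, and the rest of the post argues that any such stand-in will break down in some settings.

```python
# Purely illustrative: treats "cooperation" as joint welfare, which is exactly
# the kind of crisp stand-in definition the post argues will fail in some
# settings. All names (task_loss, joint_welfare, lam) are hypothetical.

def joint_welfare(payoffs):
    """One possible proxy for 'cooperativeness': the sum of all agents' payoffs."""
    return sum(payoffs)

def training_loss(task_loss, payoffs, lam=0.1):
    """The agent's ordinary loss, minus a bonus for scoring well on the proxy.

    task_loss: the agent's ordinary objective (lower is better).
    payoffs:   per-agent payoffs observed in the multi-agent environment.
    lam:       how strongly the cooperation proxy is rewarded.
    """
    return task_loss - lam * joint_welfare(payoffs)

# Example: task loss 1.0 in an episode where the two agents received payoffs
# (3, 2) gives a combined loss of 1.0 - 0.1 * 5 = 0.5.
print(training_loss(1.0, (3, 2)))
```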
However, cooperation (and many other concepts) might be fundamentally fuzzy, such that we should not expect any such crisp definition to exist. Why might this be the case? First, some evidence for this claim is the observation that we haven’t yet managed to agree on such a definition.[1]
Conjecture: Cooperation is Just a Word (i.e., not ontologically fundamental)
As a second argument, consider the following story (totally made-up, but hopefully plausible) of how we came to use the word “cooperation”: Over the course of history, people interacted with each other in many different situations. Over time, they developed various practices, norms, and tricks to make those interactions go better. Eventually, somebody pointed to one such practice, in a specific situation, and called it “cooperation”. The word then started being applied to analogous situations and practices as well. And ultimately, we now use the word “cooperation” for anything that belongs to a certain cluster (or clusters) of points in the “Behaviour x Situation space”. (And all of this changes over time, as our language and world-views evolve.)[2]
Because the process which determined our usage of the word “cooperation” is quite messy and somewhat random, the boundaries of the resulting concept end up being quite complicated or “fuzzy”.[3] This also means that any mathematical formula that aims to capture this concept needs to be at least complicated enough to explain all the “ad-hoc” parts of the generating process. As a result, any simple definition of “cooperation” is likely to be inappropriate in at least some settings.
Implications
I expect that there is much more to be said about the formation of concepts, their fuzziness, and its implications. For now, I will make two comments.
First, the fuzziness of concepts seems to be a matter of degree: We have some extremely crisp concepts such as the prime numbers, the law of gravity, or Nash equilibrium. Towards the other end, we have things like ‘all animals that live on Earth’ (if you had to describe them without being able to point to them), natural languages[4], or human values.[5] I expect concepts such as “cooperation”, “intelligence”, or “fairness” to be somewhere in the middle.
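To make the “crisp” end of the scale concrete: a concept like primality has a membership test that is short, exact, and setting-independent, which is precisely what the post suggests we should not expect for concepts like cooperation. The snippet below is just that standard test, included for contrast; nothing about it comes from the post.

```python
# A crisp concept: primality has a short, exact membership test that works
# the same way in every context where it makes sense at all.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

print([n for n in range(2, 30) if is_prime(n)])
# There is no analogous short, exact is_cooperative(behaviour, situation);
# that gap is what the post means by calling cooperation "fuzzy".
```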
Second, I think that it is reasonable to try being more precise about what we mean by cooperation (and many other concepts). More precisely, I think we can come up with concrete definitions (even mathematical definitions) in some specific setting. We can then try to identify a broader class of settings where the definition (a) still “makes formal sense” and (b) still captures the concept we wanted. But we should not be surprised when we encounter settings where the definition is unsuitable; in such settings, we just come up with a different definition, and proceed as before. And by proceeding like this, we gradually cover more and more of the concept’s “area” with crisp definitions.
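As an illustration of this workflow (not from the post), here is a toy sketch that pins down one candidate definition in one narrow setting: in a two-player normal-form game, call an action profile “cooperative” if it maximises the sum of the players’ payoffs. The payoff tables and the helper is_cooperative_profile are assumptions made purely for the example; the point is that the definition can be checked mechanically in this setting, and then stress-tested in others, where it may stop capturing the intended concept.

```python
# Toy setting: two-player normal-form games, given as dictionaries mapping
# action profiles to payoff pairs. Candidate (assumed, not canonical)
# definition: a profile is "cooperative" iff it maximises joint welfare.

def joint_welfare(payoffs, profile):
    return sum(payoffs[profile])

def is_cooperative_profile(payoffs, profile):
    """Check the candidate definition: does this profile maximise joint welfare?"""
    best = max(joint_welfare(payoffs, p) for p in payoffs)
    return joint_welfare(payoffs, profile) == best

# Prisoner's dilemma: the definition behaves as intended, (C, C) comes out cooperative.
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
print(is_cooperative_profile(pd, ("C", "C")))  # True
print(is_cooperative_profile(pd, ("D", "D")))  # False

# A game with no interaction: each player's payoff depends only on their own
# action. The definition still "makes formal sense" here, but calling the
# optimal profile "cooperative" no longer matches the intuitive concept
# (the edge case from footnote [1]).
no_interaction = {("A", "A"): (2, 2), ("A", "B"): (2, 1),
                  ("B", "A"): (1, 2), ("B", "B"): (1, 1)}
print(is_cooperative_profile(no_interaction, ("A", "A")))  # True, but is it cooperation?
```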
What to Avoid
Finally, I want to explicitly warn against two things:
(1) The fallacy that “since definition D is perfect in setting S, it is suitable everywhere”. That is, against (a) extending every definition to all settings where it makes formal sense, without (b) checking that it still captures the intended concept. I think this causes many pointless disagreements.
(2) Even more extremely, replacing the original concept with its formal definition in places where this is inappropriate, possibly even forgetting that the original concept ever existed. (An example of this is that many game theorists get so used to the concept of Nash equilibrium that they start to honestly believe that defecting against one’s identical copy in a prisoner’s dilemma would be the smart thing to do; a small worked comparison of the two views is sketched below.)
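To spell out the parenthetical example in (2): in a standard one-shot prisoner’s dilemma, defection is indeed the unique Nash equilibrium, but against an exact copy of yourself both players necessarily choose the same action, so the only reachable outcomes are mutual cooperation and mutual defection. The small sketch below (with an assumed payoff table) makes the comparison explicit.

```python
# Assumed prisoner's dilemma payoffs: profile -> (row player, column player).
payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Standard analysis: against an independent opponent, D strictly dominates C,
# so (D, D) is the unique Nash equilibrium.
for their_action in ("C", "D"):
    assert payoffs[("D", their_action)][0] > payoffs[("C", their_action)][0]

# Against an identical copy, your choice fixes theirs: only (C, C) and (D, D)
# are reachable, and the copy-aware comparison favours cooperating.
print(payoffs[("C", "C")][0])  # 3: you (and hence your copy) cooperate
print(payoffs[("D", "D")][0])  # 1: you (and hence your copy) defect
```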
Acknowledgments
This draft was inspired by conversations with the participants at the Cooperative AI Foundation summer retreat (July 2023). So at best, I only deserve a portion of the credit. It is even quite possible that I heard the full idea from somebody else, forgot it, and then “discovered” it later.
Disclaimers and Qualifications
My reason for writing this is not that I think that CAIF, or other specific people, are confused about this. Instead, I think that many people already know about this, and the benefit is in making the ideas common knowledge.
Also, I don’t think I will ever turn this into a polished and engaging piece of writing. So if you find the ideas useful, feel free to rewrite or appropriate them without consulting me.
Finally, I bet that everything I write here has already been described somewhere 80 years ago, except better and in much more detail.[6] So I don’t mean to imply that this is new, only that it is something I was confused about before, and that I wish I had known about earlier.
Footnotes
[1] Some weak datapoints in this direction:
(1) As far as I know, the Cooperative AI Foundation considers finding good “criteria of cooperative behaviour” to be a priority, but they haven’t yet settled on a solution (despite presumably doing a serious literature review, etc.).
(2) Intuitive definitions tend to fail in edge cases. For example, the definition “cooperation is about maximising joint welfare” would include the scenario where two optimal agents act without ever interacting with each other. Even after adding a requirement of interaction, it would still include actions such as a rich person being forced to redistribute their wealth among the poor, which we might consider altruistic, but probably not cooperative. And we also have cases such as mafia members “cooperating” with each other at the cost of the remainder of society.
(3) There might be examples of behaviour that is considered “cooperative and expected” in some cultures, but not in others. (Queuing in the UK? The amount of help one is expected to offer to distant family members?) Note, though, that I haven’t done my lit-review due diligence here, so I am not sure whether such differences actually exist, whether they replicate, etc.
[2] In addition to the origin of concepts being messy in the way that I describe, things can also get complicated because words are ambiguous.
To illustrate what I mean, imagine the following (totally made-up) example: Suppose that the concept of fairness originally appeared in the context of fair-play in football. Somebody then started talking about fairness in the context of family relationships. And finally person A extended football-fairness to the context of business. But at the same time, person B extended family-fairness to the same business context. We now have two different concepts of fairness in the same context, both of which are simply called “fairness”.
In the “fairness” example above, perhaps the resulting concepts end up being very close to each other, or even identical. But in other cases, this process might result in concepts that are different in important ways while still being close enough to cause a lot of confusion. In such cases, the first step should be to explicitly disambiguate the different uses.
[3] EDIT: Note that I am trying to make a non-trivial claim here. For many concepts, I would argue that the concept comes first and the word-for-that-concept comes second; this seems true for, e.g., lightning or the-headphones-I-am-wearing-now. I am arguing that, to a large extent, cooperation has this the other way around. That is, the concept of [the various things we mean when saying cooperation] is mostly a function of how we use the word, and doesn’t neatly correspond to some thing in the world (or to a low-complexity concept in concept space).
[4] I vaguely remember a quote along the lines of: “We were working on machine translation. And over the years, every time we fired a formal linguist and replaced them with an ML engineer, the accuracy went up.” From the point of view described in this post, this seems unsurprising: if human languages are such messy, organic things, neural networks will have a much easier time with them than rigid formal rules.
[5] The complexity of each concept seems related to the process that gave birth to it. For example, the complexity of our-notion-of-cooperation seems related to the number of paths that the evolution of the word “cooperation” could have taken. The complexity of human values (the actual thing we care about in alignment, not the various things that people mean when they use the words “human values”) seems proportional to the number of different paths that the natural evolution of humans could have taken.
[6] My guess: some areas of philosophy, and then somebody doing complexity science at the Santa Fe Institute, and then a bunch of other people.