I don’t think I understand the principled difference between correlation and reciprocity; the latter seems like a subset of the former. Let me try to say some things and see where you disagree. This is super messy and probably doesn’t make sense, sorry.
There are many factors which could increase the correlation between two agents’ decisions. For agents that are running reciprocity-like policies, the predictions they make about other agents are a particularly big factor.
In picking out reciprocity as a separate phenomenon, you seem to be saying “we can factorize the correlation into two parts: the correlation that would arise if we weren’t predicting each other’s decisions, and then the additional correlation that arises from us predicting each other’s decisions”.*
But I don’t think that “predicting each other’s decisions” constitutes a clear-cut category. For example, maybe I partly have kindness genes because my ancestors kept running into people that are kind for roughly the same reasons as you might be kind (e.g. facts about how evolution works and how neural networks work). So in some sense, being kind counts as a prediction about your decision. Or more directly: I have a bunch of cached heuristics about how to interact with other agents, that I’ll draw on when trying to make my decision, which implicitly involves predicting that you’ll be like other agents.
Maybe instead the factorization you’re using is “fix a time T: everything that I knew before T is background correlation, and all the thinking I do after time T counts as part of reciprocity”. This seems like a reasonable factorization, but again it doesn’t seem very clear-cut, and I’m not actually sure that’s what you’re doing.
Maybe this is still the most useful factorization regardless. But here’s a tentative guess at what a more principled way of thinking about this might look like (there’s a toy sketch of the procedure after the list below):
Identify the different components of my decision-making algorithm, like:
Evolutionarily-ingrained drives and instinctive decision theories
Heuristics shaped by my experiences
More cerebral conclusions based on doing high-level reasoning (e.g. “reciprocity good”)
Predictions about what you’ll do given your predictions of me
Predictions about what you’ll do given your predictions of what I’ll do given my predictions of what you’ll do
Etc...
My current best-guess decision
Search for modifications to each of these components with the property that, in the nearest worlds where the modification holds, the overall outcome would be better by my current standards (both via influencing my traits and decisions and via influencing your traits and decisions). E.g. if I had evolved to be kinder, then maybe you would have evolved to be kinder too; if I had decided on a different decision theory, maybe you would have too; etc.
Use my modified decision-making algorithm to make a decision.
Of course, our lack of a good way of reasoning about “nearest worlds” means that only pretty small changes will be justifiable in practice.
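To make the shape of this procedure a bit more concrete, here’s a toy Python sketch. To be clear, this is only my own illustration: the component names, the per-component correlation figures, and the outcome function are all made-up assumptions standing in for the parts we don’t actually know how to compute, especially the “nearest worlds” response.

```python
# Toy sketch of the procedure above (my own illustration, not a worked-out proposal).

from itertools import product

# Components of "my" decision-making algorithm, each a knob I could imagine modifying
# (1.0 = maximally kind/cooperative disposition, 0.0 = not at all). Values are made up.
my_components = {
    "evolved_drives": 0.3,      # evolutionarily-ingrained drives
    "learned_heuristics": 0.5,  # heuristics shaped by my experiences
    "explicit_reasoning": 0.4,  # cerebral conclusions like "reciprocity good"
}

def nearest_world_response(modified_components):
    """Crude stand-in for: 'in the nearest worlds where my components were like this,
    what would your components look like?' Here we just posit a fixed degree of
    correlation per component, which is exactly the hard, unsolved part."""
    assumed_correlation = {
        "evolved_drives": 0.8,
        "learned_heuristics": 0.4,
        "explicit_reasoning": 0.6,
    }
    return {k: assumed_correlation[k] * v for k, v in modified_components.items()}

def outcome(my_comp, your_comp):
    """Score the joint outcome 'by my current standards': your kindness benefits me,
    my own kindness carries some cost."""
    my_kindness = sum(my_comp.values()) / len(my_comp)
    your_kindness = sum(your_comp.values()) / len(your_comp)
    return 2.0 * your_kindness - 0.5 * my_kindness

def search_small_modifications(components, step=0.1):
    """Search over small tweaks to each component (only small changes, per the caveat
    about our poor grasp of 'nearest worlds'), keeping whichever scores best."""
    best = dict(components)
    best_score = outcome(components, nearest_world_response(components))
    for deltas in product([-step, 0.0, step], repeat=len(components)):
        candidate = {k: min(1.0, max(0.0, v + d))
                     for (k, v), d in zip(components.items(), deltas)}
        score = outcome(candidate, nearest_world_response(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best

if __name__ == "__main__":
    # Last step: use the modified decision-making algorithm to make a decision.
    print(search_small_modifications(my_components))
```

The only interesting behavior is that the search pushes my components toward kindness exactly when the assumed correlations make your predicted response worth the cost to me.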
* In fact you could think of this as an infinite regress: there’s some level-0 correlation before we predict each other, then there’s some level-1 correlation given our predictions of each other’s predictions of the level-0 correlation, and so on. But that doesn’t seem important here.
I think it was confusing for me to use “correlation” to refer to a particular source of correlation. I probably should have called it something like “similarity.” But I think the distinction is very real and very important, and crisp enough to be a natural category.
More precisely, I think that:
Alice and Bob are correlated because Alice is similar to Bob (produced by similar process, running similar algorithm, downstream of the same basic truths about the universe...)
is qualitatively and crucially different from:
Alice and Bob are correlated because Alice is more likely to cooperate if Bob cooperates (so Alice is correlated with her model of Bob, which she constructed to be similar to Bob)
I don’t think either one is a subset of the other. I don’t think these are an exhaustive taxonomy of reasons that two people can be correlated, but I think they are the two most important ones.
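Here’s a toy sketch of the contrast as I understand it (my own illustration; every detail is invented for concreteness): in the first case the correlation comes purely from Alice and Bob running the same procedure on the same facts, and in the second from Alice conditioning her choice on a model of Bob.

```python
# Toy illustration of the two sources of correlation (my sketch; all details invented).

def similarity_case(world_state):
    """'Similarity': Alice and Bob independently run the same decision procedure on the
    same facts, so their choices match without either of them modeling the other."""
    def shared_procedure(state):
        return "cooperate" if state["resources_plentiful"] else "defect"
    alice = shared_procedure(world_state)
    bob = shared_procedure(world_state)
    return alice, bob

def reciprocity_case(bob_policy):
    """'Reciprocity': Alice builds a model of Bob and cooperates iff she predicts that
    Bob cooperates, so her choice is correlated with Bob via her model of him."""
    alice_model_of_bob = bob_policy  # assume a perfect model, for simplicity
    bob = bob_policy()
    alice = "cooperate" if alice_model_of_bob() == "cooperate" else "defect"
    return alice, bob

if __name__ == "__main__":
    print(similarity_case({"resources_plentiful": True}))  # ('cooperate', 'cooperate')
    print(reciprocity_case(lambda: "cooperate"))            # ('cooperate', 'cooperate')
    print(reciprocity_case(lambda: "defect"))               # ('defect', 'defect')
```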
For example, maybe I partly have kindness genes because my ancestors kept running into people that are kind for roughly the same reasons as you might be kind (e.g. facts about how evolution works and how neural networks work). So in some sense, being kind counts as a prediction about your decision. Or more directly: I have a bunch of cached heuristics about how to interact with other agents, that I’ll draw on when trying to make my decision, which implicitly involves predicting that you’ll be like other agents.
On its own I don’t see why this would lead me to be kind (if I generally deal with kind people, why does that mean I should be kind?). I think you have to fill in the remaining details somehow, e.g.: maybe I dealt with people who are kind if and only if X is true, and so I have learned to be kind when X is true.
In my taxonomy this is a central example of reciprocity—the correlation flows through a pressure for me to make predictions about when you will be kind, and then be kind when I think that you will be kind, rather than from us using similar procedures to make decisions. I don’t think I would call any version of this story “correlation” (the concept I should have called “similarity”).