OpenBot, by contrast, mostly copied AlphaBot's code, but has a subroutine that notices when it is giving advice to two people on opposite sides of a PD and advises them both to Cooperate.
If this were all that OpenBot did, it would create an incentive for users to choose OpenBot but then not follow its advice in a PD, in other words to Defect instead of Cooperate. To get around this, OpenBot also has to predict whether both users are likely to follow its advice, and advise them to both Cooperate only if that probability is high enough (with the threshold depending on the actual payoffs). If OpenBot is sufficiently good at this prediction, then users have an incentive to follow its advice, and everything works out.
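To make "the threshold depending on the actual payoffs" concrete, here is a minimal Python sketch of the kind of rule OpenBot might apply. The payoff names T > R > P > S, the independence of the two users' compliance, the compliance estimates p1 and p2, and the choice of mutual Defection as the baseline are all my own simplifying assumptions, not part of the original setup.

```python
def should_advise_cooperation(T, R, P, S, p1, p2):
    """Sketch of OpenBot's rule for a one-shot PD between two of its users.

    T > R > P > S are the standard PD payoffs (Temptation, Reward,
    Punishment, Sucker). p1 and p2 are OpenBot's estimates of the
    probability that user 1 and user 2 respectively will follow its
    advice, treated as independent here for simplicity.
    """
    assert T > R > P > S, "not a prisoner's dilemma"

    # Expected payoff to each user from following the advice to Cooperate,
    # given the chance that the other user follows it too.
    ev_follow_user1 = p2 * R + (1 - p2) * S
    ev_follow_user2 = p1 * R + (1 - p1) * S

    # Baseline: the mutual-Defect payoff a pair of AlphaBot-style users
    # would end up with.
    baseline = P

    # Advise mutual Cooperation only if following the advice beats the
    # baseline for *both* users; rearranged, this is the threshold
    # p_other >= (P - S) / (R - S).
    return ev_follow_user1 >= baseline and ev_follow_user2 >= baseline
```

On this toy rule the compliance threshold works out to (P - S) / (R - S), which rises toward 1 as the sucker's payoff S gets worse; a fuller version would also have to model the ex-ante incentive to install OpenBot rather than AlphaBot in the first place.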
Back in the real world, though, I think this is the biggest obstacle to cooperating in a one-shot PD: I'm not very good at telling what decision theory someone else is really using. I can't just look at their past behavior, because by definition anything I've observed in the past can't be a one-shot PD, and even CDT would recommend Cooperate in many other forms of PD. (And there's an additional twist compared to the app-store model, in that I also have to predict the other player's prediction of me, their prediction of my prediction, and so on, which makes mutual cooperation even harder to achieve.)
Did Kant talk about anything like this? I would be a lot more impressed with his philosophy if he did, but I would guess that he probably didn’t.
Great comment.
I haven’t read much Kant, so I can’t say what he’d say.
Yeah, it was a mistake for me to set things up such that you can take the advice or leave it, and then also describe OpenBot that way. I should either have OpenBot be the more sophisticated thing you describe, or else say that people have to follow the advice once it is given, and only have the choice of whether or not to ask for it. (Maybe we could operationalize this as: people have delegated most of the important decisions in their lives to these apps, they can only have one app in charge at any given time, and in between important decisions they can choose to uninstall an app, but during a decision they can't; see the sketch just below.)
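Here is one way that operationalization could be pinned down, as a toy Python sketch; the class and method names are hypothetical, and the point is only to capture the rule that a user may switch apps between important decisions but is bound by the current app's advice during one.

```python
class DelegatedUser:
    """Toy model of the proposed setup: one app is in charge at a time,
    its advice is binding while a decision is in progress, and the user
    may only uninstall/replace the app between decisions."""

    def __init__(self, app):
        self.app = app            # the single app currently in charge
        self.mid_decision = False

    def switch_app(self, new_app):
        # Switching is only allowed between decisions.
        if self.mid_decision:
            raise RuntimeError("cannot change apps during a decision")
        self.app = new_app

    def make_decision(self, situation):
        self.mid_decision = True
        try:
            # No take-it-or-leave-it step: whatever the app advises
            # is what the user does.
            action = self.app.advise(situation)
        finally:
            self.mid_decision = False
        return action
```

Under this rule the compliance-prediction problem from above largely goes away; the users' only strategic choice is which app to put in charge before a decision arises.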
Anyhow, back to the substantive issue.
Yes, in the real world for humans you have to be worried about various flavors of irrationality and rationality; even if someone seems fairly similar to you, you can't assume that they are following a relevantly similar decision algorithm.
However, Evidential Cooperation in Large Worlds still applies, I think, and matters. Previously I wrote another rationalist reconstruction of Kant that basically explores this.
Moreover, in the real world for AGIs, it may be much more feasible to have reasonably high credence that someone else is following a relevantly similar decision algorithm: for example, they might be a copy of you with a different prompt, or a different fine-tune, or maybe a different pre-training seed, or maybe just the same general architecture (e.g. some specific version of MCTS).