Thanks for the encouragement! Although if people started solving problems in a similar vein and wrote them up, that would be nicer still. :-)
The NTU case doesn’t look clear-cut at all, thanks to the PDFs you gave me; most likely there will be multiple Pareto-optimal equilibria with no clear way to tell which is “better”. The general case of incomplete information also looks bewilderingly hard; I don’t even understand what “equilibrium” means in this context. If I work something out, I will surely post it.
Also very curious about Nesov’s approach, but he has told me multiple times that there’s nothing to write up mathematically yet.
The general case of incomplete information also looks bewilderingly hard; I don’t even understand what “equilibrium” means in this context.
Bayesian Nash equilibrium is the solution concept usually used for non-cooperative games with incomplete information. (I’m not sure if that’s what you’re asking.)
BTW, I think another way to formulate this problem, which should be equivalent but might be easier to think about, is to have all the AIs simply agree to merge into a single AI running the fair decision algorithm, instead of knowing each other’s source code and detecting friends and enemies. I had some very preliminary ideas on this which I wrote about on SL4, which I now recognize as trying to reinvent parts of cooperative game theory.
Also very curious about Nesov’s approach, but he has told me multiple times that there’s nothing to write up mathematically yet.
What he wrote before led me to look into cooperative game theory, so I’m very curious about his current approach, even if the results aren’t ready yet.
If you’re thinking about alternate formulations, it might interest you that programs that know each other’s source code can use the quining trick to emulate any mutually-agreed rule for resolving conflicts, e.g. “play a game of chess to decide who gets the money”. (Quining ensures that both players’ internal representations of other players move the chess pieces correctly.) More generally, they can agree on any common algorithm that takes input strings supplied by individual players (e.g. chess-playing strategies). I guess this counts as “merging”? Another application is generating common cryptographically secure random numbers. This stuff is almost too powerful.
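As a concrete illustration of the common-algorithm trick, here is a minimal Python sketch (all names and the toy arbiter are invented for illustration): each player publishes the source of an agreed arbiter plus an input string, checks that the other committed to the identical arbiter source, and then runs it locally. Hashing both inputs together also gives the common random bit mentioned above.

```python
# Toy arbiter both players agree on: a common "coin flip" that hashes
# both inputs together, so neither side can bias the bit without knowing
# the other's input in advance.
ARBITER_SRC = '''
def arbiter(input_a, input_b):
    import hashlib
    h = hashlib.sha256((input_a + "|" + input_b).encode()).digest()
    return "A" if h[0] % 2 == 0 else "B"
'''

def resolve(player_a, player_b):
    """Each player checks the other committed to the same arbiter source,
    then runs it locally. Returns the common verdict, or None on mismatch."""
    if player_a["arbiter"] != player_b["arbiter"]:
        return None                    # no agreed rule -- fall back to conflict
    ns = {}
    exec(player_a["arbiter"], ns)      # both sides execute identical code
    return ns["arbiter"](player_a["input"], player_b["input"])

a = {"arbiter": ARBITER_SRC, "input": "e4 e5 Nf3"}
b = {"arbiter": ARBITER_SRC, "input": "d4 d5 c4"}
print(resolve(a, b))   # same winner, computed independently by either side
```

Since both sides run byte-identical code on the same pair of inputs, they necessarily compute the same verdict without any trusted third party.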
The set of things you can do with “knowing each other’s source code” seems to be the same as what you can do with “agreeing to merge together”. And the underlying mechanisms necessary for implementing them seem to be the same as well.
In order for two AIs meeting for the first time to prove their source code to each other, each needs to reconstruct itself from scratch while being monitored by the other. Similarly, merging means jointly constructing a new entity and then transferring all resources to it. In both cases the necessary enabling technology might be called secure joint construction. (I just made that phrase up. Is there a better term?)
It would help if you gave an example of two simple programs that undergo “secure joint construction” in some formal setting, without any source-code equality tricks. For example, assume that initially they both implement algorithms that want to agree, but that this fact is unknowable by Rice’s theorem. This stuff isn’t obvious to me, and an example would be progress.
Well, AI 1 sends a proposal for a joint decision algorithm to AI 2, AI 2 sends a counter-proposal, and they bargain back and forth until they agree on a joint decision algorithm. They then jointly build a new AI, each monitoring the construction process to ensure that the new AI really implements the algorithm they agreed on. Finally they each transfer all resources to the new AI and shut down.
Does that answer your question, or were you asking something else?
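For what it’s worth, the back-and-forth just described can be turned into a runnable toy. Everything below is invented for illustration: the “joint decision algorithm” collapses to an integer split of 100, bargaining is fixed-step alternating concessions bounded by each side’s reservation share, and the jointly constructed entity is just a dict that each party re-checks against the deal before handing over resources.

```python
class ToyAI:
    def __init__(self, name, min_share, step=5):
        self.name = name
        self.min_share = min_share   # smallest share (out of 100) it accepts
        self.step = step             # how much it concedes per round
        self.resources = 100

def negotiate(ai1, ai2, max_rounds=100):
    """Alternating concessions over integer shares of 100.
    Returns (share1, share2), or None if bargaining breaks down."""
    d1, d2 = 100, 100                 # each side opens by demanding everything
    for _ in range(max_rounds):
        if 100 - d1 >= ai2.min_share:            # AI 2 accepts the standing offer
            return d1, 100 - d1
        d2 = max(d2 - ai2.step, ai2.min_share)   # AI 2 concedes, but never
        if 100 - d2 >= ai1.min_share:            # below its own reservation
            return 100 - d2, d2                  # AI 1 accepts the counter
        d1 = max(d1 - ai1.step, ai1.min_share)   # AI 1 concedes in turn
    return None

def merge(ai1, ai2):
    """Bargain, jointly construct the successor, verify, then transfer."""
    deal = negotiate(ai1, ai2)
    if deal is None:
        return None
    agreed = {ai1.name: deal[0], ai2.name: deal[1]}
    successor = {"split": dict(agreed), "pool": 0}   # joint construction
    # Each party independently re-checks the built entity against the deal
    # before committing anything irreversible.
    if successor["split"] == agreed:
        for ai in (ai1, ai2):
            successor["pool"] += ai.resources
            ai.resources = 0                         # "shut down"
    return successor

print(merge(ToyAI("A", 40), ToyAI("B", 40)))
```

With symmetric parameters AI 2 ends up with the larger share here, which is an artifact of who concedes first in the fixed schedule, not a claim about real bargaining; if the two reservations sum to more than 100, `negotiate` correctly returns None and no merge happens.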
I was asking for a rigorous model of that: a controlled tournament setting, plus two small programs, that you could implement in Python today and that would do what you mean. Surely the hard AI issues shouldn’t pose a problem, because you can always hardcode any “understanding” that goes on, e.g. “this here incoming packet says that I should do X”. At least that’s my direction of inquiry, because I’m too scared of making hard-to-notice mistakes when handwaving about such things.
In case anyone is wondering what happened to this conversation, cousin_it and I took it offline. I’ll try to write up a summary of that and my current thoughts, but in the meantime here’s what I wrote as the initial reply:
Ok, I’ll try. Assume there is a “secure joint construction service” which takes as input a string from each player and constructs one machine for each distinct string it receives, using that string as its program. Each player then has the option to transfer its resources to the machine constructed from its string, in which case that machine will play all of the player’s moves. Then the machines and any players who chose not to transfer resources play the base game (let’s say it’s the Prisoner’s Dilemma, PD).
The Nash equilibrium of this game should be for all players to submit a common program and then choose to transfer resources. That program plays “cooperate” if everyone did that, and “defect” otherwise.
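That model is small enough to write down directly. Below is one possible Python rendering (the API is invented): `play` acts as the construction service, `exec`-ing each submitted string into a machine whose `move` function sees its own source plus everyone’s submissions; holdouts are assumed to simply defect, which is a hardcoded stand-in for their base-game strategy.

```python
# Payoff table for the base game (Prisoner's Dilemma).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(submissions):
    """The construction service plus the base game.
    submissions: list of (program_source, transferred) pairs.
    Returns the tuple of moves actually played."""
    moves = []
    for src, transferred in submissions:
        if transferred:
            ns = {}
            exec(src, ns)                          # construct the machine
            moves.append(ns["move"](src, submissions))
        else:
            moves.append("D")                      # assume holdouts defect
    return tuple(moves)

# The program from the proposed equilibrium: cooperate only if every
# player submitted this exact source and transferred control to it.
CLIQUE = '''
def move(my_src, submissions):
    if all(src == my_src and transferred for src, transferred in submissions):
        return "C"
    return "D"
'''

both_in = play([(CLIQUE, True), (CLIQUE, True)])
holdout = play([(CLIQUE, True), (CLIQUE, False)])
print(both_in, PD[both_in])   # ('C', 'C') (3, 3)
print(holdout, PD[holdout])   # ('D', 'D') (1, 1)
```

A holdout gets 1 instead of 3 here, which is the sense in which submitting the common program and transferring is an equilibrium — though the holdout’s hardcoded “D” is an assumption of the sketch, not something derived.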