Aagh… I’m conflicted. I really like the simplicity of having players correspond 1:1 to calls to the oracle, and their mixed strategies to the probability that the oracle returns “true”, but the implication that the oracle must return the same answer if it’s called twice on the same arguments during the execution of a single program interacts very badly with the intuitive picture of the applications I have in mind.
The intuitive picture is that the agents under consideration are living in a world whose laws of physics are probabilistic and allow for the construction of oracles—meaning, devices that invoke the underlying reflective oracle. An agent would be just a computer with access to such a device. Then, according to CDT, it’s optimal to program these computers to do oracle calls that compare the expected utilities of taking the different actions available to the agent, and then take an action that maximizes expected utility. (I.e., it’s not that that’s the only way you could program that computer—you could program it to use its oracle in many different ways, but this one is an optimal one according to causal reasoning.) The world, then, can be modelled as an oracle machine program which simulates the local laws of physics and outputs what happens in the world.
But for this picture to be correct, it’s really important for different calls to the oracle to be independent. Suppose that they aren’t, and we have two players playing a game of Matching Pennies. One of these players chooses according to the expected utility calculation described above, using a single oracle invocation to figure out which action is better. But the other player… makes the same call to the oracle as the first player, uses this to determine whether the first player chooses heads or tails, and does the opposite. Clearly, the second player wins every game, meaning that the algorithm run by the first player is no longer optimal.
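To make the failure concrete, here’s a minimal simulation (illustrative only; the function names, the payoff reading, and the parameter q are my assumptions, not part of the formalism): if the second player’s oracle call is forced to return the same answer as the first player’s, the mismatcher wins every round, whereas with independent calls it only wins at the base rate.

```python
import random

def play_round(q, shared_call):
    """One round of Matching Pennies.

    q: probability that the oracle call returns "true" (read here as
       "player 1 plays heads").
    shared_call: if True, both players see the same answer to the same
       call (correlated calls); if False, each call is an independent draw.
    """
    call1 = random.random() < q
    call2 = call1 if shared_call else (random.random() < q)
    p1 = "H" if call1 else "T"  # player 1 follows the oracle's answer
    p2 = "T" if call2 else "H"  # player 2 plays the opposite of its prediction
    return p1 != p2             # True iff the pennies mismatch: player 2 wins

random.seed(0)
# Correlated calls: player 2 predicts perfectly and wins every round.
assert all(play_round(0.5, shared_call=True) for _ in range(1000))
# Independent calls: player 2 only wins about half the time.
wins = sum(play_round(0.5, shared_call=False) for _ in range(10000))
assert 0.45 < wins / 10000 < 0.55
```

The point is just that the optimality of the first player’s algorithm depends on the two calls being decorrelated.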
So I think the version you outlined doesn’t work for my main application, even though I really like the simplicity of it.
One possibility would be to have a player for every instance where the oracle is invoked in one of the possible execution histories of one of the programs under consideration. But then it wouldn’t be guaranteed that the different players corresponding to the same pair (┌M┐,p) will play the same mixed strategies. Of course, it can be shown that there are Nash equilibria that are symmetric in this way, but requiring such a result seems to take away from the simplicity of the connection to Nash equilibria, and it’s not immediate that the algorithms for finding Nash equilibria can be adapted to finding this kind of symmetric equilibrium (although they almost certainly can).
Another possibility would be to try to formulate this as a game where players correspond to pairs (┌M┐,p), but where their pure strategies are numbers in the interval [0,1]. In this case, if the players’ payoffs are a continuous bounded function of the strategy profile and quasi-concave in each player’s own strategy (which holds below, since they’re linear in x), Kakutani’s fixed point theorem will guarantee that there is a pure Nash equilibrium, and it should be possible to find this result in a standard textbook somewhere (economists need continuous strategy spaces for modelling things like auction bids). And there might be existing algorithms for finding Nash equilibria for such games. Judd, Renner & Schmedders (2012) looks promising, although I haven’t really delved into it. (It also looks annoyingly complicated, though.)
For the payoffs, we could then re-use your idea: If player (┌M┐,p) chooses strategy x∈[0,1], their payoff is x⋅p plus (1−x) times the probability that M[O′] returns 1, where O′ behaves with probabilities according to the pure strategy profile under consideration.
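For concreteness, here is a sketch of that payoff and its best-response structure, treating q — the probability that M[O′] returns 1 — as fixed by the profile (the function names are mine):

```python
def payoff(x, p, q):
    """Payoff of player (┌M┐, p) choosing pure strategy x in [0, 1],
    where q is the probability that M[O'] returns 1 under the profile:
    x * p + (1 - x) * q."""
    return x * p + (1 - x) * q

def best_responses(p, q):
    """The payoff is linear in x, so best responses sit at the
    endpoints: x = 1 when p > q, x = 0 when p < q, and every
    x in [0, 1] is optimal at indifference (p == q)."""
    if p > q:
        return [1.0]
    if p < q:
        return [0.0]
    return [0.0, 1.0]  # representatives; any x in [0, 1] is optimal

assert payoff(1.0, 0.3, 0.8) == 0.3  # x = 1: collect p for sure
assert payoff(0.0, 0.3, 0.8) == 0.8  # x = 0: collect q for sure
assert best_responses(0.3, 0.8) == [0.0]
```

The linearity in x is also what makes the payoffs quasi-concave in each player’s own strategy, which is what the pure-equilibrium existence argument needs.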
Although, now I’m worried about whether it’s a problem that, when we evaluate whether a player would do better by switching from x to x′, we also switch to a different oracle—one that makes calls on (┌M┐,p) return “true” with probability x′ rather than with probability x. (In other words, when we consider replacing x by x′, we’re also potentially changing the probability that M[O′] halts...)
Hmm...this seems like less of a problem to me. The thing we need for the Nash equilibrium equivalence is that, within an execution of a single program, equal calls to the oracle return equal results (or equivalently, you can’t give the same call to the oracle twice). But we don’t need to give the same call to the oracle twice in order to model the matching pennies game, because you just need one call (to determine whether your opponent plays heads with greater than 50% probability), and the two players are represented as different programs.
The thing we’re basing the Nash equilibrium equivalence on is the question “given fixed return values of calls, what is the probability that this program returns 1?”, which we use to write down the utility values for our normal form game. Since we’re basing everything on this process, it follows that within an execution of a program, it only makes sense to make a given call once. But nothing about this rules out having different programs reason about the same call, and having the oracle give them different answers.
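As a toy check that the continuous-strategy formulation behaves sensibly, consider a hypothetical machine M that makes the single call (┌M┐, 1/2) and returns 1 exactly when the oracle answers “true”, so that the probability that M[O′] returns 1 equals the profile’s own answer probability x. If deviations are evaluated against the fixed profile (the usual Nash convention), best responses push x toward 1/2, and a decreasing-step best-response iteration finds that equilibrium. Both the machine and the dynamics are illustrative assumptions, not part of the construction above:

```python
def best_response(x, p=0.5):
    """Deviating to x' against the fixed profile pays
    x' * p + (1 - x') * q, where q = P(M[O'] = 1) = x for this toy
    self-referential machine. Linear in x', so best responses are 1
    (when p > q), 0 (when p < q), or anything at indifference."""
    q = x
    if p > q:
        return 1.0
    if p < q:
        return 0.0
    return x  # indifferent: keep the current strategy

# Crude equilibrium search: best-response iteration with decreasing steps.
x = 0.9
for n in range(5000):
    x += (best_response(x) - x) / (n + 2)
assert abs(x - 0.5) < 0.01  # settles at the equilibrium x* = 1/2
```

At x* = 1/2 the player is exactly indifferent, which is where the oracle is allowed to randomize—matching the reflective-oracle condition for this query.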