Thanks! I’ve been also thinking about making a finite version; my main motivation was that I’ve been hoping to take my still-incompletely-specified AIXI variant and find an analog variant of AIXItl, the computable version of AIXI that considers only hypotheses which terminate in time t and whose source length is <l.
I think that would in fact be a pretty reasonable finite set of programs to use in what you’re suggesting, since “we want it to terminate in time t” provides a motivation for why we would require programs to terminate surely, as opposed to almost surely. (Terminating surely seems to be necessary for the condition of “only finitely many possible calls to the oracle” to hold: there is an almost surely terminating stochastic oracle program which chooses a random pair in N×P and calls the oracle on that.)
It seems like the finite, AIXItl version should be able to implement any finite game with a rational payoff matrix (for sufficiently large t and l), which is pretty neat. (Of course, even though this is computable, it does require the game being played to be much simpler than the computers the players are running on—which are spending basically all of their computation time on computing the oracle, rather than the agent code that calls that oracle; in any real life situation, it’s pretty clear that there would be a better use that these agents could put their supercomputers to.)
I hadn’t realized the point about the problem being expressible using just elementary operations and the set-valued step function—that’s pretty neat! I don’t know enough about polynomial solving to figure out the details of how this would work, but it’s extremely plausible that a variation of a technique used for finding Nash equilibria would work. (In fact, I wonder whether the problem can somehow be transformed into the problem of finding a Nash equilibrium of an appropriate finite game?) I’d be interested in seeing some more details worked out. It seems like some proof that equilibria in the finite case can actually be found would be nice to have in an exposition aimed at computer scientists. (One brute-force proof idea that I’d considered before would be just to say that by the fixed point theorem, there is some solution, so if you try rational-valued solutions in a fine enough grid, you have to find values that are within any ε of being a solution, in some appropriate sense. I haven’t worked out the details.)
I thought about it some more and realized that this is equivalent to finding a Nash equilibrium. What we do is, we create a player for every call (M,p). This player has 2 actions (call them action 0 and action 1). The player’s utilities are set up so that they always get p utility for choosing action 0. Also, given fixed actions for the other players, their utility for action 1 is set to be the probability that M[O′] returns 1 given that O′ is an oracle that returns the action of a different player corresponding to the call it receives. Now, the player’s expected utility is p for action 0, and for action 1 it will be M[O′′]‘s probability of returning 1 where O′′ is an oracle that returns answers to calls under the same distribution as the other players’ mixed strategies. So, it will be allowed to choose action 0 iff P(M,O)<=p, action 1 iff P(M,O)>=p, and will be allowed to mix iff P(M,O)=p, as desired. It should be noted that within a program, repeated calls to the oracle must return the same value, so they should be cached.
Going the other direction is easy: to find the Nash equilibrium for a game where each player has 2 actions, set up a program for each player that queries the oracle for the actions of each other player and then returns the action that maximizes expected utility. For games with more than 2 actions (say, each player has actions 1...k), we can create a different set of programs for each player: program i where i∈{1..k−1} returns 1 iff some maximum expected utility action’s index is less that or equal to i.
It seems to follow that the full problem is equivalent to a game with countably infinite players. I was thinking of how you would solve the full problem using some kind of oracle machine. It would at least be necessary to determine whether a program halts for every infinite input you give it, because this is necessary to detect queries. This interacts weirdly with the requirement that queries almost surely halt; it may require something analogous to #P for counting the number of inputs a program halts on? The problem of determining if a machine halts for every finite input is computable by a double oracle machine but not a regular oracle machine, but I don’t know what that says about infinite inputs.
About determining whether something’s a query: A formulation of the statement we want to test is, (A) “for every (infinite) input on the oracle tape, and every ε>0, there is a T∈N such that with probability >(1−ε), the program halts in ≤T steps.”
I think this is equivalent to, (B) “For every ε>0, there is a T∈N such that, for any input on the oracle tape, the program halts within T timesteps with probability >(1−ε).”
Reason why I think this is equivalent: Clearly (B) implies (A). Suppose that (B) is false; then there is a ε>0 such that for all T>0, there is an input on the oracle tape such that the program runs for >T steps with probability ≥ε. Let f(T) be a function that returns such an oracle tape input for every T.
Now, there either must be infinitely many T such that f(T) starts with 0, or infinitely many T such that f(T) starts with 1, or both. Write down such a bit; say, 1. Now, there must be infinitely many T such that f(T) starts with 10, or such that f(T) starts with 11, or both. Iterate; this way, we obtain an infinitely long input for the oracle tape, which has the property that for every T∈N, there are arbitrarily large—T′and in particular, there’s a—T′>Tsuch that the program runs for >T′ steps with probability ≥ε. In other words, for every T, the probability that the program runs for >T steps when given this input on its oracle tape is ≥ε. This shows that (A) is false.
Thus, it’s sufficient to test whether a prospective query satisfies (B). But in T timesteps, the machine can only read the first T bits of its oracle tape and of its random input tape, so the statement “for any input on the oracle tape, the program halts within T timesteps with probability >(1−ε)” is primitive recursive. Thus, (B) is a Π2 statement, which we can test with a double halting oracle.
(warning: math above written quickly and not carefully checked)
In the other direction, suppose that φ(m,n) is a primitive recursive predicate, and consider the following probabilistic Turing machine: First, randomly choose an m∈N (placing positive probability on every natural number). Then, search exhaustively for an n∈N such that φ(m,n) is true. If you find one, halt and output 1. This machine is a query if and only if ∃m.∀n.φ(m,n). Together with the parent comment, this shows that having an oracle that tells you whether or not something is a query is equivalent to having an oracle for Π2 statements (i.e., a double halting oracle).
Aagh… I’m conflicted. I really like the simplicity of having players correspond 1:1 to calls to the oracle, and their mixed strategies to the probability that the oracle returns “true”, but the implication that the oracle must return the same answer if it’s called twice on the same arguments during the execution of a single program interacts very badly with the intuitive picture of the applications I have in mind.
The intuitive picture is that the agents under consideration are living in a world whose laws of physics are probabilistic and allow for the construction of oracles—meaning, devices that invoke the underlying reflective oracle. An agent would be just a computer with access to such a device. Then, according to CDT, it’s optimal to program these computers to do oracle calls that compare the expected utilities of taking the different actions available to the agent, and then taking an action that maximizes expected utility. (I.e., it’s not that that’s the only way you could program that computer—you could program it to use its oracle in many different ways, but this one is an optimal one according to causal reasoning.) The world, then, can be modelled to be an oracle machine program which simulates the local laws of physics and outputs what happens in the world.
But for this picture to be correct, it’s really important for different calls to the oracle to be independent. Suppose that they aren’t, and we have two players playing a game of Matching Pennies. One of these players chooses according the expected utility calculation described above, using a single oracle invocation to figure out which action is better. But the other player… makes the same call to the oracle as the first player, uses this to determine whether the first player chooses heads or tails, and does the opposite. Clearly, the second player wins every game, meaning that the algorithm run by the first player is no longer optimal.
So I think the version you outlined doesn’t work for my main application, even though I really like the simplicity of it.
One possibility would be to have a player for every instance where the oracle is invoked in one of the possible execution histories of one of the programs under consideration. But then it wouldn’t be guaranteed that the different players corresponding to the same pair (┌M┐,p) will play the same mixed strategies. Of course, it can be shown that there are Nash equilibria that are symmetric in this way, but requiring such a result seems to take away from the simplicity of the connection to Nash equilibria, and it’s not immediate that the algorithms for finding Nash equilibria can be adapted to finding this kind of symmetric equilibrium (although they almost certainly can).
Another possibility would be to try to formulate this as a game where players correspond to pairs (┌M┐,p), but where their pure strategies are numbers in the interval [0,1]. In this case, if the players’ payoffs are a continuous bounded function of the strategy profile, Kakutani’s fixed point theorem will guarantee that there is a pure Nash equilibrium, and it should be possible to find this result in a standard textbook somewhere (economists need continuous strategy spaces for modelling things like auction bids). And there might be existing algorithms for finding Nash equilibria for such games. Judd, Renner & Schmedders (2012) looks promising, although I haven’t really delved into it. (It also looks annoyingly complicated, though.)
For the payoffs, we could then re-use your idea: If player (┌M┐,p) chooses strategy x∈[0,1], their payoff is x⋅p plus (1−x) times the probability that M[O′] returns 1, where O′ behaves with probabilities according to the pure strategy profile under consideration.
Although, now I’m worried whether or not it’s a problem that when we evaluate whether a player would do better by switching from x to x′, we also switch the oracle to a different oracle that makes calls to (┌M┐,p) return “true” with probability x′ rather than with probability x? (In other words, when we consider replacing x by x′, we’re also potentially changing the probability that M[O′] halts...)
Hmm...this seems like less of a problem to me. The thing we need for the Nash equilibrium equivalence is that, within an execution of a single program, equal calls to the oracle return equal results (or equivalently, you can’t give the same call to the oracle twice). But we don’t need to give the same call to the oracle twice in order to model the matching pennies game, because you just need one call (to determine whether your opponent plays heads with greater than 50% probability), and the two players are represented as different programs.
The thing we’re basing the Nash equilibrium equivalence on is the question “given fixed return values of calls, what is the probability that this program returns 1?” in order to write in utility values for our normal form game. Since we’re basing everything on this process, it follows that within an execution of a program, it only make sense to make a given call once. But nothing about this rules out having different programs reason about the same call, and having the oracle give them different answers.
Thanks! I’ve been also thinking about making a finite version; my main motivation was that I’ve been hoping to take my still-incompletely-specified AIXI variant and find an analog variant of AIXItl, the computable version of AIXI that considers only hypotheses which terminate in time t and whose source length is <l.
I think that would in fact be a pretty reasonable finite set of programs to use in what you’re suggesting, since “we want it to terminate in time t” provides a motivation for why we would require programs to terminate surely, as opposed to almost surely. (Terminating surely seems to be necessary for the condition of “only finitely many possible calls to the oracle” to hold: there is an almost surely terminating stochastic oracle program which chooses a random pair in N×P and calls the oracle on that.)
It seems like the finite, AIXItl version should be able to implement any finite game with a rational payoff matrix (for sufficiently large t and l), which is pretty neat. (Of course, even though this is computable, it does require the game being played to be much simpler than the computers the players are running on—which are spending basically all of their computation time on computing the oracle, rather than the agent code that calls that oracle; in any real life situation, it’s pretty clear that there would be a better use that these agents could put their supercomputers to.)
I hadn’t realized the point about the problem being expressible using just elementary operations and the set-valued step function—that’s pretty neat! I don’t know enough about polynomial solving to figure out the details of how this would work, but it’s extremely plausible that a variation of a technique used for finding Nash equilibria would work. (In fact, I wonder whether the problem can somehow be transformed into the problem of finding a Nash equilibrium of an appropriate finite game?) I’d be interested in seeing some more details worked out. It seems like some proof that equilibria in the finite case can actually be found would be nice to have in an exposition aimed at computer scientists. (One brute-force proof idea that I’d considered before would be just to say that by the fixed point theorem, there is some solution, so if you try rational-valued solutions in a fine enough grid, you have to find values that are within any ε of being a solution, in some appropriate sense. I haven’t worked out the details.)
I thought about it some more and realized that this is equivalent to finding a Nash equilibrium. What we do is, we create a player for every call (M,p). This player has 2 actions (call them action 0 and action 1). The player’s utilities are set up so that they always get p utility for choosing action 0. Also, given fixed actions for the other players, their utility for action 1 is set to be the probability that M[O′] returns 1 given that O′ is an oracle that returns the action of a different player corresponding to the call it receives. Now, the player’s expected utility is p for action 0, and for action 1 it will be M[O′′]‘s probability of returning 1 where O′′ is an oracle that returns answers to calls under the same distribution as the other players’ mixed strategies. So, it will be allowed to choose action 0 iff P(M,O)<=p, action 1 iff P(M,O)>=p, and will be allowed to mix iff P(M,O)=p, as desired. It should be noted that within a program, repeated calls to the oracle must return the same value, so they should be cached.
Going the other direction is easy: to find the Nash equilibrium for a game where each player has 2 actions, set up a program for each player that queries the oracle for the actions of each other player and then returns the action that maximizes expected utility. For games with more than 2 actions (say, each player has actions 1...k), we can create a different set of programs for each player: program i where i∈{1..k−1} returns 1 iff some maximum expected utility action’s index is less that or equal to i.
It seems to follow that the full problem is equivalent to a game with countably infinite players. I was thinking of how you would solve the full problem using some kind of oracle machine. It would at least be necessary to determine whether a program halts for every infinite input you give it, because this is necessary to detect queries. This interacts weirdly with the requirement that queries almost surely halt; it may require something analogous to #P for counting the number of inputs a program halts on? The problem of determining if a machine halts for every finite input is computable by a double oracle machine but not a regular oracle machine, but I don’t know what that says about infinite inputs.
About determining whether something’s a query: A formulation of the statement we want to test is, (A) “for every (infinite) input on the oracle tape, and every ε>0, there is a T∈N such that with probability >(1−ε), the program halts in ≤T steps.”
I think this is equivalent to, (B) “For every ε>0, there is a T∈N such that, for any input on the oracle tape, the program halts within T timesteps with probability >(1−ε).”
Reason why I think this is equivalent: Clearly (B) implies (A). Suppose that (B) is false; then there is a ε>0 such that for all T>0, there is an input on the oracle tape such that the program runs for >T steps with probability ≥ε. Let f(T) be a function that returns such an oracle tape input for every T.
Now, there either must be infinitely many T such that f(T) starts with 0, or infinitely many T such that f(T) starts with 1, or both. Write down such a bit; say, 1. Now, there must be infinitely many T such that f(T) starts with 10, or such that f(T) starts with 11, or both. Iterate; this way, we obtain an infinitely long input for the oracle tape, which has the property that for every T∈N, there are arbitrarily large—T′and in particular, there’s a—T′>Tsuch that the program runs for >T′ steps with probability ≥ε. In other words, for every T, the probability that the program runs for >T steps when given this input on its oracle tape is ≥ε. This shows that (A) is false.
Thus, it’s sufficient to test whether a prospective query satisfies (B). But in T timesteps, the machine can only read the first T bits of its oracle tape and of its random input tape, so the statement “for any input on the oracle tape, the program halts within T timesteps with probability >(1−ε)” is primitive recursive. Thus, (B) is a Π2 statement, which we can test with a double halting oracle.
(warning: math above written quickly and not carefully checked)
In the other direction, suppose that φ(m,n) is a primitive recursive predicate, and consider the following probabilistic Turing machine: First, randomly choose an m∈N (placing positive probability on every natural number). Then, search exhaustively for an n∈N such that φ(m,n) is true. If you find one, halt and output 1. This machine is a query if and only if ∃m.∀n.φ(m,n). Together with the parent comment, this shows that having an oracle that tells you whether or not something is a query is equivalent to having an oracle for Π2 statements (i.e., a double halting oracle).
Aagh… I’m conflicted. I really like the simplicity of having players correspond 1:1 to calls to the oracle, and their mixed strategies to the probability that the oracle returns “true”, but the implication that the oracle must return the same answer if it’s called twice on the same arguments during the execution of a single program interacts very badly with the intuitive picture of the applications I have in mind.
The intuitive picture is that the agents under consideration are living in a world whose laws of physics are probabilistic and allow for the construction of oracles—meaning, devices that invoke the underlying reflective oracle. An agent would be just a computer with access to such a device. Then, according to CDT, it’s optimal to program these computers to do oracle calls that compare the expected utilities of taking the different actions available to the agent, and then taking an action that maximizes expected utility. (I.e., it’s not that that’s the only way you could program that computer—you could program it to use its oracle in many different ways, but this one is an optimal one according to causal reasoning.) The world, then, can be modelled to be an oracle machine program which simulates the local laws of physics and outputs what happens in the world.
But for this picture to be correct, it’s really important for different calls to the oracle to be independent. Suppose that they aren’t, and we have two players playing a game of Matching Pennies. One of these players chooses according the expected utility calculation described above, using a single oracle invocation to figure out which action is better. But the other player… makes the same call to the oracle as the first player, uses this to determine whether the first player chooses heads or tails, and does the opposite. Clearly, the second player wins every game, meaning that the algorithm run by the first player is no longer optimal.
So I think the version you outlined doesn’t work for my main application, even though I really like the simplicity of it.
One possibility would be to have a player for every instance where the oracle is invoked in one of the possible execution histories of one of the programs under consideration. But then it wouldn’t be guaranteed that the different players corresponding to the same pair (┌M┐,p) will play the same mixed strategies. Of course, it can be shown that there are Nash equilibria that are symmetric in this way, but requiring such a result seems to take away from the simplicity of the connection to Nash equilibria, and it’s not immediate that the algorithms for finding Nash equilibria can be adapted to finding this kind of symmetric equilibrium (although they almost certainly can).
Another possibility would be to try to formulate this as a game where players correspond to pairs (┌M┐,p), but where their pure strategies are numbers in the interval [0,1]. In this case, if the players’ payoffs are a continuous bounded function of the strategy profile, Kakutani’s fixed point theorem will guarantee that there is a pure Nash equilibrium, and it should be possible to find this result in a standard textbook somewhere (economists need continuous strategy spaces for modelling things like auction bids). And there might be existing algorithms for finding Nash equilibria for such games. Judd, Renner & Schmedders (2012) looks promising, although I haven’t really delved into it. (It also looks annoyingly complicated, though.)
For the payoffs, we could then re-use your idea: If player (┌M┐,p) chooses strategy x∈[0,1], their payoff is x⋅p plus (1−x) times the probability that M[O′] returns 1, where O′ behaves with probabilities according to the pure strategy profile under consideration.
Although, now I’m worried whether or not it’s a problem that when we evaluate whether a player would do better by switching from x to x′, we also switch the oracle to a different oracle that makes calls to (┌M┐,p) return “true” with probability x′ rather than with probability x? (In other words, when we consider replacing x by x′, we’re also potentially changing the probability that M[O′] halts...)
Hmm...this seems like less of a problem to me. The thing we need for the Nash equilibrium equivalence is that, within an execution of a single program, equal calls to the oracle return equal results (or equivalently, you can’t give the same call to the oracle twice). But we don’t need to give the same call to the oracle twice in order to model the matching pennies game, because you just need one call (to determine whether your opponent plays heads with greater than 50% probability), and the two players are represented as different programs.
The thing we’re basing the Nash equilibrium equivalence on is the question “given fixed return values of calls, what is the probability that this program returns 1?” in order to write in utility values for our normal form game. Since we’re basing everything on this process, it follows that within an execution of a program, it only make sense to make a given call once. But nothing about this rules out having different programs reason about the same call, and having the oracle give them different answers.