Do we really need the constraint that M[O’] halts with probability 1 for each O’? I remember the results going through anyway if we just allow O(M) to be arbitrary with whatever probability M[O] fails to halt. This version doesn’t even require Kakutani; Brouwer’s fixed point theorem suffices.
The idea is that P(STATE[O] outputs 1) is a continuous function of P(O[X] outputs 1), P(STATE0[O] outputs 1), and P(STATE1[O] outputs 1), where X is the oracle query that STATE makes in the next time step (if any), STATE0 is the resulting state if the oracle returns 0, and STATE1 is the resulting state if the oracle returns 1. So, given one oracle O, we can define a new oracle f(O) by this update step. Because f(O)(STATE) depends only on O(X), O(STATE0), and O(STATE1), f is a continuous function, and so it has a fixed point.
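The update step above can be sketched numerically on a toy machine. This is my own illustration, not code from the thread: the state encoding, the damped iteration, and the "contrarian" example machine are all assumptions made for the sketch. Each state either halts with a fixed output bit or queries the oracle about some machine and branches on the answer; an oracle assigns each state a probability of "outputs 1," and one application of f is exactly the update described above.

```python
# Toy sketch of the Brouwer-style argument (my own illustration).
# A state is either ("halt", bit) or ("query", X, state0, state1):
# it asks the oracle about machine X, then moves to state0 or state1
# depending on whether the oracle answers 0 or 1. An oracle O maps
# each state name to the probability that it "outputs 1".

def step(states, O):
    """One application of the update map f; continuous in O."""
    new = {}
    for name, spec in states.items():
        if spec[0] == "halt":
            new[name] = float(spec[1])
        else:
            _, X, s0, s1 = spec
            # P(outputs 1) = P(oracle says 1 on X) * P(STATE1 outputs 1)
            #              + P(oracle says 0 on X) * P(STATE0 outputs 1)
            new[name] = O[X] * O[s1] + (1 - O[X]) * O[s0]
    return new

def fixed_point(states, iters=1000):
    O = {name: 0.5 for name in states}
    for _ in range(iters):
        f_O = step(states, O)
        # Damped iteration: plain iteration can oscillate on
        # self-referential machines, damping settles on the fixed point.
        O = {n: (O[n] + f_O[n]) / 2 for n in O}
    return O

# A self-referential "contrarian": queries the oracle about itself and
# outputs the opposite bit. The fixed point assigns it probability 1/2.
states = {
    "one": ("halt", 1),
    "zero": ("halt", 0),
    "C": ("query", "C", "one", "zero"),  # answer 0 -> output 1, answer 1 -> output 0
}
O = fixed_point(states)
```

On the contrarian machine the exact map f(O)(C) = 1 − O(C) has the unique fixed point 1/2, which the damped iteration finds immediately.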
I guess one reason to dislike this version is that it can’t be used in such a straightforward way as part of AIXI without some further restriction, such as an upper bound on the computational complexity of the world (perhaps exponentially larger than the computational complexity of the agent).
If this application is the point of the restriction that M[O’] halts with probability 1 for all O’, then it might be reasonable to call that out more explicitly. It seems like a simpler notion of reflective oracles might find use elsewhere. For example, it seems like most practicing computer scientists (at least in the US) don’t much care about the difference between “infinite” and “doubly exponential.” In some contexts (especially when you have access to a halting oracle), the infinite is important because it serves as an analogy for the computationally intractable. But I don’t think that’s the case here, because we are considering “doubly exponential relative to O,” which is already intractable for a polynomial-time agent relative to O.
Thanks for expanding on your construction! I hadn’t thought of the recursive construction, that’s really neat.
I’m not that worried about the application to AIXI: unless I’m missing something, we can just additionally give our machines access to a double halting oracle (for ordinary Turing machines), and recover the same power. I considered doing that and didn’t go with it because the stuff in my post seemed slightly more elegant, but if there’s a reason to prefer the other version, it seems fine to use it.
I’m not clear on what you mean by “P(O[X] outputs 1)”; I’m guessing you were being loose in writing O(┌X┐) rather than my O(┌X┐,p) or some other variant, but I may be missing something. The O(┌X┐,p) version seems to need Kakutani unless we want to allow some ε-tolerance: consider f(O)(┌O⟨┌X┐,p⟩┐,q) for arbitrary (┌X┐,p) and for 0<q<1; we want this to equal 1 for all O such that O(┌X┐,p)>q and to equal 0 for all O such that O(┌X┐,p)<q, which makes f discontinuous if it’s single-valued. My intuition is that we generally need Kakutani to get exact optimality rather than ε-optimality (as a rule of thumb, not a hard-and-fast rule), because which choice is optimal jumps discontinuously.
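The discontinuity can be made concrete with a tiny numeric toy. This is my own sketch, not notation from the thread: writing v for the value O(┌X┐,p), a single-valued update must jump at v = q, whereas an ε-tolerant variant (here, an assumed linear ramp of width 2ε) is continuous and so amenable to Brouwer.

```python
# My own toy illustration of the discontinuity at the threshold v = q.

def exact_update(v, q):
    """Single-valued f: the required answer jumps as v crosses q
    (the value exactly at v = q is an arbitrary choice here)."""
    return 1.0 if v >= q else 0.0

def tolerant_update(v, q, eps=0.01):
    """An eps-tolerant variant: a linear ramp on [q - eps, q + eps],
    which is continuous in v."""
    return min(1.0, max(0.0, (v - (q - eps)) / (2 * eps)))

q = 0.5
# Approaching the threshold from both sides:
jump = exact_update(q + 1e-9, q) - exact_update(q - 1e-9, q)    # full jump of 1
ramp = tolerant_update(q + 1e-9, q) - tolerant_update(q - 1e-9, q)  # negligible
```

The set-valued fix (Kakutani) instead allows any answer in [0,1] exactly at v = q, which keeps the graph closed without needing the ε-tolerance.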
In the O(┌X┐,p) version, continuity is not as immediate as it would be if we could determine f(O)(┌X┐,p) by looking at only three values of O(⋅,⋅). However, we can use the same trick as in my post on simplicity priors: abbreviate the probabilities that the oracle says “greater” or “less” when called on (┌STATE1┐,p) as g_p := O(┌STATE1┐,p) and l_p := 1 − g_p, respectively; then the probability that a reflective O assigns to STATE1[O] halting equals
$$\frac{1}{2}\,g_{.5} \;+\; \frac{1}{4}\bigl(l_{.5}\,g_{.25} + g_{.5}\,g_{.75}\bigr) \;+\; \frac{1}{8}\Bigl(l_{.5}\bigl(l_{.25}\,g_{.125} + g_{.25}\,g_{.375}\bigr) + g_{.5}\bigl(l_{.75}\,g_{.625} + g_{.75}\,g_{.875}\bigr)\Bigr) \;+\; \cdots$$
This is clearly a continuous function of O, since later calls to O are multiplied by smaller and smaller weights. Thus, we can use this expression, and the analogous one for STATE0, in the definition of f(O).
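The series can be checked numerically. The sketch below is my own (the function name and the two example oracles are assumptions): it expands the oracle's answers along a binary search, weighting the depth-d answer by 2^−d exactly as in the sum above, and truncates at a fixed depth.

```python
# My own toy numeric check of the series above.

def expected_value(oracle, lo=0.0, hi=1.0, depth=15):
    """Sum over d of 2^-d * P(d-th binary-search bit is 1), truncated.

    `oracle(p)` is the probability g_p of answering "greater" at
    threshold p; answers at different thresholds are treated as
    independent coin flips, as in the series.
    """
    if depth == 0:
        return (lo + hi) / 2  # truncation: midpoint of the remaining interval
    mid = (lo + hi) / 2
    g = oracle(mid)  # probability that the next bit is 1
    return (g * expected_value(oracle, mid, hi, depth - 1)
            + (1 - g) * expected_value(oracle, lo, mid, depth - 1))

# Deterministic oracle for a machine that halts with probability 0.7:
# the series recovers 0.7 (its binary expansion 0.10110011..._2).
det = expected_value(lambda p: 1.0 if 0.7 > p else 0.0)

# Oracle answering g_p = 1 - p at every threshold: the series gives 1/2.
uni = expected_value(lambda p: 1.0 - p)
```

The truncation error is at most 2^−(depth+1), which also makes the continuity claim visible: perturbing the oracle's answers at depth d moves the sum by at most 2^−d.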
We can then check that a fixed point O∈f(O) of this correspondence is a reflective oracle by showing inductively that for every T: if the probability that X[O] halts within T steps and outputs 1 is greater than p, then O(┌X┐,p)=1; and if the probability that it halts within T steps and outputs 0 is greater than 1−p, then O(┌X┐,p)=0.
Tangential: I’m now thinking that it would be more elegant not to speak about halting at all; rather, use write-and-advance-only output tapes, and talk about the probability that the first bit X outputs is a 1 or a 0, with “never outputs anything” as the third possibility (whether due to looping or to halting without output). I’d be interested in whether this presentation would seem natural to theoretical computer scientists. (I’m also still interested in comments on the notation O(⋅,⋅) for the probabilities of the oracle’s outputs.)