Ok, understood on the second assumption. U is not a function to [0,1], but a function to the set of [0,1]-valued random variables, and your assumption is that this random variable is uncorrelated with certain claims about the outputs of certain policies. The intuitive explanation of the third condition made sense; my complaint was that even with the intended interpretation at hand, the formal statement made no sense to me.
I’m pretty sure you’re assuming that ϕ is resolved on day n, not that it is resolved eventually.
Searching over the set of all Turing machines won’t halt in a reasonably short amount of time, and in fact won’t halt ever, since the set of all Turing machines is non-compact. So I don’t see what you mean when you say that the computation is not extremely long.
Ah, the formal statement was something like “if the policy A isn’t the argmax policy, the successor policy B must be in the policy space of the future argmax, and the action selected by policy A is computed so the relevant equality holds.”
Yeah, I am assuming fast feedback: ϕ is resolved on day n.
What I meant was that the computation isn’t extremely long in the sense of description length, not in the sense of computation time. Also, we aren’t doing policy search over the set of all Turing machines; we’re doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on).
Also, I’m less confident in conditional future-trust for all conditionals than I used to be; I’ll try to crystallize where I think it goes wrong.
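To make the "smaller set of policies, with more added as time goes on" idea concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration and not the construction under discussion: the names `policy_pool`, `toy_utility`, and `argmax_policy` are hypothetical, policies are trivial constant functions (so each one obviously halts), and the utility is a made-up function peaked at 0.3.

```python
from typing import Callable, List

# Hypothetical type: a policy maps a day index to an action in [0, 1].
Policy = Callable[[int], float]


def make_constant_policy(action: float) -> Policy:
    """Build a policy that always returns the same action (trivially halts)."""
    return lambda day: action


def policy_pool(day: int) -> List[Policy]:
    """Finite pool of policies available on `day`; the pool grows with the day.

    Assumption for illustration: on day n the pool holds n + 2 constant
    policies whose actions are evenly spaced over [0, 1].
    """
    k = day + 2
    return [make_constant_policy(i / (k - 1)) for i in range(k)]


def toy_utility(action: float) -> float:
    """Made-up utility, highest at action 0.3."""
    return 1.0 - abs(action - 0.3)


def argmax_policy(day: int) -> Policy:
    """Pick the best policy from the finite pool available on `day`.

    The search is over a finite list, so it always terminates.
    """
    return max(policy_pool(day), key=lambda p: toy_utility(p(day)))


if __name__ == "__main__":
    for day in (0, 4, 9):
        best = argmax_policy(day)
        print(day, len(policy_pool(day)), best(day), toy_utility(best(day)))
```

In this toy the day-0 pool only contains the actions 0 and 1, so the argmax over the pool can sit well away from the true optimum until the pool has grown enough to include something close to it.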
Wouldn’t the set of all action sequences have lower description length than some large finite set of policies? There’s also the potential problem that all of the policies in the large finite set you’re searching over could be quite far from optimal.