That original post lays out UDT1.0, I don’t see anything about precomputing the optimal policy within it. The UDT1.1 fix of optimizing the global policy instead of figuring out the best thing to do on the fly, was first presented here, note that the 1.1 post that I linked came chronologically after the post you linked.
I gave this explanation at the start of the UDT1.1 post:
When describing UDT1 solutions to various sampleproblems, I’ve often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then return what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole, but instead specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.
Ok, I misunderstood. (See also my post on the relation between local and global optimality, and another post on coordinating local decisions using MCMC)
That original post lays out UDT1.0, I don’t see anything about precomputing the optimal policy within it. The UDT1.1 fix of optimizing the global policy instead of figuring out the best thing to do on the fly, was first presented here, note that the 1.1 post that I linked came chronologically after the post you linked.
I gave this explanation at the start of the UDT1.1 post:
Ok, I misunderstood. (See also my post on the relation between local and global optimality, and another post on coordinating local decisions using MCMC)