More formally, you have an initial distribution of “weights” on possible universes (in the currently most general case it’s the Solomonoff prior) that you never update at all. In each individual universe you have a utility function over what happens. When you’re faced with a decision, you find all copies of you in the entire “multiverse” that are faced with the same decision (“information set”), and choose the decision that logically implies the maximum sum of resulting utilities weighted by universe-weight.
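As a toy illustration of that decision rule (the universes, weights, and utilities below are invented stand-ins, not the Solomonoff prior), the agent just maximizes the weight-summed utility across the whole fixed prior:

```python
# Hedged sketch of the decision rule above: a fixed, never-updated prior
# over toy "universes", and an argmax over actions of the weighted sum of
# utilities. All numbers here are made up for illustration.

# Fixed prior over possible universes (never updated).
WEIGHTS = {"u1": 0.5, "u2": 0.3, "u3": 0.2}

# Utility of what happens in each universe if every copy of the agent
# facing this decision outputs the given action.
UTILITY = {
    ("u1", "A"): 10, ("u1", "B"): 0,
    ("u2", "A"): 0,  ("u2", "B"): 5,
    ("u3", "A"): 1,  ("u3", "B"): 1,
}

def udt_decision(actions):
    # Choose the action that implies the maximum sum of utilities
    # weighted by universe-weight.
    return max(
        actions,
        key=lambda a: sum(w * UTILITY[(u, a)] for u, w in WEIGHTS.items()),
    )

print(udt_decision(["A", "B"]))  # "A": 0.5*10 + 0.3*0 + 0.2*1 = 5.2 beats 1.7
```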
I’ve read both the original UDT post and this one, and I’m still not sure I understand this basic point. The only way I can make sense out of it is as follows.
The UDT agent is modeled as a procedure S, and its interaction with the universe as a program P calling that procedure and doing something depending on the return value. Some utility is assigned to each such outcome. Now, S knows the prior probability distribution over all programs that might be calling S, and there is also the input X. So when the call S(X) occurs, the procedure S will consider how the expected utility varies depending on what S(X) evaluates to. However, changing the return value for S(X) affects only those terms in the expected utility calculation that correspond to those programs that might (logically) be calling S with input X, so whatever method is used to calculate that maximum, it effectively addresses only those programs. This restriction of the whole set of programs for the given input replaces the Bayesian updating, hence the name “updateless.”
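To make that reading concrete, here is a toy sketch (all program names, priors, and utilities are invented for illustration): varying what S returns on input X only moves the expected-utility terms belonging to programs that actually call S(X); programs that don’t are constants in the maximization, which is the restriction that stands in for updating.

```python
# Each toy "program" feeds some input to S and maps S's return value to a
# utility; P3 calls S on a different input, so its term is a constant when
# we vary S(X). All specifics are made-up stand-ins.

PRIOR = {"P1": 0.4, "P2": 0.4, "P3": 0.2}

# program -> (input it passes to S, utility as a function of S's output)
PROGRAMS = {
    "P1": ("X", lambda out: 3 if out == 1 else 0),
    "P2": ("X", lambda out: 0 if out == 1 else 2),
    "P3": ("Y", lambda out: 1),  # ignores what S(X) returns
}

def expected_utility(policy):
    # policy maps each input to S's return value on that input
    total = 0.0
    for name, (inp, utility) in PROGRAMS.items():
        total += PRIOR[name] * utility(policy[inp])
    return total

def best_return_for(x, outputs=(0, 1)):
    # Vary only S(x); terms from programs not calling S with input x
    # contribute the same amount under every candidate output.
    base = {"X": 0, "Y": 0}
    return max(outputs, key=lambda o: expected_utility({**base, x: o}))

print(best_return_for("X"))  # 1: P1's 0.4*3 term outweighs P2's 0.4*2 term
```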
Is this anywhere close to the intended idea, or am I rambling in complete misapprehension? I’d be grateful if someone clarified that for me before I make any additional comments.
Yeah, it looks to me like you understand it correctly.