Uniqueness of UDT for transparent universes

Prerequisites: Provability logic, Updateless decision theory

(This post was going to be a description of my paper UDT with known search order, but instead it obsoletes that paper.)
Consider the following argument that UDT is the unique decision procedure that performs optimally in simple universes.
Suppose an agent A, with a provability oracle for statements like “if A does action a, then utility u results,” doesn’t play chicken for some possible action a. In other words, the agent is not committed to taking action a whenever it sees that A provably does not take action a. Then this fact would be visible to PA, and using Löb’s theorem, we could prove that A actually does not take action a. This argument looks like it is independent of the universe, so in some simple universes where action a would give a large reward, A actually fails to take a. Therefore optimal agents must diagonalize against actions that they provably don’t take.
This post will carry through a precise version of this argument. First we’ll analyze exactly how agents behave when they only have to worry about finitely many possible outcomes. Then we’ll apply this to the case where there are unboundedly many possible outcomes.
The busy reader could read just the description of the formal setting, and then skip to the corollary towards the end. The only crucial thing for the statement of the corollary is the notion of an agent that halts for every “possible” consequences oracle. However, the analysis in the main claim gives more information; namely, it says how to compute the behavior of any halting agent in any universe, as long as that universe is transparent in a sense to be defined later.
Formal setting
A universe U[A] is any algorithm that outputs a utility in the set U⊂N. A universe is parameterized by the choice of an agent A, and it may make calls to some level of halting oracle to compute what A does. The set U of possible utilities may be infinite, and we assume that our preference ordering on U is the ordering < on N. An agent is an algorithm A[U] (usually written A) that selects some action from the finite set A. The universe can observe the behavior of A, but the agent only gains information about U through its oracle, which depends on U.
The agent A makes calls to a consequences oracle C : A×U → {0,1}, which is always implicitly dependent on A and U.
We’ll work in PA, which can talk about U[A] and A[U] via quining. So, the oracle call C(a,u) returns 1 if PA⊢A[U]=a→U[A]=u, and C(a,u) returns 0 otherwise.
We make a transparency assumption on the universe:
∀a∈A∃u∈U:PA⊢A[U]=a→U[A]=u.
This is a strong assumption, but we are trying to prove that UDT is required for optimal behavior in simple universes. We could just as well assume that the universe is a simple map A→U, i.e. U simply checks the action of A and then looks up a corresponding utility in a table. The transparency assumption above is all we need to analyze the behavior of our agents, but we will see that UDT is necessary for optimal behavior even in the simple universes.
For each action a, if C(a,u1)=C(a,u2)=1 for some u1≠u2, then PA⊢A[U]≠a: PA proves that U[A] outputs a single utility, so from A[U]=a→U[A]=u1 and A[U]=a→U[A]=u2 it proves A[U]≠a. But then PA proves A[U]=a→U[A]=u vacuously for every u, so in fact C(a,u)=1 for all u∈U.
By the transparency of U, for each a∈A we have either C(a,u)=1 for all u∈U, or else C(a,u)=1 for exactly one u. So, we define a consequence profile for A,U to be any map γ:A→U∪{⊥} such that
∀a∈A,u∈U:γ(a)=u⇒C(a,u)=1.
The interpretation of γ(a)=u is that C(a,u)=1 for u only, and the interpretation of γ(a)=⊥ is that C(a,u)=1 for all u∈U; consequence profiles encode all the information that C might contain. We say that a consequence profile γ describes an oracle C if for all a∈A, γ(a)=u⇒ (C(a,u)=1 for u only), and γ(a)=⊥⇒ (C(a,u)=1 for all u).
A consequence profile is full if ⊥ is not in its range, i.e. it gives an actual utility for each action. Two consequence profiles γ and δ agree if there is no a∈A such that ⊥≠γ(a)≠δ(a)≠⊥; in other words, they agree if they assign the same utility to an action whenever they both assign an actual utility to that action, rather than assigning ⊥.
Given A and U, the transparency assumption lets us pick a (possibly non-unique) full consequence profile, which we will denote by σ.
We require that U[A] halts as long as A halts. We also require that A halts given any consequence profile γ for A and U. Such an agent is called a halting agent.
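To fix intuitions, here is a minimal Python sketch of the data in this setting. The names (Profile, is_full, agree) and the particular lookup table are illustrative choices of mine, not anything from the post: a consequence profile is a finite map from actions to utilities, with None standing for ⊥, and a simple universe is just a lookup table f, which can itself serve as the full profile σ.

```python
from typing import Dict, Optional

Profile = Dict[str, Optional[int]]  # action -> utility, with None standing for ⊥

def is_full(gamma: Profile) -> bool:
    """A profile is full if it assigns an actual utility (not ⊥) to every action."""
    return all(u is not None for u in gamma.values())

def agree(gamma: Profile, delta: Profile) -> bool:
    """Two profiles agree if they never assign two different actual utilities
    to the same action; ⊥ (None) is compatible with anything."""
    return all(gamma[a] is None or delta[a] is None or gamma[a] == delta[a]
               for a in gamma)

# A toy "simple" universe: a lookup table f from actions to utilities.
# For such a universe PA easily proves A[U]=a → U[A]=f(a) for every a,
# so f itself can serve as a full consequence profile σ.
f = {"a0": 3, "a1": 7, "a2": 5}
sigma: Profile = dict(f)
assert is_full(sigma) and agree(sigma, {"a0": None, "a1": 7, "a2": None})
```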
Main claim describing halting agent behavior
We begin with the case where the algorithm A only ever mentions finitely many utilities. Fix U, and obtain a full consequence profile σ for A and U. It is provable in PA that some γ that agrees with σ describes C, and so γ determines A’s action. So for any such γ, we can write A(γ)=a to mean that A outputs a if it is given an oracle described by γ. By the assumption that A is a halting agent, A(γ) is defined, and by Σ1 completeness, PA can see this fact for any specific γ.
Claim: Define a sequence ⟨γi⟩ of consequence profiles that agree with σ, and a sequence ⟨ai⟩ of actions. First set γ0(a)=⊥ for all a, and set a0=A(γ0). Given γn and an, define γn+1=γn[an↦σ(an)], i.e. γn+1 is γn except that it maps an to σ(an). Define an+1 to be A(γn+1). Then it is the case that A[U]=ak, where k is the least index such that for some i<k, ai=ak. Furthermore, γk describes C for U and A. (The γi and ai do not depend on the choice of σ, but we will use σ in the proof.)
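The construction in the claim is effectively an algorithm, so here is a hedged Python sketch of it. The names run_claim_sequence and BOTTOM are mine, introduced for illustration; the agent is modeled abstractly as a function from consequence profiles to actions, which is exactly the data A(γ).

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Action = Hashable
BOTTOM = None  # stands for ⊥, i.e. "C(a,u)=1 for every u"

def run_claim_sequence(
    actions: List[Action],
    sigma: Dict[Action, int],                               # a full profile for A, U
    agent: Callable[[Dict[Action, Optional[int]]], Action]  # γ ↦ A(γ)
) -> Tuple[Action, Dict[Action, Optional[int]]]:
    """Compute the sequences ⟨γ_i⟩ and ⟨a_i⟩ from the claim and return the pair
    (a_k, γ_k) for the least k such that a_k repeats some earlier a_i."""
    gamma = {a: BOTTOM for a in actions}    # γ_0 ≡ ⊥
    history = []                            # a_0, a_1, ... so far
    while True:
        a = agent(gamma)                    # a_n = A(γ_n)
        if a in history:                    # first repeat: this n is k
            return a, dict(gamma)           # the claim: A[U] = a_k, and γ_k describes C
        history.append(a)
        gamma = {**gamma, a: sigma[a]}      # γ_{n+1} = γ_n[a_n ↦ σ(a_n)]
```

Since the actions a0,…,ak−1 are pairwise distinct, the loop makes at most |A|+1 calls to the agent before a repeat occurs.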
Example
(This section is skippable.)
Before we prove the claim, an example is perhaps in order. Consider a universe U and an agent A with |A|=3, and fix a full consequence profile σ for A and U. What we are looking at here is the strategy that A follows, given the information it might receive from the oracle C.
back face (γ(a1)=⊥):           front face (γ(a1)=σ(a1)):
   a2     [a0]∙                    a1      a1
   a0      a1∙                    [ax]∙    a2∙
(The boxes and bullets are just markers for us to refer to; the boxed entries are shown here in brackets.) The table is meant to be a cube, where the boxed ax (x∈{0,1,2}) is the lower left corner closest to us, and the boxed a0 is the upper right corner farthest from us. Each entry in this cube gives the action A(γ) for some γ that agrees with σ. The i-th dimension corresponds to whether or not γ(ai)=⊥.
For each action a, A might see either just C(a,σ(a))=1, or else C(a,u)=1 for all u. The up-down dimension corresponds to whether or not C(a0,u) holds for all u; the back-front dimension corresponds to whether or not C(a1,u) holds for all u; and the left-right dimension is for a2.
So, the boxed a0 indicates that A will perform action a0 if C returns 1 for every query; the a2 to the left of the boxed a0 indicates that A does a2 if C returns 1 for every query about a0 or a1, and returns 0 for (a2,u) whenever u≠σ(a2); the boxed ax indicates that A does ax if C returns 0 for (ai,u) whenever u≠σ(ai), for each i∈{0,1,2}; and so on.
Then the claim says that we start at the boxed a0, with γ0≡⊥, and follow the bullets. First γ1(a0)=σ(a0); then γ2(a1)=σ(a1); and finally γ3(a2)=σ(a2) takes us to ax. The conclusion of the claim is that A does ax, since that is the first entry in the bulleted descent that repeats an action. (This is desirable behavior, as we can choose ax to be the ai with the greatest σ(ai). The claim says that in this situation, γ3=σ describes the true oracle C, so this ai will be optimal.)
If we replaced a1∙ with a0∙, the claim says that A does a0, as that is the first repeated action. In this case, we could have reached the conclusion pretty easily: by the previous discussion, PA knows that A does the action dictated by this table. So PA can prove that if □┌A[U]≠a1∧A[U]≠a2┐, then A does some action in the back right of the cube. Both of those actions are a0, so A does not do a1 or a2. Hence
PA⊢□┌A[U]∉{a1,a2}┐→A[U]∉{a1,a2},
and by Löb’s theorem, PA⊢A[U]∉{a1,a2}. Running the same argument again, we see that A in fact does a0.
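For concreteness, here is the bulleted path of the example as code, reusing run_claim_sequence from the sketch after the claim. The agent below is a toy instantiation of the cube that I made up (only the four bulleted entries matter for the path), with σ chosen so that ax = a1 is the action with the greatest utility.

```python
def example_agent(gamma):
    """A(γ) for the cube example; None stands for ⊥."""
    known = {a for a, u in gamma.items() if u is not None}
    if not known:                        # the boxed a0: every query returns 1
        return "a0"
    if known == {"a0"}:                  # after learning σ(a0)
        return "a1"
    if known == {"a0", "a1"}:            # after learning σ(a1)
        return "a2"
    if known == {"a0", "a1", "a2"}:      # the boxed a_x: take the best action
        return max(gamma, key=gamma.get)
    return "a0"                          # entries the bulleted path never visits

sigma = {"a0": 3, "a1": 7, "a2": 5}      # a_x = a1, the action with greatest σ
action, gamma_k = run_claim_sequence(["a0", "a1", "a2"], sigma, example_agent)
print(action, gamma_k)                   # a1 {'a0': 3, 'a1': 7, 'a2': 5}
```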
Proof of main claim
Recall the claim: we have a sequence ⟨ai⟩ of actions and a sequence ⟨γi⟩ of consequence profiles that agree with σ. We chose γ0(a)≡⊥, ai=A(γi) for all i, and γn+1=γn[an↦σ(an)]. The claim says that A[U]=ak for the least k such that ai=ak for some i<k, and the oracle C is described by γk. Note that, by definition, γk(ai)=σ(ai) for i<k, and γk(a)=⊥ for every other a∈A.
Proof of claim. First we prove, by induction on k starting at zero, the following statement:
∀j<k:PA⊢□┌A[U]≠aj┐→□┌A[U]∈{ai∣i<j}┐.
This is vacuous for k=0. So suppose the statement holds for some k; to establish it for k+1, we want to show the instance j=k, namely that
PA⊢□┌A[U]≠ak┐→□┌A[U]∈{ai∣i<k}┐.
Reasoning in PA, suppose that □┌A[U]≠ak┐, and suppose also that □┌A[U]∈{ai∣i<k}┐.
Let j be the smallest number such that □┌A[U]≠aj┐. By our suppositions, j<k. By the inductive hypothesis, then, □┌A[U]∈{ai∣i<j}┐, and hence ∀a∉{ai∣i<j}:□┌A[U]≠a┐.
This tells us everything we need to know about C to deduce what A does; we know that C is described by γj. Therefore the action is A(γj)=aj.
In particular, A[U]∈{ai∣i<k}.
We have shown that
PA⊢□┌A[U]≠ak┐→(□┌A[U]∈{ai∣i<k}┐→A[U]∈{ai∣i<k}).
This argument goes through if everything is “one level deeper,” i.e. wrapped in an additional □ (or, we can apply Σ1 completeness of PA and distributivity of □ over →). This gives
PA⊢□┌A[U]≠ak┐→□┌□┌A[U]∈{ai∣i<k}┐→A[U]∈{ai∣i<k}┐.
By formalized Löb’s theorem, we have the desired conclusion:
PA⊢□┌A[U]≠ak┐→□┌A[U]∈{ai∣i<k}┐.
Now, let k be as in the claim, so it is the least index such that ai=ak for some i<k.
Reasoning again in PA, suppose that □┌A[U]∈{ai∣i<k}┐. Then ∀a∉{ai∣i<k}:□┌A[U]≠a┐. Using the statement just proved and the previous argument about what C can look like, this gives that ∃i≤k:A[U]=A(γi). Since all such actions A(γi) are in {ai∣i<k}, this shows that A[U]∈{ai∣i<k}.
We just proved that
PA⊢□┌A[U]∈{ai∣i<k}┐→A[U]∈{ai∣i<k},
so by Löb’s theorem,
PA⊢A[U]∈{ai∣i<k}.
Running the same argument again, we know that ∃i≤k:A[U]=A(γi), and we know that C is described by γi for some i≤k. But for i<k, γi(ai)=⊥ and A(γi)=ai. So by soundness, no such γi can describe C. Hence γk describes C and A[U]=A(γk)=ak. QED
Corollary on uniqueness of UDT
Let us consider universes with |U| infinite—there are infinitely many possible outcomes. (Each universe only has finitely many accessible outcomes; we are looking at a set of universes that are allowed to have outcomes in N, so the agent will have to ask C about unboundedly many (a,u) pairs.) We restrict to universes that are very simple: they are just maps from A to U that are easy to reason about. Recall that we assumed that agents are algorithms, modulo their oracles. In other words, for any consequence profile, they eventually halt given an oracle described by that profile. Now we give a corollary of the above claim, stating that UDT is unique among halting agents that are optimal for simple universes.
Corollary. Suppose a halting agent A is optimal for all universes that implement a simple function A→U. Then A implements UDT, in the sense that A checks if each action is provably not taken; takes any such action; and otherwise takes the action with the highest provable consequence. The order in which actions are diagonalized against can only depend on the actual consequences of actions that have already been checked. (So there is some action a that A always diagonalizes against; then for each u∈U, there is an action au such that whenever C(a,u)=1, A diagonalizes against au; for each u,v∈U, there is an action auv such that whenever C(a,u)=1 and C(au,v)=1, A diagonalizes against auv; and so on.)
Proof. Fix some agent A with |A|=n. We will show that either A fails on some simple universe U, or else A is UDT in the above sense. Suppose that for some simple U that implements the function f:A→U, the oracle C of A[U] is not full, meaning that for some a∈A, C(a,u)=1 for all u∈U. Since γk describes C, we have γk(a)=⊥, so a∉{ai∣i<k} and hence k<n, where k is the index given by the main claim such that A[U]=A(γk)=ak.
Notice that the behavior dictated by the claim does not depend on the “true” consequences of all the actions in A−{ai∣i≤k}; we can show that the action ak happens just knowing what A does for γi, i≤k. So let’s modify U to obtain a simple universe U′ implementing f′=f[a↦u′], where a is an action for which PA⊢A[U]≠a, and u′ is some utility larger than any number in the range of f. Now A[U′]≠a, and so A is suboptimal in a simple universe.
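The point that the claim’s output ignores the consequences of actions outside {ai∣i<k} can be seen directly in code. The following toy run (again reusing run_claim_sequence, with an agent I made up that never plays chicken) shows that raising f at an un-diagonalized action leaves the agent’s choice unchanged, so the agent misses the now-optimal action.

```python
def stubborn_agent(gamma):
    """A halting agent that never plays chicken against a1 or a2:
    it takes a0 no matter what the oracle tells it."""
    return "a0"

f = {"a0": 3, "a1": 7, "a2": 5}
action, gamma_k = run_claim_sequence(["a0", "a1", "a2"], f, stubborn_agent)
# The sequence is a_0 = a_1 = a0, so k = 1 and γ_1(a1) = γ_1(a2) = ⊥.

f_prime = {**f, "a2": 100}               # f' = f[a2 ↦ u'] with u' above the range of f
action_prime, _ = run_claim_sequence(["a0", "a1", "a2"], f_prime, stubborn_agent)
assert action == action_prime == "a0"    # A[U'] ≠ a2: suboptimal in a simple universe
```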
This characterizes the behavior of any optimal A: A must play chicken against each action in turn, and then choose the action with the highest unique consequence. Here, “playing chicken” means that A(γi)=ai, even though γi(ai)=⊥.
When choosing which action to diagonalize against next, A must choose only based on the provable consequences of the actions it has already diagonalized against; this is just saying that A(γi) is well-defined. If A does not vary this choice, then there is a fixed enumeration {a0,a1,…,an−1} of A such that A plays chicken against each ai in order, in every universe. This recovers the usual formulation of UDT.
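Finally, here is a hedged sketch of the decision procedure the corollary singles out, in the usual fixed-order formulation. It assumes utilities range over the natural numbers, and udt_agent is my own illustrative name. For each action in turn it looks up a provable consequence; if it ever sees that the action is provably not taken (the oracle answers 1 for a second utility, which by the earlier observation means it answers 1 for every utility), it plays chicken and takes that action immediately; otherwise it takes the action with the highest recorded consequence.

```python
from itertools import count

def udt_agent(actions, C):
    """UDT with a fixed chicken order. C(a, u) is the consequences oracle,
    i.e. C(a, u) = 1 iff PA proves A[U]=a → U[A]=u; utilities are naturals."""
    consequences = {}
    for a in actions:                               # fixed enumeration of A
        u = next(v for v in count() if C(a, v))     # transparency: some u exists
        if C(a, u + 1):                             # a second consequence, so PA ⊢ A[U] ≠ a
            return a                                # chicken rule: take the "impossible" action
        consequences[a] = u                         # the unique provable consequence of a
    return max(consequences, key=consequences.get)  # otherwise maximize utility
```

On any consequence profile this halts: the search for u stops at γ(a) (or at 0 when γ(a)=⊥), and the second query detects exactly the ⊥ case. This is the behavior the corollary forces on optimal halting agents.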
Discussion of assumptions
The assumption that A must halt can be slightly relaxed; it only needs to halt for each γi. However, this might still be too restrictive. It is possible, as far as I know, that there is an agent that doesn’t halt given some of those “possible” oracles, and yet performs optimally on simple universes (because none of those oracles ever actually happen, or when they do the default action is the correct one).
The transparency assumption on the universe doesn’t weaken the conclusion about uniqueness of UDT, but it prevents applying the analysis of oracle-agent behavior to multi-agent situations, which are the most interesting universes. If there are multiple agents, it can be impossible to prove any consequences, because the other agent also has a provability oracle. Then A can’t even prove that the other agent’s oracle won’t always return 1 (because PA is inconsistent), so it can’t in general prove what happens in the universe.
[Tangent: Restricting to an oracle for statements of the form PA⊢A[U]=a→U[A]=u is meant to model an agent that has an oracle for expected utility of actions. Allowing an abstract decision procedure to have unrestricted oracles (or rather, any access to the code for U) makes it difficult (for me) to define UDT in a reasonable way. For example, say there are two agents, A1 and A2, both using UDT. When A1 considers what strategy would be best for UDT to follow, it decides that it would be a great idea if any UDT that finds itself in the shoes of A2 immediately donates all its resources to A1. This agent is delusional. One way to fix this problem is to make a decision procedure be a function of only the expected utility of actions (or some other sort of information mentioning only the relationship between actions and utility).
The decision procedure should be abstracted from the utility function, so it should be abstracted away from the concrete world model.]