Summary: we might expect PA+m to be able to predict the behavior of an escalating modal UDT agent using PA+n for n<m. However, this is not the case. Furthermore, in environments where PA+m predicts the agent, escalating modal UDT can be outperformed by a constant agent.
For a given logic X, we can define X+1=X+Con(X) (i.e.X+1 has all the axioms of X plus the axiom that X is consistent). We can iteratively get X+2=(X+1)+1 and so on.
Now consider PA+100. In some sense, it is “more powerful” than PA: it contains additional correct axioms.
Therefore, we might find it plausible that PA+100 can always correctly predict the behavior of a modal UDT agent using PA.
Ordinary modal UDT can run into issues on some environments due to later stages not be able to use the assumption that logics used in previous stages are consistent. Therefore, define escalating modal UDT to be the same as modal UDT except that the nth proof search uses PA+n instead of PA. Benja might write a post explaining this system better.
Let m be a number higher than the number of proof searches escalating modal UDT makes. Patrick made the conjecture that if U is a simple function of A() (the agent’s output) and PA+m⊢A()=a (for each action a, whether the predictor predicts the agent takes action a), where A() is escalating modal UDT, then:
The predictor is correct (if A()=a then PA+m⊢A()=a)
The agent picks the best action conditioned on the predictor being correct (i.e. the action a that maximizes utility when A()=a and PA+m⊢A()=a)
Both parts of the conjecture turn out to be false. First, consider the following environment:
def U():
if A()=α
if PA+100⊢A()=β
return 10
else
return 0
else
return 5
First, the agent will determine PA⊢A()=α→U()=10. For this to be the case PA would have to prove □PA+100┌A()=β┐. But if PA proves this, then in fact A()=α, so this would require PA+100 to be unsound. Therefore this proof search fails.
Next, the agent will determine PA+1⊢A()=β→U()=10. This obviously fails.
Next, the agent will determine PA+2⊢A()=α→U()=5. This obviously fails.
Next, the agent will determine PA+3⊢A()=β→U()=5. This obviously succeeds, and the agent takes action β.
Now consider whether PA+100⊢A()=β. Consider a model M of PA+100 in which
M⊨PA+100M⊨¬Con(PA+100)
Since M⊨PA+100, we have PA+100⊬A()=β. This disproves the first part of the conjecture.
For the second part, consider the following environment:
def U():
if A()=α
if PA+100⊢A()=β
return 0
else
return 10
else
return 5
First, the agent will determine PA⊢A()=α→U()=10. This is equivalent to PA⊢¬□PA+100┌A()=β┐. But PA cannot prove PA+100 consistent, so PA fails to prove this.
Next, the agent will determine PA+1⊢A()=β→U()=10. This obviously fails.
Next, the agent will determine PA+2⊢A()=α→U()=5. This obviously fails.
Next, the agent will determine PA+3⊢A()=β→U()=5. This obviously succeeds, and the agent takes action β.
So, part 2 of the conjecture is false. Escalating modal UDT is outperformed by the agent that always takes action α.
To create predictors that can always predict modal UDT, we may consider having the predictor using PA+Π1 (i.e. PA plus all true Π1 statements, where Π1 is part of the arithmetic hierarchy).
PA+100 cannot always predict modal UDT
Summary: we might expect PA+m to be able to predict the behavior of an escalating modal UDT agent using PA+n for n<m. However, this is not the case. Furthermore, in environments where PA+m predicts the agent, escalating modal UDT can be outperformed by a constant agent.
For a given logic X, we can define X+1=X+Con(X) (i.e.X+1 has all the axioms of X plus the axiom that X is consistent). We can iteratively get X+2=(X+1)+1 and so on.
Now consider PA+100. In some sense, it is “more powerful” than PA: it contains additional correct axioms. Therefore, we might find it plausible that PA+100 can always correctly predict the behavior of a modal UDT agent using PA.
Ordinary modal UDT can run into issues on some environments due to later stages not be able to use the assumption that logics used in previous stages are consistent. Therefore, define escalating modal UDT to be the same as modal UDT except that the nth proof search uses PA+n instead of PA. Benja might write a post explaining this system better.
Let m be a number higher than the number of proof searches escalating modal UDT makes. Patrick made the conjecture that if U is a simple function of A() (the agent’s output) and PA+m⊢A()=a (for each action a, whether the predictor predicts the agent takes action a), where A() is escalating modal UDT, then:
The predictor is correct (if A()=a then PA+m⊢A()=a)
The agent picks the best action conditioned on the predictor being correct (i.e. the action a that maximizes utility when A()=a and PA+m⊢A()=a)
Both parts of the conjecture turn out to be false. First, consider the following environment:
def U():
First, the agent will determine PA⊢A()=α→U()=10. For this to be the case PA would have to prove □PA+100┌A()=β┐. But if PA proves this, then in fact A()=α, so this would require PA+100 to be unsound. Therefore this proof search fails.
Next, the agent will determine PA+1⊢A()=β→U()=10. This obviously fails.
Next, the agent will determine PA+2⊢A()=α→U()=5. This obviously fails.
Next, the agent will determine PA+3⊢A()=β→U()=5. This obviously succeeds, and the agent takes action β.
Now consider whether PA+100⊢A()=β. Consider a model M of PA+100 in which M⊨PA+100 M⊨¬Con(PA+100)
Now: M⊨□PA┌¬Con(PA+100)┐ M⊨□PA┌□PA+100┌A()=β┌┐ M⊨□PA┌A()=α→U()=10┐ M⊨A()=α
Since M⊨PA+100, we have PA+100⊬A()=β. This disproves the first part of the conjecture.
For the second part, consider the following environment:
def U():
First, the agent will determine PA⊢A()=α→U()=10. This is equivalent to PA⊢¬□PA+100┌A()=β┐. But PA cannot prove PA+100 consistent, so PA fails to prove this.
Next, the agent will determine PA+1⊢A()=β→U()=10. This obviously fails.
Next, the agent will determine PA+2⊢A()=α→U()=5. This obviously fails.
Next, the agent will determine PA+3⊢A()=β→U()=5. This obviously succeeds, and the agent takes action β.
So, part 2 of the conjecture is false. Escalating modal UDT is outperformed by the agent that always takes action α.
To create predictors that can always predict modal UDT, we may consider having the predictor using PA+Π1 (i.e. PA plus all true Π1 statements, where Π1 is part of the arithmetic hierarchy).