Paul commented on this post and suggested a stronger version: For every modal decision theory →T(→u) and every provably extensional modal decision problem →P(→a), modal UDT will do at least as well on →P(→a) as →T(→u) does if it is using a proof system that can prove what action →T(→u) chooses on this decision problem, and which outcome it obtains as a result. In this post, I give a detailed proof of this.
Let (→A(T),→U(T)) be the fixed point of →T(→u) and →P(→a), and recall my notation →χ(i) for the sequence of formulas which has ⊤ as the i’th entry, and ⊥ as all of its other entries (with the length of the sequence being clear from context). Now suppose that φ is a true closed formula in the language of GL such that
GL⊢φ→((→A(T)↔→χ(i∗T))∧(→U(T)↔→χ(j∗T)));
that is, GL+φ proves that →T(→u) chooses action i∗T, and achieves the outcome j∗T as a result. (By saying that φ is a “true” formula we mean that its translation to the language of arithmetic is true about the standard natural numbers: N⊨φ.) It’s ok to talk about →A(T) and →U(T) in the definition of i∗T and j∗T because fixed points are unique (up to provable equivalence), and GL can prove that the fixed point is in fact a fixed point.
My claim, then, is that →UDT(φ)(→u) will perform at least as well as →T(→u) on the decision problem →P(→a); that is, the outcome it achieves will be ranked ≤j∗T.
Intuitively, this is straight-forward.→UDT(φ)(→u) searches through all pairs (j,i) of outcomes and actions in lexicographical order, until it finds a pair such that it can prove that if it takes action i, it will achieve outcome j; as soon as it finds such a pair, it takes action i. (This is justified becaues it searches outcomes best-first, so it takes an action that leads to as good an outcome as it’s able to prove it can get.) So if it can prove that taking action i∗T will lead to outcome j∗T, it will either take that action and get that outcome, or there’s some pair (j,i)<(j∗T,i∗T) such that it takes action i and obtains outcome j≤j∗T. (Remember that outcomes are numbered from best to worst.)
Let’s go through the details of showing that it actually works out that way.
Let’s write (→A,→U) for the fixed point of →UDT(φ)(→u) with →P(→a). The part in our argument that we need to check carefully is that UDT will in fact stop at the pair (j∗T,i∗T) if it hasn’t already stopped before that; i.e.,
GL⊢φ→(UDT(φ)i∗T(→U)→Uj∗T).
If this is satisfied, then we’re done: We know that there will be some pair (j∗,i∗)≤(j∗T,i∗T) such that →UDT(φ)(→U) outputs i∗ and such that
GL⊢φ→(UDT(φ)i∗(→U)→Uj∗),
and hence, since (a) GL is sound on N, (b) N⊨φ, and (c) by assumption, N⊨UDT(φ)i∗, it follows that N⊨Uj∗, i.e., UDT(φ)(→U) achieves the outcome j∗≤j∗T. Thus, let’s check that GL does indeed prove φ→(UDT(φ)i∗T(→U)→Uj∗T).
To do so, we make use of provable extensionality, that is, of the fact that
GL⊢(→a↔→b)→(→P(→a)↔→P(→b)).
Since as a modal decision theory, →UDT(φ) is a p.m.e.e. sequence (provably mutually exclusive and exhaustive), GL proves that UDT(φ)i∗T(→U) implies →UDT(φ)(→U)↔→χ(i∗T), i.e., →A↔→χ(i∗T). Hence, together with provable extensionality, we obtain
GL⊢UDT(φ)i∗T(→U)→(→U↔→P(χ(i∗T)))
(since GL⊢→P(→A)↔→U by definition of →U). But on the other hand, recall our initial assumption that
GL⊢φ→((→A(T)↔→χ(i∗T))∧(→U(T)↔→χ(j∗T)));
again by provable extensionality, this implies
GL⊢φ→((→P(→A(T))↔→P(→χ(i∗T)))∧(→U(T)↔→χ(j∗T))),
and since GL⊢→P(→A(T))↔→U(T) by definition of →U(T), this simplifies to
GL⊢φ→(P(→χ(i∗T))↔→χ(j∗T)).
But together with our earlier result, this implies
GL⊢(φ∧UDT(φ)i∗T(→U))→(→U↔→χ(j∗T)),
which is equivalent to
GL⊢φ→(UDT(φ)i∗T(→U)→Uj∗T),
as desired.
Improving the modal UDT optimality result
I recently posted about an optimality result for modal UDT, which shows that for every modal decision problem →P(→a), there is a closed modal formula φ such that the version of modal UDT that searches for proofs in PA+φ will perform optimally on →P(→a).
Paul commented on this post and suggested a stronger version: For every modal decision theory →T(→u) and every provably extensional modal decision problem →P(→a), modal UDT will do at least as well on →P(→a) as →T(→u) does if it is using a proof system that can prove what action →T(→u) chooses on this decision problem, and which outcome it obtains as a result. In this post, I give a detailed proof of this.
Prerequisite: An optimality result for modal UDT, and the prerequisites therein.
Let (→A(T),→U(T)) be the fixed point of →T(→u) and →P(→a), and recall my notation →χ(i) for the sequence of formulas which has ⊤ as the i’th entry, and ⊥ as all of its other entries (with the length of the sequence being clear from context). Now suppose that φ is a true closed formula in the language of GL such that GL⊢φ→((→A(T)↔→χ(i∗T))∧(→U(T)↔→χ(j∗T))); that is, GL+φ proves that →T(→u) chooses action i∗T, and achieves the outcome j∗T as a result. (By saying that φ is a “true” formula we mean that its translation to the language of arithmetic is true about the standard natural numbers: N⊨φ.) It’s ok to talk about →A(T) and →U(T) in the definition of i∗T and j∗T because fixed points are unique (up to provable equivalence), and GL can prove that the fixed point is in fact a fixed point.
My claim, then, is that →UDT(φ)(→u) will perform at least as well as →T(→u) on the decision problem →P(→a); that is, the outcome it achieves will be ranked ≤j∗T.
Intuitively, this is straight-forward.→UDT(φ)(→u) searches through all pairs (j,i) of outcomes and actions in lexicographical order, until it finds a pair such that it can prove that if it takes action i, it will achieve outcome j; as soon as it finds such a pair, it takes action i. (This is justified becaues it searches outcomes best-first, so it takes an action that leads to as good an outcome as it’s able to prove it can get.) So if it can prove that taking action i∗T will lead to outcome j∗T, it will either take that action and get that outcome, or there’s some pair (j,i)<(j∗T,i∗T) such that it takes action i and obtains outcome j≤j∗T. (Remember that outcomes are numbered from best to worst.)
Let’s go through the details of showing that it actually works out that way.
Let’s write (→A,→U) for the fixed point of →UDT(φ)(→u) with →P(→a). The part in our argument that we need to check carefully is that UDT will in fact stop at the pair (j∗T,i∗T) if it hasn’t already stopped before that; i.e., GL⊢φ→(UDT(φ)i∗T(→U)→Uj∗T). If this is satisfied, then we’re done: We know that there will be some pair (j∗,i∗)≤(j∗T,i∗T) such that →UDT(φ)(→U) outputs i∗ and such that GL⊢φ→(UDT(φ)i∗(→U)→Uj∗), and hence, since (a) GL is sound on N, (b) N⊨φ, and (c) by assumption, N⊨UDT(φ)i∗, it follows that N⊨Uj∗, i.e., UDT(φ)(→U) achieves the outcome j∗≤j∗T. Thus, let’s check that GL does indeed prove φ→(UDT(φ)i∗T(→U)→Uj∗T).
To do so, we make use of provable extensionality, that is, of the fact that GL⊢(→a↔→b)→(→P(→a)↔→P(→b)). Since as a modal decision theory, →UDT(φ) is a p.m.e.e. sequence (provably mutually exclusive and exhaustive), GL proves that UDT(φ)i∗T(→U) implies →UDT(φ)(→U)↔→χ(i∗T), i.e., →A↔→χ(i∗T). Hence, together with provable extensionality, we obtain GL⊢UDT(φ)i∗T(→U)→(→U↔→P(χ(i∗T))) (since GL⊢→P(→A)↔→U by definition of →U). But on the other hand, recall our initial assumption that GL⊢φ→((→A(T)↔→χ(i∗T))∧(→U(T)↔→χ(j∗T))); again by provable extensionality, this implies GL⊢φ→((→P(→A(T))↔→P(→χ(i∗T)))∧(→U(T)↔→χ(j∗T))), and since GL⊢→P(→A(T))↔→U(T) by definition of →U(T), this simplifies to GL⊢φ→(P(→χ(i∗T))↔→χ(j∗T)). But together with our earlier result, this implies GL⊢(φ∧UDT(φ)i∗T(→U))→(→U↔→χ(j∗T)), which is equivalent to GL⊢φ→(UDT(φ)i∗T(→U)→Uj∗T), as desired.