# Subagent perfect minimax
This post continues the study of minimax forecasting.
The minimax decision rule has the pathology that, when observed events are more “optimistic” than the worst case, subsequent behavior can become highly suboptimal. This is analogous to irrational off-policy behavior in Nash equilibria of extensive-form games. In order to remedy the problem, we introduce a refinement called “subagent perfect minimax,” somewhat analogous to subgame perfect equilibrium and related solution concepts. We prove that such policies exist and that, when the model factorizes, they are dynamically consistent.
The proofs are omitted, but we can easily provide them if necessary.
## Motivation
Consider the following class of environments: during the first step, the agent gains \$1 with probability $p$ and \$0 with probability $1-p$, for an unknown $p \in [0,1]$; during the second step, the agent chooses between action $a$, which leads to gaining \$1, and action $b$, which leads to gaining \$0.
The minimax payoff of this class is \$1, attained by the policy $\pi^*$ that always chooses $a$. But now consider the policy $\pi$: choose $a$ if you gained \$0 in the first step, choose $b$ if you gained \$1. The worst-case payoff of $\pi$ is also \$1, which is the same as the worst-case ($p=0$) payoff of $\pi^*$, so $\pi$ is minimax as well, even though it forgoes a guaranteed \$1 whenever the first step pays off.
Note that the minimax environment has $p=0$, and $\pi$ only differs from $\pi^*$ on histories which are impossible in that environment. Moreover, if we considered the class $p \in [\epsilon, 1]$ for some $\epsilon > 0$, the minimax payoff would be $1+\epsilon$, and $\pi^*$ would be the sole minimax policy. Therefore, in order to eliminate the pathology, we need to formulate a stability condition that ensures any admissible history is treated as occurring with at least infinitesimal probability.
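To make the arithmetic concrete, here is a small numeric check of the example (a sketch under the payoff structure described above; `expected_payoff` and the grid search are illustrative, not part of the formalism):

```python
# Toy class from the Motivation section: step 1 pays $1 with probability p,
# step 2 pays $1 for action a and $0 for action b; total gain is the payoff.

def expected_payoff(policy, p):
    """Expected total gain when the first step succeeds with probability p.
    `policy` maps the observed first-step gain (0 or 1) to a second-step action."""
    total = 0.0
    for gain, prob in ((1, p), (0, 1.0 - p)):
        second_step = 1.0 if policy(gain) == "a" else 0.0
        total += prob * (gain + second_step)
    return total

pi_star = lambda gain: "a"                      # always choose a
pi = lambda gain: "b" if gain == 1 else "a"     # deviates after good luck

for lo in (0.0, 0.1):  # the classes p in [0,1] and p in [eps,1] with eps = 0.1
    grid = [lo + k * (1.0 - lo) / 1000 for k in range(1001)]
    worst_star = min(expected_payoff(pi_star, p) for p in grid)
    worst_pi = min(expected_payoff(pi, p) for p in grid)
    print(f"p in [{lo}, 1]: worst case of pi* = {worst_star:.2f}, of pi = {worst_pi:.2f}")
```

Since both payoffs are affine in $p$, the grid minimum sits at the left endpoint: both policies guarantee \$1 on $[0,1]$, while on $[\epsilon,1]$ only $\pi^*$ attains $1+\epsilon$.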
## Results
Now consider the forecasting setting, with action set $A$, observation set $O$, time discount function $\gamma: \mathbb{N} \to \mathbb{R}^{\geq 0}$ and reward function $r: (A \times O)^* \to \mathbb{R}$. As before, $\gamma$ and $r$ define the utility function $u: (O^* \to A) \times O^\omega \to \mathbb{R}$.
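The exact definition of $u$ is given in the earlier post; a natural form consistent with the types above is the normalized discounted sum below (the normalization and the indexing of $r$'s argument are my assumptions, not the post's):

```latex
% Hedged reconstruction: a deterministic policy s: O^* -> A and an
% observation sequence y in O^omega determine an action-observation
% history, and u aggregates its discounted rewards. Assumed form:
u(s, y) := \frac{\sum_{n=0}^{\infty} \gamma(n)\,
  r\!\left(s(y_{<0})\,y_0\; s(y_{<1})\,y_1 \cdots s(y_{<n})\,y_n\right)}
  {\sum_{n=0}^{\infty} \gamma(n)}
```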
Consider $\Phi \in PC(O^\omega)$ and denote
$$O^+_\Phi := \{x \in O^+ \mid \exists \mu \in \Phi : \mu(xO^\omega) > 0\}$$
### Definition 1
Consider $X \subseteq O^+_\Phi$ finite. $\pi^* \in P(O^* \to A)$ is called an $X$-stable minimax policy for $\Phi$ when there are sequences $\{\epsilon^{(n)}: X \to (0,1)\}_{n \in \mathbb{N}}$ and $\{\pi^{(n)} \in P(O^* \to A)\}_{n \in \mathbb{N}}$ s.t. $\epsilon^{(n)} \to 0$, $\pi^{(n)} \to \pi^*$ and $\pi^{(n)}$ is a minimax policy for
$$\Phi^{(n)} := \{\mu \in \Phi \mid \forall x \in O^*, o \in O : xo \in X \implies \mu(xoO^\omega) \geq \epsilon^{(n)}(xo)\,\mu(xO^\omega)\}$$
In particular, the definition assumes each $\Phi^{(n)}$ is non-empty.
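To see how the perturbation bites, here is a worked instantiation on the toy class from the Motivation section (my own example, not from the post; I identify each environment with its parameter $p$ and take $X = \{1\}$, the length-one history in which the agent gained \$1):

```latex
% Own worked example. Observations o in {0,1} record the first-step gain,
% Phi = { mu_p : p in [0,1] } with mu_p(1 O^omega) = p, and X = {1}. The
% only constraint in Definition 1 has x = lambda_O (so mu(x O^omega) = 1):
\Phi^{(n)} = \{\mu_p \mid \mu_p(1 O^\omega) \ge \epsilon^{(n)}(1)\}
           = \{\mu_p \mid p \ge \epsilon^{(n)}(1)\}.
% By the computation in the Motivation section, pi* is the sole minimax
% policy of this restricted class for every n; hence pi^{(n)} = pi*, the
% limit is pi*, and the pathological pi is not an X-stable minimax policy.
```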
### Proposition 1
For any $X \subseteq O^+_\Phi$ finite, there exists $\pi^* \in P(O^* \to A)$ which is an $X$-stable minimax policy for $\Phi$.
### Proposition 2
For any $X \subseteq O^+_\Phi$ finite and $\pi^* \in P(O^* \to A)$ an $X$-stable minimax policy for $\Phi$, $\pi^*$ is in particular a minimax policy for $\Phi$.
### Proposition 3
Consider $X \subseteq O^+_\Phi$ finite, $\pi^* \in P(O^* \to A)$ an $X$-stable minimax policy for $\Phi$ and $x \in O^*$ s.t. $\forall y \sqsubseteq x: y \in X \cup \{\lambda_O\}$. Assume $\Phi$ factorizes at $x$ into $\Phi_1, \Phi_2 \in PC(O^\omega)$. Denote $\mathrm{pr}_1: (O^* \to A) \to (\bar{x}O^* \to A)$ and $\mathrm{pr}_2: (O^* \to A) \to (xO^* \to A)$ the restriction mappings. Define $\pi^*_1 := \mathrm{pr}_{1*}\pi^*$ and define $\pi^*_2: A^{|x|} \to P(xO^* \to A)$ by
$$\pi^*_2(t) := \mathrm{pr}_{2*}\left(\pi^* \,\middle|\, \{s: O^* \to A \mid t_i = s(x_{<i})\}\right)$$
Then
$$\pi^*_2 \in \operatorname*{arg\,max}_{\pi_2: A^{|x|} \to P(xO^* \to A)} \min_{\mu \in \Phi_2} \mathrm{E}_{(\pi^*_1 \otimes_x \pi_2) \times \mu}\!\left[u_{>|x|}\right]$$
Note that, thanks to $X$-stability, Proposition 3 doesn't require the condition $\min_{\mu \in \Phi_1} \mu(xO^\omega) > 0$, as opposed to the case for arbitrary minimax policies (see Proposition 1 here).
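The construction of $\pi^*_2$ is conditioning followed by a pushforward, which is easy to mock up for finite supports. Below is a hypothetical sketch (the function name, the representation of policies as frozensets of pairs, and string histories are all my own, purely illustrative):

```python
from collections import defaultdict
from fractions import Fraction

def conditional_tail_policy(pi_star, x, t):
    """Finite illustration of pi*_2(t) = pr_{2*}( pi* | {s : t_i = s(x_{<i})} ).

    pi_star: dict {policy: probability} (Fraction probabilities for exactness);
             a policy is a frozenset of (history, action) pairs, histories
             being strings over O.
    x:       an observation string; t: the actions required at the proper
             prefixes x_{<0}, ..., x_{<|x|-1}.
    """
    prefixes = [x[:i] for i in range(len(x))]
    # Condition on the event {s : s(x_{<i}) = t_i for every i < |x|}.
    kept = {s: prob for s, prob in pi_star.items()
            if all(dict(s)[h] == a for h, a in zip(prefixes, t))}
    total = sum(kept.values())
    # Pushforward under pr_2: restrict each policy to histories extending x.
    tail = defaultdict(Fraction)
    for s, prob in kept.items():
        restricted = frozenset((h, a) for h, a in s if h.startswith(x))
        tail[restricted] += Fraction(prob) / total
    return dict(tail)
```

Distinct policies that agree on $xO^*$ collapse to a single tail policy, which is exactly what the pushforward does in the measure-theoretic statement.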
### Definition 2
$\pi^* \in P(O^* \to A)$ is called a subagent perfect minimax policy for $\Phi$ when there is a sequence $\{X^{(n)} \subseteq O^+_\Phi\}_{n \in \mathbb{N}}$ s.t. each $X^{(n)}$ is finite, $X^{(n)} \subseteq X^{(n+1)}$ and $\bigcup_{n \in \mathbb{N}} X^{(n)} = O^+_\Phi$, and a sequence $\pi^{(n)} \in P(O^* \to A)$ s.t. $\pi^{(n)} \to \pi^*$ and $\pi^{(n)}$ is an $X^{(n)}$-stable minimax policy for $\Phi$.
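When $O$ is finite, a canonical choice of the exhausting sequence (my observation; the definition leaves the choice open) is truncation by length:

```latex
% Exhaust the Phi-possible histories by length. Each stage is finite
% because O is finite (assumption), and the stages are nested by design.
X^{(n)} := O^+_\Phi \cap O^{\le n}
% Then X^{(n)} \subseteq X^{(n+1)} and \bigcup_n X^{(n)} = O^+_\Phi,
% exactly as Definition 2 requires.
```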
### Proposition 4
There exists $\pi^* \in P(O^* \to A)$ which is a subagent perfect minimax policy for $\Phi$.
### Proposition 5
For $\pi^* \in P(O^* \to A)$ a subagent perfect minimax policy for $\Phi$, $\pi^*$ is in particular a minimax policy for $\Phi$.
### Proposition 6
Consider $\pi^* \in P(O^* \to A)$ a subagent perfect minimax policy for $\Phi$ and $x \in O^+_\Phi$. Assume $\Phi$ factorizes at $x$ into $\Phi_1, \Phi_2 \in PC(O^\omega)$, and define $\pi^*_1$ and $\pi^*_2$ as in Proposition 3. Then
$$\pi^*_2 \in \operatorname*{arg\,max}_{\pi_2: A^{|x|} \to P(xO^* \to A)} \min_{\mu \in \Phi_2} \mathrm{E}_{(\pi^*_1 \otimes_x \pi_2) \times \mu}\!\left[u_{>|x|}\right]$$