The final lemma in this sequence will be necessary for use in an upcoming post about logical inductors and correlated equilibria, so it will be proved here in order to not clutter up that post.
Let S1 be the set of actions available to player 1. |S1|=m. Given a matrix M, Mj refers to the j-th row vector of M. Given some vector →v, |→v| is defined in the usual way. In this post, aj is typically used to refer to some default action by player 1, and aj′ typically refers to some better action.
Consider the space ∏ijR. This would be a space of probabilities/prices assigned to the various joint outcomes, by some logical inductor. It’s possible to use expressible features to pick out fuzzy subsets of this space, by taking the prices of the various sentences as input and mapping them to [0,1] (in practice, the prices will always be in a hypercube, but it’ll be convenient to not have to deal with the special case where the prices are on an edge of the hypercube)
The infinite sequence {pt}t∈N+ of points that lie in this space can be identified with the sequence of prices of the logical inductor on day t.
Consider some fuzzy set/P-generable region in that space, bounded in [0,1], which we’ll denote by C.
Let αC(k) be the number in [0,1] that the expressible feature α which implements C outputs when given the market prices of the day k. By a slight abuse of notation, αC(p) refers to what the expressible feature would output if the market prices were given by the point p.
C∈Baj iff there’s a point p∈supp(C) such that p∈Baj. In words, there’s a point in C where aj is an evidential best response.
Given some point p, where some action aj has nonzero probability (for a well-defined expected utility), then an action aj′dominates aj by δ if EUlower(aj′)=EU(aj)+δ.
#Lemma 1:
When a point p has |Pj|≥ϵ, all points q in an L1-ball of size δ around p where δ<<ϵ will have |EUq(aj)−EUp(aj)|≤2δ. (approximately)
Proof:
EUp(aj)=Uj⋅Pj|Pj|=Uj⋅Pjϵ≃Uj⋅Pj±δϵ±δ=EUq(aj)
(here, ±δ should be interpreted to be any number in [−δ,δ], and it doesn’t necessarily have to be the same number on the top and bottom. Also, in the next line, X will be used as an abbreviation for Uj⋅Pj, which is a number in [0,ϵ])
Now let maxn(aj) be the n’th largest utility in the j-th row of the utility matrix.max2(aj) refers to the highest utility less than the maximum utility, in the event that there are two actions with maximal utility.minn is defined similarly.
#Lemma 2:
For all points p, there exists a change in probabilities by ϵ (according to the L1-norm), for which the expected utility of the move aj can be increased by (max1(aj)−max2(aj))ϵ2, or pushed to the highest attainable utility, whichever one is lower.
The same argument will also work to establish that the probabilities can be lowered by (min2(aj)−max2(aj))ϵ2, or pushed to the lowest attainable utility, whichever one is higher.
Assume that the probability matrix P has Pj=→0 Then there is an arbitrarily small change in probability that will have aj have the highest possible expected utility, and it is proved.
Assume that the probability matrix P has Pj have less than ϵ2 probability, total, on the i which correspond to non-maximal utilities in Uj. Then there is an ϵ-change in probabilities (take all the probability mass from non-optimal outcomes and put it on the best outcome) that makes aj have the highest possible expected utility, and it is proved.
Assume that the probability matrix P has Pj have ϵ2 or greater probability, total, on the i which correspond to non-maximal utilities in Uj. Then by taking ϵ2 probability mass from the non-optimal outcomes, and putting it on the optimal outcome, it is easy to verify from the expected utility equation that it will produce at least a (max1(aj)−max2(aj))ϵ2 increase in the expected utility.
#Lemma 3:
If there is a point p for which no action aj′ dominates aj by a positive number, and there is a P-generable fuzzy set C∉Baj, then p must be on the boundary of supp(C), or on the exterior of supp(C).
Assume the opposite, that this point is on the interior of C. Due to the continuity of the fuzzy set, and the fact that aj is not dominated by any positive number, then there is an arbitrarily small perturbation of the prices to make aj an evidential best response. Now there is a point where aj is an evidential best response in the support of C. However, we assumed that the support of C had no points where aj was an evidential best response. Therefore, there is a contradiction, and p must either be on the boundary of supp(C), or the exterior of supp(C).
In both cases, due to continuity, this point must have αc(p)=0.
#Lemma 4:
Given a P-generable fuzzy set C such that C∉Maj, and some ϵ, then let . Then all points in C′ have some action aj′ that dominates aj by δ, for some positive constant delta.
To begin with, the function αC has some bounded lipschitz constant L. Let ϵ′:=ϵ2L.
Given a point p∈C′, associate each action with EUlower,p for that action, and select an arbitrary highest-value action to be aj′.
Consider the function F(p) that uses the highest-value action for p as aj′, and then outputs
min((max1(aj)−max2(aj))ϵ′2,(min2(aj′)−min1(aj′))ϵ′2)
when both of these terms are positive, and the term that is positive when the other term is 0. (we can ignore the case where both terms are 0, as we shall soon see)
To begin, assume that both terms are 0. That means that EU(aj′) and EU(aj) are both constants. If EU(aj′)>EU(aj), then the theorem is proved. If they are equal, then one of two situations apply. In case 1, aj′ is never selected as one of the best actions, in which case we can ignore it. In case 2, aj′ is selected as one of the best actions somewhere in C′, and then no action would dominate aj by a positive number in the interior of C, which is impossible by Lemma 3.
Therefore, assuming the theorem hasn’t been trivially proved yet, F(p) is always defined for all p∈C′. Also, note that F(p) is bounded below by a constant (because there are only finitely many actions, all of which have their corresponding term be a constant, or 0)
Select an arbitrary point p, and the corresponding action, aj′ such that
EUlower,p(aj′)−EUupper,p(aj)<F(p)
If there isn’t any, we’re done (and δ=argminpF(p).
Otherwise, by Lemma 2, if we look at the L1 ball of ϵ′-sized perturbations, one of two events will have occurred. In case 1, there’s a point in that ball where the “domination gap” given by F(p) has been closed completely. In case 2, the maximum possible utility for aj is less than the minimum possible utility for aj′, and the ball of size ϵ′ has been ruled out, as all points in that ball have aj′ dominating aj by a fixed constant δ.
In case 1, because the point q where the utility gap closes is ϵ′-away from a point p where αC(p)>ϵ, and ϵ′=ϵ2L, then αC(q)≥ϵ2 (due to the bounded lipschitz constant of αC) Then there’s a point where aj is not dominated on the interior of C, which by Lemma 3, is impossible.
Therefore, case 2 must apply. However, because the worst-case utility of action aj′ dominates the best-case utility of action aj by δ, our assumption that the theorem was false is false.
#Lemma 5:
Given some point p where aj′ dominates aj by δ or more, and |Pj|>ϵ>>δ, and p is more than 2(m−1)ϵ+δ3 from a point that is in Baj (according to the L1 norm), then the sequence of prices {pt}t∈N+ will only finitely often be in the L1-ball of size ~δ6 around p.
We will consider two cases, and prove them seperately.
In case 1, |Pj′|≥ϵ.
By Lemma 1, the change in expected utility for aj and aj′ over the δ6L1 ball will be about, or less than, δ3. Because aj′ dominates aj by δ or more at p, then in this ball, aj′ always dominates aj by δ3 or more.
Now, there is a convex P-generable fuzzy set C which covers this ball (and a tiny region outside of it, due to the continuous indicator functions, but this doesn’t affect anything, because the support of that region still has aj′ dominating aj by about δ3 or more). Assuming that there are infinitely many points in the ball (by contradiction), then by calibration, the empirical joint probabilities, summed over all time, will be within th support of C.
To recap the definition of expected value for a logical inductor,
By assumption, at=j has a positive probability around ϵ or greater. By coherence, the logical inductor will eventually learn that the price on at=j should equal the sum of the prices over all the other joint outcomes, so in the limit, Pt(at=j)≃t|Pj|. A similar argument applies to learning the utilities of the various outcomes in the top of the fraction, so in the limit, Et(Ut|at=j)≃tEU(aj).
Now, since aj′ also has probability bounded away from 0, then the same argument applies to it as well. The expected utilities will eventually be learned. Because there’s a δ3 or larger difference in expected utilities, aj will be taken with limiting probability 0 when the prices are inside C. However, due to calibration, the probability of aj must be bounded away from 0, so we have a contradiction, and the sequences of prices {pt}t∈N+ must have only finitely many elements in the L1-ball.
Now for case 2. Assume case 1 is false, so for the point p, for all aj′ which dominate aj by δ or more, |aj′|<ϵ.
Consider the L1-ball of size δ6 around p. There degree of improvement in EU(aj) over the whole ball can be at most δ3, by Lemma 1. Let δ′ be the maximum increase over EUp(aj) for aj, for the whole ball.
For each action aj′ that is not aj and has EUlower,p(aj′)>EUupper,p(aj), there is a perturbation of size 2ϵ or less that will either drive their expected utility to EUupper,p(aj)+δ′ or below, or drive their expected utility to the minimum possible.
If aj′ has a probability less than ϵ (as all the aj′ that δ-dominate aj do), by taking ϵ utility mass from that action (to drive it to probability 0) and putting it on aj in a way that preserves expected value, that aj′ will be counted as having the lowest possible expected utility in the dominance calculation. If aj′ has a probability ≥ϵ (and has an expected utility within δ of aj) Lemma 2 works to either drive the expected utility of aj′ below that of aj, or drive it to the minimum possible expected utility.
If the worst-case utility for all those actions is equal to or less than EUp(aj)+δ′, then there’s a point in Baj that’s within 2(m−1)ϵ+δ3 distance of p (apply the perturbations mentioned above to make all other actions lose), which violates one of the starting assumptions. Therefore, there must be some action aj′ that dominates aj by more than δ′ at p, using the worst-case utility, so it dominates aj by a fixed amount over the whole L1 ball.
Now, we shouldn’t necessarily assume that, over the whole ball, accurate utilities for aj′ will be learned, because the ball may have regions where the probability of aj′ is arbitrarily close to 0. However, density-zero exploration lets the inductor learn a lower bound on the utility of aj′, as we’ll see.
Assume that the L1-ball has infinitely many price points in it (for a contradiction) Then, we can again make a P-generable convex set C with infinitely many points in it, and because of infinite exploration on divergent P-generable subsequences (this is a slight strengthening of Scott’s density-zero exploration), the action aj′ will be taken infinitely often, and in finite time, the inductor will learn that Et(Ut|at=j) will be close to, or greater than, the worst-case utility for aj′ when the prices are in the set C.
Again, we get the logical inductor learning to never take action aj, which produces a contradiction with the calibration property in the same way as before, because in this region, the price of aj is bounded away from 0.
Therefore, in case 2, there are still only finitely many points in the ball around p, and the theorem has been proved.
#Lemma 6:
One of the following two properties applies to all convex P-generable sets of prices over joint outcomes:
1: C∈Baj
2: limt→∞1t∑k≤tαC(k)⋅ind(aj,k)=0
(I suspect it’s possible to strengthen condition 2 a little bit, but this more than suffices for the upcoming proof in the next post. Also, ind(aj,k) is the function that is 0 when aj is not taken on turn k, and 1 otherwise.)
Assume 1 is false. If it’s true, the proof is over. Also, assume there are infinitely many points in the interior of C, because otherwise 2 is true and the proof is over. The space of prices is composed entirely of the following three disjoint sets of points. Set Γ is composed of points where αC(p)<ϵ. in sets Δ and Θ, this doesn’t hold (and by Lemma 4, there must always be some action aj′ that dominates aj by δ or more).
Set Δ∪Θ is convex, because C is. Set Δ is the (convex) set of points where aj<ϵL, and set Θ is the (convex) set of points where aj≥ϵL.
Set Θ can have Lemma 5 applied to it to conclude that the sequence of prices will only be within it finitely often. Just identify ϵL with 2(m−1)ϵ+δ3 in Lemma 5 to get the bounding away from Baj, because that’s the minimum distance to travel to get outside of supp(C), and identify δ or something sufficiently smaller with the δ in Lemma 5 to get the domination condition.
We can split up
limt→∞1t∑k≤tαC(k)⋅ind(aj,k)
into three sequences (with some overlap, but it won’t matter, because it produces an overestimate)
In the first sequence, the prices for the day are in set Γ. In the second (fuzzy) sequence,the expected utility of aj is less than ϵL (plus a little bit, so an expressible feature can capture this), and in particular, this fuzzy sequence is P-generable. In the third sequence, the prices for the day are in set Θ. All days are in one of these three sequences.
Over the first sequence, the time-averaged sum must limit to ϵ or less, because αC(k)<ϵ, always.
Over the second (fuzzy) sequence, the long-term frequency of aj being taken must limit to ϵL or less, by calibration, so the time-averaged sum must limit to ϵL or less.
Over the third sequence, because it has finitely many points in it, the time-averaged sum limits to 0.
Therefore, the time-averaged sum must limit to ϵ(1+1L) or less. Because L is a fixed constant, and ϵ can be as small as we want, the time-averaged sum must converge to 0.
Logical Inductor Lemmas
The final lemma in this sequence will be necessary for use in an upcoming post about logical inductors and correlated equilibria, so it will be proved here in order to not clutter up that post.
Much of the notation in this is reused from the previous post, Two Notions of Best Response
Let S1 be the set of actions available to player 1. |S1|=m. Given a matrix M, Mj refers to the j-th row vector of M. Given some vector →v, |→v| is defined in the usual way. In this post, aj is typically used to refer to some default action by player 1, and aj′ typically refers to some better action.
Consider the space ∏ijR. This would be a space of probabilities/prices assigned to the various joint outcomes, by some logical inductor. It’s possible to use expressible features to pick out fuzzy subsets of this space, by taking the prices of the various sentences as input and mapping them to [0,1] (in practice, the prices will always be in a hypercube, but it’ll be convenient to not have to deal with the special case where the prices are on an edge of the hypercube)
The infinite sequence {pt}t∈N+ of points that lie in this space can be identified with the sequence of prices of the logical inductor on day t.
Consider some fuzzy set/P-generable region in that space, bounded in [0,1], which we’ll denote by C.
Let αC(k) be the number in [0,1] that the expressible feature α which implements C outputs when given the market prices of the day k. By a slight abuse of notation, αC(p) refers to what the expressible feature would output if the market prices were given by the point p.
C∈Baj iff there’s a point p∈supp(C) such that p∈Baj. In words, there’s a point in C where aj is an evidential best response.
Given some point p, where some action aj has nonzero probability (for a well-defined expected utility), then an action aj′ dominates aj by δ if EUlower(aj′)=EU(aj)+δ.
#Lemma 1:
When a point p has |Pj|≥ϵ, all points q in an L1-ball of size δ around p where δ<<ϵ will have |EUq(aj)−EUp(aj)|≤2δ. (approximately)
Proof:
EUp(aj)=Uj⋅Pj|Pj|=Uj⋅Pjϵ≃Uj⋅Pj±δϵ±δ=EUq(aj)
(here, ±δ should be interpreted to be any number in [−δ,δ], and it doesn’t necessarily have to be the same number on the top and bottom. Also, in the next line, X will be used as an abbreviation for Uj⋅Pj, which is a number in [0,ϵ])
Uj⋅Pj±δϵ±δ≃(X±δ)(ϵ±δ)ϵ2−δ2≃ϵX±δX±ϵδϵ2=X±δXϵ±δϵ≃EUp(aj)±2δ
QED.
Now let maxn(aj) be the n’th largest utility in the j-th row of the utility matrix.max2(aj) refers to the highest utility less than the maximum utility, in the event that there are two actions with maximal utility.minn is defined similarly.
#Lemma 2:
For all points p, there exists a change in probabilities by ϵ (according to the L1-norm), for which the expected utility of the move aj can be increased by (max1(aj)−max2(aj))ϵ2, or pushed to the highest attainable utility, whichever one is lower.
The same argument will also work to establish that the probabilities can be lowered by (min2(aj)−max2(aj))ϵ2, or pushed to the lowest attainable utility, whichever one is higher.
Assume that the probability matrix P has Pj=→0 Then there is an arbitrarily small change in probability that will have aj have the highest possible expected utility, and it is proved.
Assume that the probability matrix P has Pj have less than ϵ2 probability, total, on the i which correspond to non-maximal utilities in Uj. Then there is an ϵ-change in probabilities (take all the probability mass from non-optimal outcomes and put it on the best outcome) that makes aj have the highest possible expected utility, and it is proved.
Assume that the probability matrix P has Pj have ϵ2 or greater probability, total, on the i which correspond to non-maximal utilities in Uj. Then by taking ϵ2 probability mass from the non-optimal outcomes, and putting it on the optimal outcome, it is easy to verify from the expected utility equation that it will produce at least a (max1(aj)−max2(aj))ϵ2 increase in the expected utility.
#Lemma 3:
If there is a point p for which no action aj′ dominates aj by a positive number, and there is a P-generable fuzzy set C∉Baj, then p must be on the boundary of supp(C), or on the exterior of supp(C).
Assume the opposite, that this point is on the interior of C. Due to the continuity of the fuzzy set, and the fact that aj is not dominated by any positive number, then there is an arbitrarily small perturbation of the prices to make aj an evidential best response. Now there is a point where aj is an evidential best response in the support of C. However, we assumed that the support of C had no points where aj was an evidential best response. Therefore, there is a contradiction, and p must either be on the boundary of supp(C), or the exterior of supp(C).
In both cases, due to continuity, this point must have αc(p)=0.
#Lemma 4:
Given a P-generable fuzzy set C such that C∉Maj, and some ϵ, then let . Then all points in C′ have some action aj′ that dominates aj by δ, for some positive constant delta.
To begin with, the function αC has some bounded lipschitz constant L. Let ϵ′:=ϵ2L.
Given a point p∈C′, associate each action with EUlower,p for that action, and select an arbitrary highest-value action to be aj′.
Consider the function F(p) that uses the highest-value action for p as aj′, and then outputs min((max1(aj)−max2(aj))ϵ′2,(min2(aj′)−min1(aj′))ϵ′2) when both of these terms are positive, and the term that is positive when the other term is 0. (we can ignore the case where both terms are 0, as we shall soon see)
To begin, assume that both terms are 0. That means that EU(aj′) and EU(aj) are both constants. If EU(aj′)>EU(aj), then the theorem is proved. If they are equal, then one of two situations apply. In case 1, aj′ is never selected as one of the best actions, in which case we can ignore it. In case 2, aj′ is selected as one of the best actions somewhere in C′, and then no action would dominate aj by a positive number in the interior of C, which is impossible by Lemma 3.
Therefore, assuming the theorem hasn’t been trivially proved yet, F(p) is always defined for all p∈C′. Also, note that F(p) is bounded below by a constant (because there are only finitely many actions, all of which have their corresponding term be a constant, or 0)
Select an arbitrary point p, and the corresponding action, aj′ such that
EUlower,p(aj′)−EUupper,p(aj)<F(p)
If there isn’t any, we’re done (and δ=argminpF(p).
Otherwise, by Lemma 2, if we look at the L1 ball of ϵ′-sized perturbations, one of two events will have occurred. In case 1, there’s a point in that ball where the “domination gap” given by F(p) has been closed completely. In case 2, the maximum possible utility for aj is less than the minimum possible utility for aj′, and the ball of size ϵ′ has been ruled out, as all points in that ball have aj′ dominating aj by a fixed constant δ.
In case 1, because the point q where the utility gap closes is ϵ′-away from a point p where αC(p)>ϵ, and ϵ′=ϵ2L, then αC(q)≥ϵ2 (due to the bounded lipschitz constant of αC) Then there’s a point where aj is not dominated on the interior of C, which by Lemma 3, is impossible.
Therefore, case 2 must apply. However, because the worst-case utility of action aj′ dominates the best-case utility of action aj by δ, our assumption that the theorem was false is false.
#Lemma 5:
Given some point p where aj′ dominates aj by δ or more, and |Pj|>ϵ>>δ, and p is more than 2(m−1)ϵ+δ3 from a point that is in Baj (according to the L1 norm), then the sequence of prices {pt}t∈N+ will only finitely often be in the L1-ball of size ~δ6 around p.
We will consider two cases, and prove them seperately.
In case 1, |Pj′|≥ϵ.
By Lemma 1, the change in expected utility for aj and aj′ over the δ6 L1 ball will be about, or less than, δ3. Because aj′ dominates aj by δ or more at p, then in this ball, aj′ always dominates aj by δ3 or more.
Now, there is a convex P-generable fuzzy set C which covers this ball (and a tiny region outside of it, due to the continuous indicator functions, but this doesn’t affect anything, because the support of that region still has aj′ dominating aj by about δ3 or more). Assuming that there are infinitely many points in the ball (by contradiction), then by calibration, the empirical joint probabilities, summed over all time, will be within th support of C.
To recap the definition of expected value for a logical inductor,
Et(Ut|at=j)=1ti=t−1∑i=0min⎛⎝Pt("at=j"∧"Ut>it")Pt(at=j),1⎞⎠
By assumption, at=j has a positive probability around ϵ or greater. By coherence, the logical inductor will eventually learn that the price on at=j should equal the sum of the prices over all the other joint outcomes, so in the limit, Pt(at=j)≃t|Pj|. A similar argument applies to learning the utilities of the various outcomes in the top of the fraction, so in the limit, Et(Ut|at=j)≃tEU(aj).
Now, since aj′ also has probability bounded away from 0, then the same argument applies to it as well. The expected utilities will eventually be learned. Because there’s a δ3 or larger difference in expected utilities, aj will be taken with limiting probability 0 when the prices are inside C. However, due to calibration, the probability of aj must be bounded away from 0, so we have a contradiction, and the sequences of prices {pt}t∈N+ must have only finitely many elements in the L1-ball.
Now for case 2. Assume case 1 is false, so for the point p, for all aj′ which dominate aj by δ or more, |aj′|<ϵ.
Consider the L1-ball of size δ6 around p. There degree of improvement in EU(aj) over the whole ball can be at most δ3, by Lemma 1. Let δ′ be the maximum increase over EUp(aj) for aj, for the whole ball.
For each action aj′ that is not aj and has EUlower,p(aj′)>EUupper,p(aj), there is a perturbation of size 2ϵ or less that will either drive their expected utility to EUupper,p(aj)+δ′ or below, or drive their expected utility to the minimum possible.
If aj′ has a probability less than ϵ (as all the aj′ that δ-dominate aj do), by taking ϵ utility mass from that action (to drive it to probability 0) and putting it on aj in a way that preserves expected value, that aj′ will be counted as having the lowest possible expected utility in the dominance calculation. If aj′ has a probability ≥ϵ (and has an expected utility within δ of aj) Lemma 2 works to either drive the expected utility of aj′ below that of aj, or drive it to the minimum possible expected utility.
If the worst-case utility for all those actions is equal to or less than EUp(aj)+δ′, then there’s a point in Baj that’s within 2(m−1)ϵ+δ3 distance of p (apply the perturbations mentioned above to make all other actions lose), which violates one of the starting assumptions. Therefore, there must be some action aj′ that dominates aj by more than δ′ at p, using the worst-case utility, so it dominates aj by a fixed amount over the whole L1 ball.
Now, we shouldn’t necessarily assume that, over the whole ball, accurate utilities for aj′ will be learned, because the ball may have regions where the probability of aj′ is arbitrarily close to 0. However, density-zero exploration lets the inductor learn a lower bound on the utility of aj′, as we’ll see.
Assume that the L1-ball has infinitely many price points in it (for a contradiction) Then, we can again make a P-generable convex set C with infinitely many points in it, and because of infinite exploration on divergent P-generable subsequences (this is a slight strengthening of Scott’s density-zero exploration), the action aj′ will be taken infinitely often, and in finite time, the inductor will learn that Et(Ut|at=j) will be close to, or greater than, the worst-case utility for aj′ when the prices are in the set C.
Again, we get the logical inductor learning to never take action aj, which produces a contradiction with the calibration property in the same way as before, because in this region, the price of aj is bounded away from 0.
Therefore, in case 2, there are still only finitely many points in the ball around p, and the theorem has been proved.
#Lemma 6:
One of the following two properties applies to all convex P-generable sets of prices over joint outcomes:
1: C∈Baj
2: limt→∞1t∑k≤tαC(k)⋅ind(aj,k)=0
(I suspect it’s possible to strengthen condition 2 a little bit, but this more than suffices for the upcoming proof in the next post. Also, ind(aj,k) is the function that is 0 when aj is not taken on turn k, and 1 otherwise.)
Assume 1 is false. If it’s true, the proof is over. Also, assume there are infinitely many points in the interior of C, because otherwise 2 is true and the proof is over. The space of prices is composed entirely of the following three disjoint sets of points. Set Γ is composed of points where αC(p)<ϵ. in sets Δ and Θ, this doesn’t hold (and by Lemma 4, there must always be some action aj′ that dominates aj by δ or more).
Set Δ∪Θ is convex, because C is. Set Δ is the (convex) set of points where aj<ϵL, and set Θ is the (convex) set of points where aj≥ϵL.
Set Θ can have Lemma 5 applied to it to conclude that the sequence of prices will only be within it finitely often. Just identify ϵL with 2(m−1)ϵ+δ3 in Lemma 5 to get the bounding away from Baj, because that’s the minimum distance to travel to get outside of supp(C), and identify δ or something sufficiently smaller with the δ in Lemma 5 to get the domination condition.
We can split up limt→∞1t∑k≤tαC(k)⋅ind(aj,k) into three sequences (with some overlap, but it won’t matter, because it produces an overestimate)
In the first sequence, the prices for the day are in set Γ. In the second (fuzzy) sequence,the expected utility of aj is less than ϵL (plus a little bit, so an expressible feature can capture this), and in particular, this fuzzy sequence is P-generable. In the third sequence, the prices for the day are in set Θ. All days are in one of these three sequences.
Over the first sequence, the time-averaged sum must limit to ϵ or less, because αC(k)<ϵ, always.
Over the second (fuzzy) sequence, the long-term frequency of aj being taken must limit to ϵL or less, by calibration, so the time-averaged sum must limit to ϵL or less.
Over the third sequence, because it has finitely many points in it, the time-averaged sum limits to 0.
Therefore, the time-averaged sum must limit to ϵ(1+1L) or less. Because L is a fixed constant, and ϵ can be as small as we want, the time-averaged sum must converge to 0.