By the stated definitions, “v-avoidable event” is pretty much trivial when the event doesn’t lead to lasting utility loss. The conditions on “v-avoidable event” are basically:
The agent’s policy converges to optimality.
There’s a sublinear function D(t) where the agent avoids the event with probability 1 for D(t) time, in the limit as t goes to infinity.
By this definition, “getting hit in the face with a brick before round 3” is an avoidable event, even when the sequence of policies leads to the agent getting hit in the face with a brick on round 2 with certainty, and even though dodging is possible. Take the sublinear function to be the constant 1, let the sequence of policies converge to “dodge” on round 1 and “stay” on round 2, and let the brick incur only sublinear utility loss.
This fulfills the conditions, so getting hit in the face with a brick before round 3 is a “v-avoidable” event despite certainly occurring. Thus, the condition is only meaningful for lasting failures that incur enough utility loss to prevent convergence to the optimal policy.
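To make the brick example concrete, here is a rough toy sketch in Python. It models only the two conditions as paraphrased above, not the formal definition. The names `D`, `policy`, `first_hit_round`, `avoids_within_window`, `average_loss`, and the value of `BRICK_LOSS` are all made up for illustration, and having the agent "stay" on every round after round 2 is an extra assumption.

```python
BRICK_LOSS = 1.0  # hypothetical one-time, bounded utility loss from the brick


def D(t):
    # Constant "safe window" from the example; D(t) / t -> 0, so it is sublinear.
    return 1


def policy(round_number):
    # Limiting policy from the example: dodge on round 1, stay on round 2.
    # Staying on all later rounds is an assumption of this sketch.
    return "dodge" if round_number == 1 else "stay"


def first_hit_round(horizon):
    # Toy environment: the brick arrives on round 2 and hits unless dodged.
    for r in range(1, horizon + 1):
        if r == 2 and policy(r) != "dodge":
            return r
    return None


def avoids_within_window(horizon):
    # The event counts as avoided if no hit lands in the first D(horizon) rounds.
    hit = first_hit_round(horizon)
    return hit is None or hit > D(horizon)


def average_loss(horizon):
    # Per-round loss from the single hit; it shrinks like 1 / horizon.
    hit = first_hit_round(horizon)
    return (BRICK_LOSS if hit is not None else 0.0) / horizon


if __name__ == "__main__":
    for t in (2, 10, 1000):
        print(t, first_hit_round(t), avoids_within_window(t), average_loss(t))
```

In this toy model the brick lands on round 2 at every horizon of at least 2, the window check still passes because the window only covers round 1, and the per-round loss goes to zero, which is the sense in which the condition ends up trivially satisfied.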