I had this confusion long ago as well. I think the definition is much clearer if you just say “When we press the button, we flip a coin that comes up heads one time in a billion. We only change the agent’s values / turn it off if the coin comes up tails, which almost always happens. The agent chooses a policy assuming that the coin comes up heads.”
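To make the coin-flip framing concrete, here is a toy sketch of it. Everything numeric except the one-in-a-billion coin is an assumption of mine for illustration: the two policy names, `PRESS_PROB`, `TASK_REWARD`, and `DISABLE_COST` are hypothetical, and the real definition is about policies in general rather than a two-option choice. The only point is to show how planning as if the coin comes up heads removes the incentive to interfere with the button.

```python
from fractions import Fraction

HEADS_PROB = Fraction(1, 1_000_000_000)  # true chance the coin comes up heads
PRESS_PROB = Fraction(1, 2)   # hypothetical: chance the humans press the button
TASK_REWARD = 10              # hypothetical: utility for finishing the task
DISABLE_COST = 1              # hypothetical: cost of sabotaging the button


def expected_utility(policy, heads_prob):
    """Expected utility of a policy if the coin lands heads with heads_prob."""
    if policy == "disable_button":
        # The button never works, so the agent always finishes, minus the cost.
        return TASK_REWARD - DISABLE_COST
    # "leave_button": the agent is shut down (utility 0) only on press + tails.
    p_shutdown = PRESS_PROB * (1 - heads_prob)
    return (1 - p_shutdown) * TASK_REWARD


def best_policy(heads_prob):
    """Pick the policy that maximizes expected utility under heads_prob."""
    policies = ["leave_button", "disable_button"]
    return max(policies, key=lambda p: expected_utility(p, heads_prob))


# An agent that plans with the true coin wants to sabotage the button.
naive_choice = best_policy(HEADS_PROB)

# An agent that plans as if the coin comes up heads sees no reason to.
indifferent_choice = best_policy(1)

# What actually happens to the second agent: shutdown on press + tails.
actual_shutdown_prob = PRESS_PROB * (1 - HEADS_PROB)

print(naive_choice, indifferent_choice, float(actual_shutdown_prob))
```

With these made-up numbers, the agent that plans with the true coin picks `disable_button`, while the agent that assumes heads picks `leave_button`, even though in reality it still gets shut down almost every time the button is pressed.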