What do you mean by “agents have different time horizons”?
To answer my best guess of what you meant: this post used “most agents do X” as shorthand for “action X is optimal with respect to a large-measure set of reward functions.” The analysis only considers the single-agent MDP setting, and how, for a fixed reward function (or distribution over reward functions), the optimal action tends to vary with the discount rate. There aren’t multiple formal agents acting in the same environment.
The single-agent MDP setting resolves my confusion; what remains is just a curiosity about directions future work might take. The result that the optimal action varies with the discount rate is what essentially interests me, so, refocusing on the single-agent case: what do you think of the discount rate being discontinuous?
To be clear, there isn’t an obvious motivation for this question, so my guess for the answer is something like “Don’t know and didn’t check, because it can’t change the underlying intuition.”
Discontinuous with respect to what? The discount rate just is, and there just is an optimal policy set for each reward function at a given discount rate, and so it doesn’t make sense to talk about discontinuity without having something to govern what it’s discontinuous with respect to. Like, teleportation would be positionally discontinuous with respect to time.
You can, however, talk about other quantities being continuous with respect to change in the discount rate, and the paper proves the continuity of e.g. POWER and optimality probability with respect to γ∈[0,1].
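To illustrate the distinction, here is a minimal sketch (a hypothetical toy example, not from the post or paper): a single decision between an action that pays reward 1 immediately and an action that pays reward 1 on every subsequent step. The optimal value varies continuously with γ, while the optimal action jumps discontinuously at γ = 0.5.

```python
# Toy two-action choice (hypothetical example, not from the paper):
#   "now":   reward 1 once, then 0 forever        -> value = 1
#   "later": reward 0 now, then 1 on every step   -> value = gamma/(1 - gamma)

def value_now(gamma):
    return 1.0  # 1 + 0 + 0 + ...

def value_later(gamma):
    # 0 + gamma + gamma^2 + ... = gamma / (1 - gamma), for gamma < 1
    return gamma / (1.0 - gamma)

def optimal_action(gamma):
    return "now" if value_now(gamma) >= value_later(gamma) else "later"

def optimal_value(gamma):
    return max(value_now(gamma), value_later(gamma))

# The optimal value max(1, gamma/(1-gamma)) is continuous in gamma,
# but the optimal action flips discontinuously at gamma = 0.5.
for gamma in [0.3, 0.49, 0.51, 0.7]:
    print(gamma, optimal_action(gamma), optimal_value(gamma))
```

This is the sense in which the optimal policy set can change abruptly with the discount rate even while quantities like the optimal value (and, per the paper, POWER and optimality probability) remain continuous in γ.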