Any temporal discounting other than exponential is provably inconsistent
The conditions of the proof apply only to reinforcement agents that are, as a matter of architecture, forced to integrate anticipated rewards using a fixed weighting function whose time axis is constantly reindexed relative to the present.
To recap, the idea is that it is the self-similarity property of exponential functions that produces this result—and the exponential function is the only non-linear function with that property.
All other forms of discounting allow for the possibility of preference reversals with the mere passage of time—as discussed here.
This idea has nothing to do with reinforcement learning.
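The self-similarity point can be made concrete with a small numeric sketch (my own illustration, with made-up reward values and discount parameters): under exponential discounting, the ratio between two discounted rewards depends only on the gap between their delivery times, so preferences never flip as time passes; under hyperbolic discounting, the same pair of rewards can rank one way from far off and the other way up close.

```python
# Sketch (illustrative values only): a smaller-sooner reward vs. a
# larger-later reward, evaluated from two different "present" moments.

def exponential(delay, gamma=0.9):
    """Exponential discount weight: gamma**delay."""
    return gamma ** delay

def hyperbolic(delay, k=1.0):
    """Hyperbolic discount weight: 1 / (1 + k*delay)."""
    return 1.0 / (1.0 + k * delay)

def prefers_later(discount, now, t_small, r_small, t_large, r_large):
    """True if, evaluated at time `now`, the larger-later reward wins."""
    return r_large * discount(t_large - now) > r_small * discount(t_small - now)

# Reward of 6 at t=10 vs. reward of 10 at t=12 (arbitrary example values).
args = (10, 6.0, 12, 10.0)

# Exponential: the choice is the same from t=0 and from t=9, because
# gamma**(t+s) == gamma**t * gamma**s makes the comparison time-invariant.
print(prefers_later(exponential, 0, *args),
      prefers_later(exponential, 9, *args))   # True True

# Hyperbolic: the larger-later reward is preferred from a distance,
# but the preference reverses as the smaller reward draws near.
print(prefers_later(hyperbolic, 0, *args),
      prefers_later(hyperbolic, 9, *args))    # True False
```

Note that nothing about the demonstration involves learning: it is purely a property of the weighting functions, which is the point of the remark above.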