I should clarify that the discounting is not a shackle, per se, but a specification of the utility function. It’s a normative specification that results now are better than results later according to a certain discount rate. An AI that cares about results now will not change itself to be more “patient” – because then it will not get results now, which is what it cares about.
The key is that the utility function’s weights over time should form a self-similar graph. That is, if results in 10 seconds are twice as valuable as results in 20 seconds, then results in 10 minutes and 10 seconds need to be twice as valuable as results in 10 minutes and 20 seconds. If this is not true, the AI will indeed alter itself so its future self is consistent with its present self.
Wait, but isn’t the exponential curve self-similar in that way, not the hyperbolic curve? I notice that I am confused. (Edit to clarify: I’m the only one who said hyperbolic, this is entirely my own confusion.)
Justification: waiting x seconds at time a should result in the same discount ratio as waiting x seconds at time b. If f(t) is the discounting function, this is equivalent to saying that f(a+x)/f(a) = f(b+x)/f(b). If we let f(t) = e^(−t), then this holds: e^(−(a+x))/e^(−a) = e^(−x) = e^(−(b+x))/e^(−b). But if f(t) = 1/t, then f(a+x)/f(a) = a/(a+x) ≠ b/(b+x) = f(b+x)/f(b) unless a = b. (To see why, just cross-multiply.)
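A quick numerical sketch of that ratio test (the rate constant and sample times are arbitrary choices of mine, not from the argument itself):

```python
import math

def exponential(t, r=0.01):
    # Exponential discount factor: f(t) = e^(-r t)
    return math.exp(-r * t)

def hyperbolic(t):
    # The 1/t curve from the text (only defined for t > 0)
    return 1.0 / t

def wait_ratio(f, a, x):
    # Discount ratio for waiting x extra seconds starting at time a:
    # f(a + x) / f(a)
    return f(a + x) / f(a)

x = 10
for a in (10, 100, 1000):
    print(f"a={a:5d}  exp ratio={wait_ratio(exponential, a, x):.4f}  "
          f"hyp ratio={wait_ratio(hyperbolic, a, x):.4f}")
# The exponential ratio is identical for every a, while the 1/t ratio
# drifts toward 1 as a grows -- that drift is the time-inconsistency.
```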
It turns out that I noticed a real thing. “Although exponential discounting has been widely used in economics, a large body of evidence suggests that it does not explain people’s choices. People choose as if they discount future rewards at a greater rate when the delay occurs sooner in time.”
Hyperbolic discounting is, in fact, irrational as you describe, in the sense that an otherwise rational agent will self-modify away from it. “People [...] seem to show inconsistencies in their choices over time.” (By the way, thanks for making the key mathematical idea of discounting clear.)
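The "inconsistencies in choices over time" can be made concrete with a toy preference-reversal example (the reward sizes, delays, and the standard 1/(1 + k·delay) hyperbolic form are my own illustrative choices):

```python
def hyperbolic_value(reward, delay, k=1.0):
    # Present value under hyperbolic discounting: reward / (1 + k * delay)
    return reward / (1.0 + k * delay)

small, large = 10.0, 15.0  # hypothetical reward sizes
gap = 10.0                 # the large reward arrives 10 s after the small one

for delay in (0.0, 100.0):
    v_small = hyperbolic_value(small, delay)
    v_large = hyperbolic_value(large, delay + gap)
    choice = "small-sooner" if v_small > v_large else "large-later"
    print(f"delay={delay:5.0f}s  prefer {choice}")
```

When both rewards are far away, the agent prefers the larger-later one; once the small reward is imminent, the same agent reverses its choice. An exponential discounter never flips like this, which is why an otherwise rational agent would self-modify away from the hyperbolic curve.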
(That last quote is also amusing: dry understatement.)