Hmm. Do we (the creators of the AI) think this is correct? That is, does it match OUR desires for the future?
It’s fine (like all the other artificial limits proposed to prevent harmful runaway optimization) for early-stage prototypes, but if it’s not actually backed by truth, it won’t last—we’re explicitly reducing the power of an agent, which will make it less effective at actually optimizing the right things.
Of course, maybe it _is_ true, that we prefer to optimize for the local and short-term, and put only a small amount of weight on far future states of the universe. That’s certainly my felt experience as an agent, but I don’t think it’s my reflective belief.
I should clarify that the discounting is not a shackle, per se, but a specification of the utility function. It’s a normative specification that results now are better than results later according to a certain discount rate. An AI that cares about results now will not change itself to be more “patient” – because then it will not get results now, which is what it cares about.
The key is that the utility function’s weights over time should form a self-similar graph. That is, if results in 10 seconds are twice as valuable as results in 20 seconds, then results in 10 minutes and 10 seconds need to be twice as valuable as results in 10 minutes and 20 seconds. If this is not true, the AI will indeed alter itself so its future self is consistent with its present self.
Wait, but isn’t the exponential curve self-similar in that way, not the hyperbolic curve? I notice that I am confused. (Edit to clarify: I’m the only one who said hyperbolic; this is entirely my own confusion.)
Justification: waiting $x$ seconds at time $a$ should result in the same discount ratio as waiting $x$ seconds at time $b$. If $f(x)$ is the discounting function, this is equivalent to saying that $\frac{f(a+x)}{f(a)} = \frac{f(b+x)}{f(b)}$. If we let $f(x) = e^{-x}$, then this holds: $\frac{e^{-(a+x)}}{e^{-a}} = e^{-x} = \frac{e^{-(b+x)}}{e^{-b}}$. But if $f(x) = \frac{1}{x}$, then $\frac{a}{a+x} \neq \frac{b}{b+x}$ unless $a = b$. (To see why, just cross-multiply.)
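To make that concrete, here is a small Python sketch (my own illustration, not from the original exchange) that computes the wait ratio $\frac{f(a+x)}{f(a)}$ at two different start times, using an exponential weight and the toy $1/t$ hyperbola from the justification above; the discount rate and the specific times are arbitrary choices for illustration.

```python
# A minimal sketch checking the self-similarity property numerically: the
# discount ratio for waiting x extra seconds should not depend on when the
# wait starts. The functional forms follow the comment above: f(t) = e^{-rt}
# (with an assumed rate r) and the toy hyperbola f(t) = 1/t.

import math

def exponential(t, rate=0.01):
    # Exponential discount weight e^{-rate * t}; the rate is an arbitrary choice.
    return math.exp(-rate * t)

def hyperbolic(t):
    # The toy hyperbolic weight 1/t used in the justification above.
    return 1.0 / t

def wait_ratio(f, start, wait):
    # How much value survives waiting `wait` extra seconds, starting at `start`.
    return f(start + wait) / f(start)

for f in (exponential, hyperbolic):
    early = wait_ratio(f, start=10, wait=10)   # a 10-second wait, starting soon
    late = wait_ratio(f, start=610, wait=10)   # the same wait, 10 minutes later
    print(f"{f.__name__}: early={early:.4f}, late={late:.4f}")

# exponential: early == late (time-consistent)
# hyperbolic:  early < late  (the near-term wait is discounted more steeply)
```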
It turns out that I noticed a real thing. “Although exponential discounting has been widely used in economics, a large body of evidence suggests that it does not explain people’s choices. People choose as if they discount future rewards at a greater rate when the delay occurs sooner in time.”
Hyperbolic discounting is, in fact, irrational as you describe, in the sense that an otherwise rational agent will self-modify away from it. “People [...] seem to show inconsistencies in their choices over time.” (By the way, thanks for making the key mathematical idea of discounting clear.)
(That last quote is also amusing: dry understatement.)
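To see the kind of inconsistency that quote is describing, here is a short sketch (again my own construction, with made-up reward sizes and an assumed discount parameter k) of the classic preference reversal under hyperbolic discounting: the same pair of rewards is ranked one way from far off and the other way up close, which is exactly the incentive for a rational agent to self-modify toward a consistent (exponential) schedule.

```python
# Illustration of time inconsistency under hyperbolic discounting. The reward
# sizes, the parameter k, and the delays are arbitrary choices for the demo.

def hyperbolic_value(amount, delay_days, k=0.5):
    # Standard hyperbolic discounting: value = amount / (1 + k * delay).
    return amount / (1.0 + k * delay_days)

smaller_sooner = 100   # reward available first
larger_later = 110     # slightly bigger reward one day after it

for horizon in (0, 30):  # evaluate the choice now, and again 30 days out
    v_soon = hyperbolic_value(smaller_sooner, horizon)
    v_late = hyperbolic_value(larger_later, horizon + 1)
    pick = "sooner" if v_soon > v_late else "later"
    print(f"horizon={horizon:>2} days: sooner={v_soon:.2f}, later={v_late:.2f} -> pick {pick}")

# horizon=0:  picks the smaller-sooner reward
# horizon=30: picks the larger-later reward -- the preference reverses as the
# rewards draw near, which is the inconsistency the quote describes. An
# exponential discounter would rank the pair the same way at every horizon.
```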
Hmm. Do we (the creators of the AI) think this is correct? That is, does it match OUR desires for the future?
Code filters off desires, unless the AI has been programmed to Do What We Mean. “The genie knows but doesn’t care,” and so on.
I think you’re making two distinct points. First, that a competent AGI that is nevertheless shackled with hyperbolic discounting will probably remove the discounting. Second, that a hyperbolic AI would not effectively match our own goals. I agree with the second, but that has no bearing on the first. My original comment was exclusively talking about the first claim.
Hmm. Thanks for clearly separating those two points, and I agree that I was mixing them together. I suspect that they _are_ mixed together, because reality will eventually win out (if the AI isn’t optimizing the universe as well as possible, it’ll be replaced by one that does), but I don’t think I can make that argument clearly (because I get tangled up in corrigibility and control—who is able to make the decision to alter the utility function or replace the AI? I hope it’s not Moloch, but fear that it is.)