Continually-adjusted discounted preferences
A putative new idea for AI control; index here.
This is one of the more minor suggestions, just a small tweak to help solve a specific issue.
Discounting time
The issue is the strange behaviour that agents with discount rates exhibit with respect to time.
Quickly, what probability would you put on time travel being possible?
I hope, as good Bayesians, you didn’t answer 0% (those who did should look here). Let’s assume, for argument’s sake, that you answered 0.1%.
Now assume that you have a discount rate of 10% per year (many putative agent designs use discount rates for convergence or to ensure short time-horizons, and these can have discount rates of 90% per second or even more extreme). By the end of 70 years, the utility will be discounted to roughly 0.1%. Thus, from then on (plus or minus a few years), the highest-expected-value action for you is to search for ways of travelling back in time, and do all your stuff then.
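To make the arithmetic concrete, here is a minimal sketch of the crossover calculation (the 10% annual rate and 0.1% credence are just the illustrative figures above, nothing canonical):

```python
# Find the year at which a 0.1% shot at time travel beats acting normally,
# under a 10%/year discount rate (both figures are the illustrative ones above).

discount = 0.9         # fraction of utility retained per year
p_time_travel = 0.001  # assumed credence that time travel is possible

year = 0
while discount ** year > p_time_travel:
    year += 1

# ~66 years: beyond this point, a 0.1% chance of acting undiscounted in the
# present outweighs the fully discounted value of acting in the far future.
print(year)
```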
This is perfectly time-consistent: given these premises, you’d want the “you in a century” to search frantically for a time-machine, as the expected utility increase they could achieve by any other means is tiny.
If you were incautious enough to have discount rates that go back into the past as well as the future, then you’d already be searching frantically for a time-machine, for the tiniest chance of going back to the big bang and having an impact there...
Continual corrigibility
We want the agents we design to apply the discount rate looking to the future only, not towards the past. To do so, we can apply corrigibility (see also here). This allows us to change an agent’s utility function, rewarding it (in utility) for any cost involved in the change.
The logical thing to do is to correct the agent’s utility function into something that doesn’t place such extreme value on the past. At the moment of applying corrigibility, cut off the agent’s utility at the present moment, and replace the past values with something much smaller. You could just set them to zero (though as a mathematician, my first instinct was to make the weight slope symmetrically down towards the past as it does towards the future, making the present the most important time ever!).
This correction could be applied regularly, maybe even continuously, removing the incentive to search desperately for ways to affect the past.
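As a minimal sketch of what the corrected discount might look like (the symmetric decay is the parenthetical variant above; the function and its parameters are illustrative, not part of any canonical design):

```python
def discount_weight(t, t_now, gamma=0.9):
    """Weight applied to utility occurring at time t, re-anchored at t_now.

    Future utility decays as usual. Past utility, instead of exploding as
    gamma ** (t - t_now) would for t < t_now, decays symmetrically away
    from the present, making "now" the most heavily weighted moment.
    """
    return gamma ** abs(t - t_now)

# Applied continually: t_now is reset at every decision point, so no past
# moment ever carries more weight than the present does.
```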
Note that this is not a perfect cure: an AI could create subagents that research time-travel and come back to the present day to increase its current (though not future) utility, bringing extra resources with them. One way of reducing this risk would be to have a “maximal utility” (a bound on how high utility can go at any given point) that sharply reduces the possible impact of time-travelling subagents. This bound could be progressively lifted going into the future, to allow the AI more freedom to increase its utility.
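One possible shape for such a bound, as a sketch only (the exponential loosening schedule and all parameter names are my own assumptions):

```python
def bounded_utility(raw_utility, t, cap_now=100.0, loosening=1.05):
    """Clamp the utility earned at future time t (t = 0 is the present).

    A tight cap near the present limits how much a time-travelling
    subagent could deliver 'now'; letting the cap grow with t preserves
    the agent's incentive to pursue larger gains later on.
    """
    return min(raw_utility, cap_now * loosening ** t)
```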
A more specific approach to dealing with subagents will be presented soon.
A more general method?
This is just one use of corrigibility to solve a specific problem, but it’s very possible that corrigibility could be applied just as successfully to other problems: anything where the form of the utility function made sense at one point, but becomes a drag at a later date.
Comments
Is this actually wrong, though? If backwards time travel were possible, it would be really valuable, and you’d want to go back to the earliest point where you could reasonably expect to survive.
It seems to me that the reason to not frantically search for time travel is the belief that frantic search will not increase our chance of actually implementing time travel. That is, I’m thinking that this is a physics / meta-research question, rather than a decision-theory question.
True, but acknowledging that backward time travel is valuable (the future-lightcone of past-you has a greater volume than the future-lightcone of current-you) isn’t the same as having a time-based discount rate; the former is based on instrumental reasoning, whereas the latter is intrinsic.
I suppose I’ve always seen time-based discount rates as a shorthand to encode the instrumental consequences of the past occurring before the future. Resources can be compounded, external resources can also compound, and memories seem like a special class of resources worth paying extra attention to.
So, that is, even if you deleted the discount rate of x% a year for having resources in whatever year, you would still have to address the hypothesis that the way to have as much legally acquired money as possible a year from today is to transport your seed resources as far back in time as possible.
Right. I think that dissolves our disagreement, then.
Yes, a discount rate increases the value of time travel quite dramatically.
A big problem is then the nature of the time travel. If your AI just goes back into an alternate past where nothing it does can affect you, then I guess that’s nice for alternate-world people, assuming they appreciate your meddling—but you don’t get to experience any of it.
This is probably not what you intended.
And that should be carved in crystal where all AI ideas are discussed...
Right, but the EV is so massive that it implies you should study physics 24/7 just to be sure you are correctly ruling it out.
Disagreed. The value of a successful discovery is probably more immense than any other value, but that doesn’t imply that the value of the marginal hour studying physics is more positive than the next best option, i.e. that the expected value is massive. The probability you need to multiply is not that it’s possible for someone eventually, but that you will discover it now / move its discovery closer to now, which could be done by doing things besides studying physics. That is, I think that getting the meta-research question right solves this problem.
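Spelled out as a rough decomposition (the symbols are mine; $V$ stands for the value of achieving time travel):

```latex
\mathrm{EV}(\text{marginal hour of physics}) =
\Pr(\text{possible}) \cdot
\Pr(\text{your hour hastens the discovery} \mid \text{possible}) \cdot V
- \mathrm{EV}(\text{next best use of that hour})
```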
It also means that discovering the universe is older than we currently expect would significantly raise the EV of such research. Any probability of a non-finite history could cause the EV to blow up.
That gets into Pascal’s Mugging territory, I think.
I don’t think so—an important part of Pascal’s Mugging is that the demon acts second—you produce a joint probability and utility function, and then he exploits the fact that the former doesn’t fall as fast as the latter rises.
Time travel is about the worst possible example to discuss discount rates and future preferences. Your statements about what you want from an agent with respect to past, current, and future desires pretty much collapse if time travel exists, along with the commonsense definitions of the words “past, current, and future”.
Additionally, 0.1% is way too high for the probability that significant agent-level time-travel exists in our universe. Like hundreds (or more) of orders of magnitude too high. It’s quite correct for me to say 0% is the probability I assign to it, as that’s what it is, to any reasonable rounding precision.
I’d like to hear more about how you think discounting should work in a rational agent, on more conventional topics than time travel.
I tend to think of utility as purely an instantaneous decision-making construct. For me, it’s non-comparable across agents AND across time for an agent (because I don’t have a good theory of agent identity over time, and because it’s not necessary for decision-making). For me, utility is purely the evaluation of the potential future gameboard (universe) conditional on a choice under consideration.
Utility can’t be stored, and gets re-evaluated for each decision. Memory and expectation, of course, are stored and continue forward, but that’s not utility, that’s universe state.
Discounting works by the agent counting on less utility for rewards that come further away from the decision/evaluation point. I think it’s strictly a heuristic—useful to estimate uncertainty about the future state of the agent (and the rest of the universe) when the agent can’t calculate very precisely.
In any case, I’m pretty sure discounting is about the amount of utility for a given future material gain, not about the amount of utility over time.
It’s also my belief that self-modifying rational agents will correct their discounting pretty rapidly for cases where it doesn’t optimize their goal achievement. Even in humans, you see this routinely: it only takes a little education for most investors to increase their time horizons (i.e. reduce their discount rate for money) by 10-100 times.
The one person I asked (Anders Sandberg) gave 1% as his first estimate. But for most low probabilities, exponential shrinkage will eventually chew up the difference. A hundred orders of magnitude? At the post’s 10%-a-year rate, that’s only an extra couple of millennia.
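Checking that figure, assuming the 10%-a-year rate from the post:

```latex
0.9^{\,n} = 10^{-100}
\quad\Longrightarrow\quad
n = \frac{100 \ln 10}{\ln(1/0.9)} \approx \frac{230.3}{0.1054} \approx 2185 \text{ years}
```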
I don’t think discounting should be used at all, and that rational facts about the past and future (eg expected future wealth) should be used to get discount-like effects instead.
However, there are certain agent designs (AIXI, unbounded utility maximisers, etc...) that might need discounting as a practical tool. In those cases, adding this hack could allow them to discount while reducing the negative effects.
Depends. Utility that sums (eg total hedonistic utilitarianism, reward-agent made into a utility maximiser, etc...) does accumulate. Some other variants have utility that accumulates non-linearly. Many non-accumulating utilities might have an accumulating component.