Nice try, but that’s a transparent attempt to reveal me to be an unfriendly agent who’s thought through how to do that.
In general terms, though, you follow the pattern of selling an agent things that favor working now whenever they decide they should work now, and making the opposite trade whenever they decide they should work later.
Or just do nothing and watch the agent never work and regret it.
But that is not an exploit. You buy the bucket and chamois from the agent who is procrastinating about washing his car. You plan to sell them back at a profit when the agent decides to work now, but that never happens.
I think that you are right that exponential discounting still allows the paradox of the agent who desires to do X someday, but will never get to the point where he desires to do X today. We need to add another “axiom of rationality” to forbid that. Exponential discounting is not enough. But that doesn’t necessarily mean that there is an exploit there.
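One toy way to get that behavior out of a pure exponential discounter, with numbers I am inventing here (the key assumption, not taken from the thread, is a payoff that grows faster than the discounting shrinks it):

```python
# Toy model: exponential discounter with per-day discount factor DELTA,
# facing a task whose payoff doubles for every day the work is deferred.
# Because 2 * DELTA > 1, "work on day s+1" looks strictly better than
# "work on day s" from EVERY vantage point, so the agent always prefers
# later to now -- yet working on any finite day beats never working (value 0).

DELTA = 0.9      # per-day discount factor (illustrative)
GROWTH = 2.0     # payoff multiplier per day of deferral (illustrative)

def value_of_working_on(s, today):
    """Discounted value, as seen from `today`, of doing the work on day s >= today."""
    return (DELTA ** (s - today)) * (GROWTH ** s)

for s in range(10):
    now, later = value_of_working_on(s, 0), value_of_working_on(s + 1, 0)
    print(f"day {s}: work then = {now:7.2f}   work a day later = {later:7.2f}   defer? {later > now}")

# Every row says defer? True, and the same comparison holds again tomorrow
# (no preference reversal), so the agent wants to work "someday" but never today.
```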
I don’t think the agent ever gets to the point of regretting never having worked, because the warm-and-fuzzy feeling arising from the intention to work persists.
If they don’t know that they are irrational in this manner:
“I’ll give you tools when you need them / money when you work if you pay me now”
“OK, I’ll work tomorrow, so that’s a good deal”
“You never worked, so I got free money.”
If they know they are irrational:
“I’ll act as a commitment mechanism. Sign this contract saying you’ll pay me if you don’t work.”
“This benefits me. OK.”
“I’ll relax your commitment for you so you don’t have to work. You still have to pay me some, though.”
“This benefits me, I really don’t want to work right now.”
There is ALWAYS a way.
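A minimal sketch of the first exchange, with invented prices, just to make the cash flow explicit:

```python
# "Pay me now, and I'll pay you back (with interest) on any day you actually work."
# Against an agent that does not know it will never work, the seller simply keeps
# the up-front payment. All prices here are invented for illustration.

PRICE = 10       # what the agent pays today
PAYOUT = 15      # what the seller owes the agent on any day it actually works

def naive_agent_accepts(price, payout):
    # The agent evaluates the deal as if it will really work tomorrow,
    # so the payout looks certain and the trade looks profitable.
    believed_chance_of_working = 1.0
    return believed_chance_of_working * payout > price

def run(days):
    profit = 0
    if naive_agent_accepts(PRICE, PAYOUT):
        profit += PRICE
        for _day in range(days):
            worked_today = False        # by hypothesis, "work today" is never chosen
            if worked_today:
                profit -= PAYOUT
    return profit

print(run(365))   # -> 10: the agent paid up front, never worked, free money
```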
That exploit works against a hyperbolic discounter who today wants to work tomorrow, but tomorrow doesn’t want to work today.
It doesn’t work against Clippy’s example of an exponential discounter who doesn’t want to work today and knows that tomorrow he still won’t want to work today, but still claims to want to work someday, even though he can’t say when.
Our agent cannot reason from “I want to work someday” to “There exists a day in the finitely distant future when I will want to work”. He is missing some kind of reverse induction axiom. We agree that there is something wrong with this agent’s thinking.
But, I don’t see how to exploit that flaw.
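To make that contrast concrete, here is a toy comparison; the discount functions are the standard ones, but the particular parameters and the little work-then-reward plan are invented for illustration:

```python
# The plan: do unpleasant work (cost 10, felt on the work day) and receive a
# reward worth 15 the day after. Compare how a hyperbolic and an exponential
# discounter value that plan from far away vs. on the day itself.

COST, REWARD = 10.0, 15.0

def hyperbolic(delay, k=1.0):          # present value of 1 util, `delay` days out
    return 1.0 / (1.0 + k * delay)

def exponential(delay, delta=0.8):
    return delta ** delay

def net_value(discount, work_day, viewed_from):
    d = work_day - viewed_from
    return -COST * discount(d) + REWARD * discount(d + 1)

for name, disc in [("hyperbolic ", hyperbolic), ("exponential", exponential)]:
    far  = net_value(disc, work_day=10, viewed_from=0)   # "I'll do it in ten days"
    near = net_value(disc, work_day=0,  viewed_from=0)   # the day has arrived
    print(f"{name}: good plan from a distance? {far > 0}   still good on the day? {near > 0}")

# hyperbolic : True then False -- a preference reversal, which is exactly what the
#              buy-cheap-sell-dear pattern needs to bite on.
# exponential: the same verdict at every distance, so there is nothing to trade against.
```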
Interestingly, Peano arithmetic has the same “problem.” This isn’t directly relevant, but it does very strongly suggest that there is no possible way to exploit this flaw.
Suppose I take some program that looks really complicated to PA: the program runs forever, but PA can’t prove that it does. Then for every particular amount of time, PA can prove that the program hasn’t stopped yet. But there are models of PA in which it is nevertheless true that “There exists a time at which the program has stopped.” Intuitively, it is like having two sets of integers: the normal integers, obtained from 0 by adding 1 a finite number of times, and the really large integers, obtained from the halting time of your program by adding or subtracting 1 a finite number of times. There is no way to get from one to the other, because the really large integers are just that large.
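Roughly, in symbols; HaltedBy(e, n) is just illustrative shorthand for “program e has halted within n steps”:

```latex
% Sketch, assuming e runs forever but PA cannot prove that it does.
\begin{gather*}
\text{for each standard } n:\quad
  \mathrm{PA} \vdash \lnot\,\mathrm{HaltedBy}(e, \underline{n})
  \quad\text{(run $e$ for $n$ steps and check)}\\
\mathrm{PA} \nvdash \forall n\,\lnot\,\mathrm{HaltedBy}(e, n)
  \quad\text{(the assumption: PA cannot prove non-halting)}\\
\text{so } \mathrm{PA} + \exists n\,\mathrm{HaltedBy}(e, n)
  \text{ is consistent and therefore has a model } M,\\
\text{in which the witness for } n \text{ must be nonstandard: larger than every } 0, 1, 2, \ldots
\end{gather*}
```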
If you use ZFC instead, you encounter significantly less intuitive versions of this strange behavior.
In our case, this would be like believing in a hypothetical future time where you will do work, but which can never be accessed by letting the days pass one by one.
Correct, I’m wrong.
It seems like “I want to work someday” is almost not the kind of statement we should use in describing people’s desires at all. It doesn’t actually say anything about how you’d respond to any choices. If it did, you could find a way to Dutch-book the agent.
I think you are partially correct in that the problem is ambiguous with respect to some deciding factors—specifically, the agent’s inferential capabilities—and that there are disambiguations that make your method work. See my reply to User:Perplexed.
100% agreement.
Almost. It depends on the agent’s computational abilities. From the criteria I specified, it is unclear whether the agent realizes that its decision theory will output the same action tomorrow, and every day after that (i.e. whether it recognizes the symmetry between today and tomorrow under its current decision theory).
If you assume the agent correctly infers that its current decision theory will lead it to perpetually defer work, then it will recognize that the outcome is suboptimal and search for a better decision theory. However, if the agent is unable to reach sufficient (correct) logical certainty about tomorrow’s action, then it is vulnerable to the money pump that User:Will_Sawin described.
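A sketch of the distinction I have in mind; the rule, the horizon, and the names are all invented for illustration:

```python
# Two toy agents deciding, each morning, between "work" and "defer".
# Both share the same one-step judgment: deferring looks better today.
# The reflective agent additionally simulates what its own rule will say on
# later days before it trusts the thought "I'll do it tomorrow".

def one_step_rule(day):
    return "defer"                 # by hypothesis, today-vs-tomorrow always favors deferring

def naive_plan(day):
    return "work tomorrow"         # takes "I'll work tomorrow" at face value

def reflective_plan(day, horizon=1000):
    # Unrolls its own rule: if it would defer on every simulated day,
    # it concludes the current rule never gets the work done.
    if all(one_step_rule(d) == "defer" for d in range(day, day + horizon)):
        return "my current rule never works; look for a better rule"
    return "work tomorrow"

print(naive_plan(0))        # "work tomorrow" (and the same again, forever)
print(reflective_plan(0))   # detects the perpetual deferral
```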
I was working from the assumption that the agent is able to recognize the symmetry with future actions and so did not consider the money pump that User:Will_Sawin described. Such an agent is still, in theory, exploitable, because (under my assumptions about how such an agent could fail) the agent will sometimes conclude that it ought to work, and sometimes that it ought not, with the money-pumper profiting from the (statistically) predictable shifts.
Even so, that would require that the agent I specified use one more predicate in its decision theory—some source of randomness.
Point conceded: inconsistent preferences do not imply a practically exploitable attack vector (aka a “money/paperclip pump”). However, in game-theoretic discussions it is common for, e.g., intransitive preferences not to actually hurt the agents that hold them, and yet for the inconsistency to be treated as if it opened the agent up to paperclip pumping.
For example, in the Allais problem, people have inconsistent preferences (they violate the independence axiom rather than transitivity), and Editor:Eliezer_Yudkowsky has specified exactly how you would money-pump such a person. Yet it requires very contorted, atypical situations to actually perform the money pump.
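For concreteness, the usual textbook version of the two Allais choices (my numbers follow the standard presentation): 1A is $1M for certain versus 1B, which is 10% $5M, 89% $1M, 1% nothing; 2A is 11% $1M versus 2B, which is 10% $5M. The common pattern of preferring 1A to 1B while also preferring 2B to 2A cannot come from maximizing any single utility function, and that is the inconsistency the money pump targets:

```python
# Check that the common Allais pattern (1A > 1B and 2B > 2A) is inconsistent with
# maximizing the expectation of ANY utility function over the three outcomes.
# Normalize u($0) = 0 and u($1M) = 1, and let x stand for u($5M).

def prefers_1A_over_1B(x):
    # 1 > 0.10*x + 0.89*1 + 0.01*0   <=>   x < 1.1
    return 1.0 > 0.10 * x + 0.89 * 1.0

def prefers_2B_over_2A(x):
    # 0.10*x + 0.90*0 > 0.11*1 + 0.89*0   <=>   x > 1.1
    return 0.10 * x > 0.11 * 1.0

both = [x / 100 for x in range(0, 1001)
        if prefers_1A_over_1B(x / 100) and prefers_2B_over_2A(x / 100)]
print(both)   # -> []: no value of u($5M) makes both preferences consistent
```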