Then the “rational” thing is to never stop speaking. It’s true that by never stopping speaking I’ll never gain utility but by stopping speaking early I miss out on future utility.
The behaviour of speaking forever seems irrational, but you have deliberately crafted a scenario where my only goal is to get the highest possible utility, and the only way to do that is to just keep speaking. If you suggest that someone who got some utility after 1 million years is “more rational” than someone still speaking at 1 billion years then you are adding a value judgment not apparent in the original scenario.
Infinite utility is not a possible utility in the scenario, and therefore the behaviour of never stopping does not achieve the highest possible utility. Continuing to speak is an improvement only given that you do stop at some time. If you continue by never stopping, you get 0 utility, which is lower than what you get for speaking a 2-digit number.
But time doesn’t end. The criteria of assessment are:
1) I only care about getting the highest number possible.
2) I am utterly indifferent to how long this takes me.
3) The only way to generate this value is by speaking this number (or, at the very least, any other methods I might have used instead are compensated explicitly once I finish speaking).
If your argument is that Bob, who stopped at Graham’s number, is more rational than Jim, who is still speaking, then you’ve changed the terms. If my goal is to beat Bob, then I just need to stop at Graham’s number plus one.
At any given time, t, I have no reason to stop, because I can expect to earn more by continuing. The only reason this looks irrational is we are imagining things which the scenario rules out: time costs or infinite time coming to an end.
The argument “but then you never get any utility” is true, but that doesn’t matter, because I last forever. There is no end of time in this scenario.
If your argument is that in a universe with infinite time, infinite life and a magic incentive button all anyone will do is press that button forever, then you are correct, but I don’t think you’re saying much.
Python code of
doesn’t generate a runtime exception when run
similarly
doesn’t assign to utility more than once
in contrast
does assign to utility more than once. With finite iterations these two would be quite interchangeable, but with non-terminating iterations they are not. The iteration doesn’t need to terminate for this to be true.
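The snippets this comment refers to are not shown, so the following is only a guess at the kind of distinction being drawn, not the commenter's actual code. It is a minimal sketch assuming two hypothetical functions: `dispatch_version`, where utility is assigned once after the speaking loop ends, and `ongoing_version`, where it is assigned on every step. The `max_steps` cap is my own addition so the example terminates; it stands in for "however long we choose to watch".

```python
def dispatch_version(stop_after, max_steps):
    """Utility is assigned once, only after the agent stops speaking.

    stop_after=None models an agent that never stops.
    Returns every value that was ever assigned to utility.
    """
    assignments = []
    number = 0
    for step in range(max_steps):
        if stop_after is not None and step >= stop_after:
            assignments.append(number)  # the single payout, after speaking ends
            break
        number += 1  # keep speaking: the number named so far grows
    return assignments


def ongoing_version(max_steps):
    """Utility is (re)assigned on every step while the agent speaks."""
    assignments = []
    number = 0
    for _ in range(max_steps):
        number += 1
        assignments.append(number)  # a payout on each step
    return assignments


print(dispatch_version(stop_after=3, max_steps=10))     # one assignment
print(dispatch_version(stop_after=None, max_steps=10))  # no assignment at all
print(ongoing_version(max_steps=3))                     # one assignment per step
```

However large `max_steps` is made, the never-stopping agent in the dispatch version has an empty assignment list, while the ongoing version keeps accumulating assignments.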
Say you are in a market and you know someone who sells wheat for $5, someone who buys it for $10, and someone who sells wine for $7, and suppose that you care about wine. If you have a strategy that only consists of buying and selling wheat, you don’t get any wine. There needs to be a “cashout” move of buying wine at least once. Now think of a situation where, when you buy wine, you need to hand over your wheat dealing licence. Well, a wheat licence means arbitrary amounts of wine, so it is irrational to ever trade the wheat licence away for a finite amount of wine, right? But then you end up with a wine “maximising strategy” that does so by not ever buying wine.
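To put numbers on the market example, here is a rough sketch. The prices come from the comment; the starting cash of $5, trading one unit of wheat per round trip, and the function name `wine_obtained` are my own assumptions for illustration.

```python
WHEAT_BUY = 5    # price you pay for wheat
WHEAT_SELL = 10  # price you get for wheat
WINE_PRICE = 7   # price of one unit of wine


def wine_obtained(round_trips, cash_out, starting_cash=5):
    """Wine held after some wheat round trips.

    Each round trip buys one unit of wheat and sells it again.
    Cashing out spends everything on wine and surrenders the licence.
    """
    cash = starting_cash
    for _ in range(round_trips):
        cash += WHEAT_SELL - WHEAT_BUY  # $5 profit per round trip
    if not cash_out:
        return 0  # still holding only cash and a licence, no wine
    return cash // WINE_PRICE


print(wine_obtained(10, cash_out=True))     # 7
print(wine_obtained(100, cash_out=True))    # 72
print(wine_obtained(1000, cash_out=False))  # 0
```

Delaying the cashout always buys more wine, yet the strategy with no cashout move holds zero wine at every point you inspect it.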
Indeed. And that’s what happens when you give a maximiser perverse incentives and infinity in which to gain them.
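This scenario corresponds precisely to pseudocode of the kind:
newval <- 1   # what I would get if I stopped after the next step
oldval <- 0   # what I would get if I stopped now
while (newval > oldval) {
  oldval <- newval
  newval <- newval + 1
}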
Which never terminates. This is only irrational if you want to terminate (which you usually do), but again, the claim that the maximiser never obtains value doesn’t matter because you are essentially placing an outside judgment on the system.
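For anyone who wants to poke at this, here is the same loop as a Python generator, so you can look at the first few passes without hanging your interpreter. The generator wrapper and the name `ongoer` are my additions; the loop body is the pseudocode above. Python integers are arbitrary precision, so the guard really is true on every pass.

```python
from itertools import islice


def ongoer():
    """Yield (oldval, newval) after each pass of the loop, forever."""
    newval, oldval = 1, 0
    while newval > oldval:  # always true: newval is oldval + 1 after every pass
        oldval = newval
        newval = newval + 1
        yield oldval, newval


first_five = list(islice(ongoer(), 5))
print(first_five)  # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
```

At every pass the next value beats the current one, which is exactly the "no reason to stop at time t" argument.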
Basically, what I believe you (and the OP) are doing is looking at two agents in the numberverse.
Agent one stops at time 100 and gains X utility.
Agent two continues forever and never gains any utility.
Clearly, you think, agent one has “won”. But how? Agent two has never failed. The numberverse is eternal, so there is no point at which you can say it has “lost” to agent one. If the numberverse had a non-zero probability of collapsing at any point in time, then agent two’s strategy would instead be more complex (and possibly uncomputable if we distribute over infinity), but as we are told that agents one and two exist in a changeless universe and their only goal is to obtain the most utility, we can’t judge either to have won. In fact, agent two’s strategy only prevents it from losing, and it can’t win.
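To illustrate how a collapse probability changes things, here is a toy sketch under assumptions that are entirely mine: stopping at step n pays n, the numberverse independently collapses with probability p on each step, and a collapse before you stop pays nothing. The names `expected_utility` and `best_stopping_time` are hypothetical, and the search is capped at a finite `horizon`.

```python
def expected_utility(n, p):
    """Expected payout for planning to stop at step n.

    The payout n is only collected if the universe survives all n steps,
    which happens with probability (1 - p) ** n.
    """
    return n * (1 - p) ** n


def best_stopping_time(p, horizon=10_000):
    """Step in 1..horizon with the highest expected payout."""
    return max(range(1, horizon + 1), key=lambda n: expected_utility(n, p))


print(best_stopping_time(0.03))  # 33
```

Under these assumptions the "never stop" strategy is no longer undefeated: with any p above zero the expected payout peaks at a finite step and then decays towards zero.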
That is, if we imagine the numberverse full of agents, any agent which chooses to stop will lose in a contest of utility, because the remaining agents can always choose to stop and obtain their far greater utility. So the rational thing to do in this contest is to never stop.
Sure, that’s a pretty bleak outlook, but as I say, if you make a situation artificial enough you get artificial outcomes.
What you are saying would be optimising in a universe where the agent gets the utility as it says the number. Then the average utility of an ongoer would be greater than that of an idler.
However, if the utility is dished out after the number has been specified, then an idler and an ongoer have exactly the same amount of utility and ought to be as optimal. 0 is not an optimum of this game, so an agent that results in 0 utility is not an optimiser. If you take an agent that is an optimiser in another context, then it of course might not be an optimiser for this game.
There is also the problem that choosing to continue doesn’t yield the utility with certainty, only “almost always”. The ongoer strategy hits precisely the hole in this certainty, where no payout happens. I guess you may be able to define a game where the payout happens concurrently with the agents’ actions. But this reeks of “the house” having a premonition of what the agent is going to do instead of inferring it from its actions. If the rules are “first actions and THEN payout”, you need to be able to finish your action to get a payout.
In the ongoing version I could think of rules under which an agent that has said “9.9999...” to 400 digits would receive 0.000...(401 zeroes)...9 utility on the next digit. However, if the agents get utility assigned only once, there won’t be a “standing so far”. In the ongoing version, though, this behaviour would be the perfectly rational thing to do, as there would be a uniquely determined digit to keep on saying. I suspect the trouble is mixing the ongoing and the dispatch versions with each other inconsistently.
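Here is one way the ongoing payout rule could look, as a sketch only. I am assuming the leading 9 pays 9 and the k-th 9 after the decimal point pays 9/10^k; the exact digit count in the comment is ambiguous, and the names `marginal_utility` and `standing_so_far` are mine. Exact fractions are used so that tiny payouts do not vanish in floating point.

```python
from fractions import Fraction


def marginal_utility(k):
    """Payout for saying the k-th digit 9 (k = 0 is the leading 9)."""
    return Fraction(9, 10 ** k)


def standing_so_far(k):
    """Total utility banked after the leading 9 and k decimal digits."""
    return sum(marginal_utility(i) for i in range(k + 1))


print(standing_so_far(3))                                    # 9999/1000
print(standing_so_far(400) == 10 - Fraction(1, 10 ** 400))   # True
print(marginal_utility(401) > 0)                             # True
```

In this version there is a well-defined standing at every moment, and every further digit adds a strictly positive amount, so carrying on is straightforwardly the better move at each step.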
“However, if the utility is dished out after the number has been specified, then an idler and an ongoer have exactly the same amount of utility and ought to be as optimal. 0 is not an optimum of this game, so an agent that results in 0 utility is not an optimiser. If you take an agent that is an optimiser in another context, then it of course might not be an optimiser for this game.”
The problem with this logic is the assumption that there is a “result” of 0. While it’s certainly true that an “idler” will obtain an actual value at some point, so we can assess how they have done, there will never be a point in time that we can assess the ongoer. If we change the criteria and say that we are going to assess at a point in time then the ongoer can simply stop then and obtain the highest possible utility. But time never ends, and we never mark the ongoer’s homework, so to say he has a utility of 0 at the end is nonsense, because there is, by definition, no end to this scenario.
Essentially, if you include infinity in a maximisation scenario, expect odd results.