Indeed. And that’s what happens when you give a maximiser perverse incentives and infinity in which to gain them.
This scenario corresponds precisely to pseudocode of the kind
newval<-1
oldval<-0
while newval>oldval
{
oldval<-newval
newval<-newval+1
}
Which never terminates. This is only irrational if you want to terminate (which you usually do), but again, the claim that the maximiser never obtains value doesn’t matter because you are essentially placing an outside judgment on the system.
Basically, what I believe you (and the op) are doing is looking at two agents in the numberverse.
Agent one stops at time 100 and gains X utility
Agent two continues forever and never gains any utility.
Clearly, you think, agent one has “won”. But how? Agent two has never failed. The numberverse is eternal, so there is no point at which you can say it has “lost” to agent one. If the numberverse had a non zero probability of collapsing at any point in time then Agent two’s strategy would instead be more complex (and possibly uncomputable if we distribute over infinity), but as we are told that agent one and two exist in a changeless universe and their only goal is to obtain the most utility then we can’t judge either to have won. In fact agent two’s strategy only prevents it from losing, and it can’t win.
That is, if we imagine the numberverse full of agents, any agent which chooses to stop will lose in a contest of utility, because the remaining agents can always choose to stop and obtain their far greater utility. So the rational thing to do in this contest is to never stop.
Sure, that’s a pretty bleak lookout, but as I say, if you make a situation artificial enough you get artificial outcomes.
What you are saying would be optimising in a universe where the agent gets the utility as it says the number. Then the average utility of a ungoer would be greater than that of a idler.
However if the utility is dished out after the number has been spesified then an idler and a ongoer have exactly the same amount of utility and ought to be as optimal. 0 is not a optimum of this game so an agent that results in 0 utility is not an optimiser. If you take an agent that is an optimiser in other context then it ofcourse might not be an optimiser for this game.
There is also the problem that choosing the continue doesn’t yield the utilty with certainty only “almost always”. The ongoer strategy hits precicely in the hole in this certainty when no payout happens. I guess you may be able to define a game where concurrently with their actions. But this reeks of “the house” having premonition on what the agent is going to do instead of inferring its from its actions. if the rules are “first actions and THEN payout” you need to be able to do your action to get a payout.
In the ongoing version I could think of rules that an agent that has said “9.9999...” to 400 digits would receive 0.000.(401 zeroes)..9 utility on the next digit. However if the agents get utility assigned only once there won’t be a “standing so far”. However this behaviour would then be the perfectly rational thing to do as there would be a uniquely determined digit to keep on saying. I am suspecting the trouble is mixing the ongoing and the dispatch version to each other inconsistently.
“However if the utility is dished out after the number has been spesified then an idler and a ongoer have exactly the same amount of utility and ought to be as optimal. 0 is not a optimum of this game so an agent that results in 0 utility is not an optimiser. If you take an agent that is an optimiser in other context then it ofcourse might not be an optimiser for this game.”
The problem with this logic is the assumption that there is a “result” of 0. While it’s certainly true that an “idler” will obtain an actual value at some point, so we can assess how they have done, there will never be a point in time that we can assess the ongoer. If we change the criteria and say that we are going to assess at a point in time then the ongoer can simply stop then and obtain the highest possible utility. But time never ends, and we never mark the ongoer’s homework, so to say he has a utility of 0 at the end is nonsense, because there is, by definition, no end to this scenario.
Essentially, if you include infinity in a maximisation scenario, expect odd results.
Indeed. And that’s what happens when you give a maximiser perverse incentives and infinity in which to gain them.
This scenario corresponds precisely to pseudocode of the kind
newval<-1
oldval<-0
while newval>oldval
{
oldval<-newval
newval<-newval+1
}
Which never terminates. This is only irrational if you want to terminate (which you usually do), but again, the claim that the maximiser never obtains value doesn’t matter because you are essentially placing an outside judgment on the system.
Basically, what I believe you (and the op) are doing is looking at two agents in the numberverse.
Agent one stops at time 100 and gains X utility Agent two continues forever and never gains any utility.
Clearly, you think, agent one has “won”. But how? Agent two has never failed. The numberverse is eternal, so there is no point at which you can say it has “lost” to agent one. If the numberverse had a non zero probability of collapsing at any point in time then Agent two’s strategy would instead be more complex (and possibly uncomputable if we distribute over infinity), but as we are told that agent one and two exist in a changeless universe and their only goal is to obtain the most utility then we can’t judge either to have won. In fact agent two’s strategy only prevents it from losing, and it can’t win.
That is, if we imagine the numberverse full of agents, any agent which chooses to stop will lose in a contest of utility, because the remaining agents can always choose to stop and obtain their far greater utility. So the rational thing to do in this contest is to never stop.
Sure, that’s a pretty bleak lookout, but as I say, if you make a situation artificial enough you get artificial outcomes.
What you are saying would be optimising in a universe where the agent gets the utility as it says the number. Then the average utility of a ungoer would be greater than that of a idler.
However if the utility is dished out after the number has been spesified then an idler and a ongoer have exactly the same amount of utility and ought to be as optimal. 0 is not a optimum of this game so an agent that results in 0 utility is not an optimiser. If you take an agent that is an optimiser in other context then it ofcourse might not be an optimiser for this game.
There is also the problem that choosing the continue doesn’t yield the utilty with certainty only “almost always”. The ongoer strategy hits precicely in the hole in this certainty when no payout happens. I guess you may be able to define a game where concurrently with their actions. But this reeks of “the house” having premonition on what the agent is going to do instead of inferring its from its actions. if the rules are “first actions and THEN payout” you need to be able to do your action to get a payout.
In the ongoing version I could think of rules that an agent that has said “9.9999...” to 400 digits would receive 0.000.(401 zeroes)..9 utility on the next digit. However if the agents get utility assigned only once there won’t be a “standing so far”. However this behaviour would then be the perfectly rational thing to do as there would be a uniquely determined digit to keep on saying. I am suspecting the trouble is mixing the ongoing and the dispatch version to each other inconsistently.
“However if the utility is dished out after the number has been spesified then an idler and a ongoer have exactly the same amount of utility and ought to be as optimal. 0 is not a optimum of this game so an agent that results in 0 utility is not an optimiser. If you take an agent that is an optimiser in other context then it ofcourse might not be an optimiser for this game.”
The problem with this logic is the assumption that there is a “result” of 0. While it’s certainly true that an “idler” will obtain an actual value at some point, so we can assess how they have done, there will never be a point in time that we can assess the ongoer. If we change the criteria and say that we are going to assess at a point in time then the ongoer can simply stop then and obtain the highest possible utility. But time never ends, and we never mark the ongoer’s homework, so to say he has a utility of 0 at the end is nonsense, because there is, by definition, no end to this scenario.
Essentially, if you include infinity in a maximisation scenario, expect odd results.