Thank you for your insights! You say: “Yes! You do have to think about the amount of games you play if your utility function is not linear.”
Let’s consider the case of rational agents acting in a temporal framework where they face daily decisions. If they need to consider all of their possible future choices in order to make a single present choice, then it seems they can never decide anything at all (the computation appears nearly endless), and the principle of expected utility maximization would turn out to be useless. How do we make rational decisions then?
Well, if you assume these agents do not employ time-discounting, then you indeed cannot compare trajectories: if they don’t terminate, all of them might have infinite utility (and, as you say, the comparison is computationally intractable).
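Conversely, once you do discount, even non-terminating trajectories become comparable. As a rough illustration (the notation here is mine, not from the thread): with a discount factor $\gamma \in (0,1)$ and per-step utility bounded by some $u_{\max}$,

$$\sum_{t=0}^{\infty} \gamma^t u_t \;\le\; \sum_{t=0}^{\infty} \gamma^t u_{\max} \;=\; \frac{u_{\max}}{1-\gamma} < \infty,$$

so every trajectory gets a finite value and comparisons make sense again.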
We run into the same problem if we assume realistic action spaces, i.e. consider all the things we could possibly do: there are too many even for a single time step.
RL algorithms “solve” this by working with constrained action spaces and discounting future utility, and also by often having terminating trajectories (sketched below).
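As a minimal sketch of what I mean (the action set, utilities, horizon, and discount factor are all made-up illustrative assumptions, not anything from the thread): once you constrain the actions, discount, and truncate the horizon, the expected-utility computation becomes finite and easy to brute-force.

```python
from itertools import product

ACTIONS = ["work", "rest"]   # constrained action space (toy assumption)
GAMMA = 0.9                  # time discounting
HORIZON = 5                  # truncated / terminating trajectory

def utility(action: str) -> float:
    # Toy per-step utility; stands in for whatever the agent actually values.
    return 1.0 if action == "work" else 0.3

def discounted_return(plan: tuple) -> float:
    # Finite sum of discounted per-step utilities over the truncated horizon.
    return sum(GAMMA ** t * utility(a) for t, a in enumerate(plan))

# Enumerate all |ACTIONS|**HORIZON plans: only 2**5 = 32 here, instead of an
# infinite (or astronomically large) set of unconstrained future trajectories.
best_plan = max(product(ACTIONS, repeat=HORIZON), key=discounted_return)
print(best_plan, round(discounted_return(best_plan), 3))
```

The point is not the particular numbers but that each of the three tricks (small action set, discounting, finite horizon) cuts the computation down from something intractable to a small, terminating search.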
Humans also work with (highly) constrained action spaces and discount the future strongly [citation needed], and every model of a rational human should take that into account.
I admit those points are more like hacks we’ve come up with for practical situations, but I suppose the computational intractability is a reason why we can’t already have all the nice things ;-)