RL usually applies a discount factor, and also caps episodes at a certain length, so that an action taken at a given time isn’t reinforced very much (or at all) for consequences that only show up much later.
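As a rough illustration of how quickly that credit falls off, here is a minimal sketch assuming standard exponential discounting with a hard episode cutoff. The function name `credit_weight`, the parameter `steps_left`, and the values 0.99 and 500 are made up for illustration and not taken from any particular RL library.

```python
def credit_weight(delay, gamma=0.99, steps_left=None):
    """Weight an action receives for a reward arriving `delay` steps later.

    gamma      -- discount factor (illustrative value, an assumption)
    steps_left -- steps remaining before the episode is cut off; rewards
                  at or beyond the cutoff contribute nothing (hypothetical
                  parameter, None means no cap)
    """
    if steps_left is not None and delay >= steps_left:
        return 0.0  # consequence falls outside the episode, so no credit
    return gamma ** delay  # exponential discounting


if __name__ == "__main__":
    for delay in (1, 10, 100, 1000):
        print(delay, credit_weight(delay, gamma=0.99, steps_left=500))
```

With these made-up numbers, a consequence 100 steps out still gets a bit over a third of full weight, while anything past the 500-step cutoff gets none.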
How does this compare to evolution? At equilibrium, I think that a gene which increases the fitness of its bearers in N generations’ time is just as strongly favored as a gene that increases the fitness of its bearers by the same amount straightaway. As long as it was already widespread at least N generations ago, they’re basically the same thing, because current gene-holders benefit from the effects of the gene-holders from N generations ago.
That gene would evolve much more slowly, though. Plus in practice it’s hard to ensure that the benefits accrue only to gene-holders, and there’s so much variance in the environment that for N of more than 3 or 4, selection for such a delayed effect seems pretty implausible. Still, the disanalogy seems kinda interesting.