and it’s looking like the only way to solve it is by forcing ϵ-exploration.
That only works in the limit, not in practice, right? Or is there a way to make it work in practice, e.g. by having the agent try to approximate the limit, and reason about what would happen in the limit?
I think you’re right, see my other short comments below about epsilon-exploration as a realistic solution. It’s conceivable that something like “epsilon-exploration plus heuristics on top” groks enough regularities that performance at some finite time tends to be good. But who knows how likely that is.
That only works in the limit, not in practice, right? Or is there a way to make it work in practice, e.g. by having the agent try to approximate the limit, and reason about what would happen in the limit?
It does work in practice; it just takes a very long time. See “Basins of Attraction” by Ellison.
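To make the "very long time" point concrete, here is a minimal toy sketch. It is not taken from Ellison's paper or from this thread: the two-armed bandit, its payoffs, and the function names are all hypothetical. The agent starts out wrongly pessimistic about the better arm and can only correct that belief through an ϵ-exploration step, so the wait should scale roughly like 2/ϵ.

```python
import random


def steps_until_greedy_optimal(epsilon, rng, max_steps=1_000_000):
    """Run epsilon-greedy on a toy two-armed bandit with fixed payoffs.

    Returns the number of steps until the agent's greedy choice is the
    better arm. All payoffs and initial beliefs are illustrative assumptions.
    """
    true_reward = [0.5, 1.0]  # arm 1 is actually better
    estimate = [0.5, 0.0]     # agent starts wrongly pessimistic about arm 1
    for step in range(1, max_steps + 1):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # forced exploration: uniform random arm
        else:
            arm = max(range(2), key=lambda a: estimate[a])  # greedy choice
        estimate[arm] = true_reward[arm]  # payoffs are deterministic here
        if estimate[1] > estimate[0]:
            return step
    return max_steps


def mean_steps(epsilon, trials=2000, seed=0):
    """Average waiting time over many independent runs."""
    rng = random.Random(seed)
    total = sum(steps_until_greedy_optimal(epsilon, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(eps, mean_steps(eps))
```

In this toy the agent escapes its bad belief only when an exploration step happens to pick arm 1, which occurs with probability ϵ/2 per step, so shrinking ϵ by a factor of ten lengthens the expected wait by about the same factor. It only illustrates that scaling; it does not model the game-theoretic setting under discussion here.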