Yes, this is the right connection to make.
The purest form of the exploration-exploitation balancing act is the k-armed bandit. If the OP is looking for an abstract treatment of this topic, that’s the way to go.
http://en.wikipedia.org/wiki/Multi-armed_bandit
There is a very clean solution to the k-armed bandit problem. With small probability pick a completely random lever. Otherwise, consider the average payoff from each lever during these randomized exploration steps, and pick each lever with a probability which depends exponentially on the average payoff.
No solution does asymptotically better in the worst case.
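The scheme described above can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: the function name `bandit_play`, the exploration rate `gamma`, and the temperature `eta` are my own labels, and I assume payoffs in [0, 1]. With probability `gamma` it pulls a uniformly random lever; otherwise it pulls lever i with probability proportional to exp(eta × average payoff of lever i on the exploration rounds).

```python
import math
import random

def bandit_play(k, rounds, payoff, gamma=0.1, eta=4.0):
    """Play a k-armed bandit with the strategy sketched above.

    payoff(i) returns the (assumed [0, 1]) reward for pulling lever i.
    gamma is the exploration probability, eta the exponential temperature;
    both names and defaults are illustrative, not canonical.
    """
    totals = [0.0] * k  # payoff accumulated per lever on exploration rounds
    counts = [0] * k    # number of exploration pulls per lever
    reward = 0.0
    for _ in range(rounds):
        if random.random() < gamma:
            # Exploration step: pick a completely random lever.
            i = random.randrange(k)
            r = payoff(i)
            totals[i] += r
            counts[i] += 1
        else:
            # Exploitation step: probability depends exponentially on the
            # average payoff observed during exploration steps.
            avgs = [totals[j] / counts[j] if counts[j] else 0.0
                    for j in range(k)]
            weights = [math.exp(eta * a) for a in avgs]
            i = random.choices(range(k), weights=weights)[0]
            r = payoff(i)
        reward += r
    return reward
```

On a toy instance with one clearly better Bernoulli arm, e.g. `bandit_play(3, 5000, lambda i: float(random.random() < [0.2, 0.2, 0.8][i]))`, the strategy concentrates its pulls on the good arm and comfortably beats uniform random play.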
It's not a very good model for real life :)
In real life, being worst-case unbeatable is not the only criterion for success! But of course I agree that k-armed bandits don’t answer the OP’s real-life question.