There is a very clean solution to the k-armed bandit problem. With small probability pick a completely random lever. Otherwise, consider the average payoff from each lever during these randomized exploration steps, and pick each lever with a probability which depends exponentially on the average payoff.
No solution does asymptotically better in the worst case.
In real life, being worst-case unbeatable is not the only criterion for success! But of course I agree that k-armed bandits don’t answer the OP’s real-life question.
There is a very clean solution to the k-armed bandit problem. With small probability pick a completely random lever. Otherwise, consider the average payoff from each lever during these randomized exploration steps, and pick each lever with a probability which depends exponentially on the average payoff.
No solution does asymptotically better in the worst case.
Its not a very good model for real life :)
In real life, being worst-case unbeatable is not the only criterion for success! But of course I agree that k-armed bandits don’t answer the OP’s real-life question.