These ideas are studied in AI theory. "Making mistakes" is called exploration, and using existing knowledge is called exploitation. I am not familiar with the details or the math of this trade-off. A quick search finds http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.8423&rep=rep1&type=pdf .
Yes, this is the right connection to make.
The purest form of the exploration-exploitation balancing act is the k-armed bandit. If the OP is looking for an abstract treatment of this topic, that’s the way to go.
http://en.wikipedia.org/wiki/Multi-armed_bandit
There is a very clean solution to the k-armed bandit problem. With small probability, pick a lever uniformly at random. Otherwise, consider each lever's average payoff during those randomized exploration steps, and pick each lever with a probability that depends exponentially on that average payoff.
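(This scheme, mixing a little uniform exploration into an exponentially weighted choice, sounds like the Exp3 algorithm for adversarial bandits. Here is a minimal sketch in Python, assuming payoffs lie in [0, 1] and a caller-supplied pull(i) function returning the payoff of lever i; the function name and the constant gamma are illustrative assumptions, not something from the comment.

import math
import random

def exp3(pull, k, steps, gamma=0.1):
    # weights[i] grows exponentially with lever i's estimated payoff
    weights = [1.0] * k
    for _ in range(steps):
        total = sum(weights)
        # Mix a gamma-sized dose of uniform exploration into the
        # exponentially weighted distribution over the k levers.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        i = random.choices(range(k), weights=probs)[0]
        payoff = pull(i)  # assumed to lie in [0, 1]
        # Importance-weighted estimate: dividing by the probability of
        # the chosen lever keeps the payoff estimate unbiased.
        estimate = payoff / probs[i]
        weights[i] *= math.exp(gamma * estimate / k)
    return probs  # final distribution over the levers

The gamma/k exploration term keeps every probs[i] strictly positive, so the importance-weighted estimate never divides by zero.)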
No solution does asymptotically better in the worst case: over T steps this scheme's regret is O(sqrt(kT log k)), matching the Omega(sqrt(kT)) lower bound up to the log factor.
It's not a very good model for real life :)
In real life, being worst-case unbeatable is not the only criterion for success! But of course I agree that k-armed bandits don’t answer the OP’s real-life question.