This has me confused as well. Assume a large area divided into two regions. Region A has slot machines with average payout 50, while region B has machines with average payout 500. I am blindfolded and randomly dropped into region A or B. The first slot machine I try has payout 70. I update in the direction of being in region A. Doesn’t this affect how many resources I wish to spend doing exploration?
Are you also assuming that you know all of those assumed facts about the area?
I would certainly expect that how many resources I want to spend on exploration will be affected by how much a priori knowledge I have about the system. Without such knowledge, the amount of exploration-energy I’d have to expend to be confident that there are two regions A and B with average payout as you describe is enormous.
Do you mean to set the parameter specifying the amount of resources (e.g., time steps) to spend exploring (before switching to full-exploiting) based on the info you receive upon your first observation? Also, what do you mean by “probably”?
You should probably be prepared to change how much you plan to spend on exploring based on the initial information recieved.
This has me confused as well.
Assume a large area divided into two regions. Region A has slot machines with average payout 50, while region B has machines with average payout 500. I am blindfolded and randomly dropped into region A or B. The first slot machine I try has payout 70. I update in the direction of being in region A. Doesn’t this affect how many resources I wish to spend doing exploration?
Are you also assuming that you know all of those assumed facts about the area?
I would certainly expect that how many resources I want to spend on exploration will be affected by how much a priori knowledge I have about the system. Without such knowledge, the amount of exploration-energy I’d have to expend to be confident that there are two regions A and B with average payout as you describe is enormous.
Do you mean to set the parameter specifying the amount of resources (e.g., time steps) to spend exploring (before switching to full-exploiting) based on the info you receive upon your first observation? Also, what do you mean by “probably”?