Would you accept “will get you better results, all else being equal” instead? I don’t have a very clear sense of what we’d be averaging.
I meant averaging over the possible ways that the environment could change following your exploitation. For example, it’s possible that a particular course of exploitation could shape the environment such that your exploitation strategy actually becomes more valuable with each iteration. In such a scenario, exploring more after exploiting would be an especially bad decision. So I don’t think I can accept “will” without “on average” unless “all else” excludes all of these types of scenarios in which exploring is harmful.
OK, understood. Thanks for clarifying.
Hm. I expect that within the set of environments where exploitation can alter the results of what-to-exploit-next calculations, there are more possible ways for it to do so such that the right move in the next iteration is further exploration than ways such that the right move is further exploitation.
So, yeah, I’ll accept “will get you better results on average.”