I think one very important thing you are pointing out is that I did not mention the impact of the environment. To train using RL, there must be some underlying environment, even if only as a model to sample from. This opens up a lot of questions:
What happens if the actual environment is known to the RL process and to the system whose focus we are computing?
What happens when there is uncertainty over the environment?
Given an environment, for which goals is focus entangled (essentially your example: high focus on one implies high focus on the other)?
As for your specific example, I assume the distance converges to 0 because, intuitively, the only difference lies in the action at state s_k (go back to 0 for the first reward, increment for the second), and this state is visited in a smaller and smaller proportion of steps as N goes to infinity.
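To make that intuition concrete, here is a minimal sketch under assumed details (I am guessing at the setup): a chain of N states visited uniformly, where the two induced behaviors disagree only at the single state s_k, and "distance" is taken to be the fraction of visited states on which they disagree. Under those assumptions the distance is 1/N, which vanishes as N grows:

```python
def behavioral_distance(N, s_k=0):
    """Hypothetical distance between two behaviors on a chain of N states.

    Assumes each state 0..N-1 is visited equally often, and that the two
    behaviors disagree only at the single state s_k. The distance is then
    the fraction of states where they disagree, i.e. 1/N.
    """
    disagreements = sum(1 for s in range(N) if s == s_k)
    return disagreements / N

# The disagreement shrinks toward 0 as the chain grows:
for N in (10, 100, 1000):
    print(N, behavioral_distance(N))
```

None of the specifics here (uniform visitation, this particular distance) come from your example; they are just the simplest model in which the "s_k is seen less and less often" argument goes through.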
This seems like a perfect example of two distinct goals with almost maximal focus, and similar triviality. As mentioned in the post, I don't have a clear-cut intuition about what to do here. I would say that we cannot distinguish between the two goals in terms of behavior, maybe.