Perhaps it’ll be useful to think about the question for specific D and R.
Here are the simplest D and R I can think of that might serve this purpose:
D - uniform over the integers in the range [1, 10^(10^10)].
R - for each input x, R assigns a reward of 1 to the smallest prime number that is larger than x, and −1 to everything else.
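To make this D and R concrete, here is a minimal Python sketch. It assumes a small stand-in upper bound (the `upper` parameter, set to 1000 in the usage line) so that it actually runs; the function names `is_prime`, `next_prime`, `sample_input` and `reward` are my own illustrative choices, and trial division is only a placeholder primality test that would not scale to a very large range.

```python
import random


def is_prime(n: int) -> bool:
    """Trial division; only suitable for the small stand-in range used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(x: int) -> int:
    """Return the smallest prime strictly larger than x."""
    candidate = x + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def sample_input(upper: int, rng: random.Random) -> int:
    """Draw x from D: uniform over the integers in [1, upper] (upper is an assumption)."""
    return rng.randint(1, upper)


def reward(x: int, y: int) -> int:
    """R: +1 if y is the smallest prime larger than x, and -1 for everything else."""
    return 1 if y == next_prime(x) else -1


# Usage: sample one input and score two candidate outputs.
rng = random.Random(0)
x = sample_input(1000, rng)
print(x, next_prime(x), reward(x, next_prime(x)), reward(x, x))
```

Note that the reward is strict about "larger than x": if x is itself prime, answering x still gets −1.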