We failed to point out any interesting concrete priors over distributions of reward functions, but I am optimistic that it should be possible with better understanding of the metric (topological, linear, differential, …?) structure of the reward functions on an MDP.
I want to know what spark of intuition led to your optimism. The technical details didn’t feel like they contributed to conveying this intuition of yours. It would help if you gave some examples of using the metric structure of some functions in a space to pin down some sort of probability distribution.
Hm, so one comment is that the proof in the post was not meant to convey the intuition for the existence of the concrete probability distribution—the measurability of the POWER inequality is a necessary first step, but not really technically related to the (potential) rest of the proof (although I had initially hoped that lifting some distribution on rewards by the Giry monad might produce something interesting).
As for why the additional structure might be helpful: the issue with there being no Lebesgue-like uniform measure is that in an infinite-dimensional space like [0,1]^ℕ, one cannot assign any positive measure to any subset. For example, in [0,1], each of the halves has to have equal measure, because the measure has to be shift-invariant. In [0,1]^2, we can do this with each of the four squares like [0,1/2]×[0,1/2]. If we repeat this process, then in the limit there is no measure we can assign to those pieces, because they can be divided into countably many non-negligible sets (cf. https://en.wikipedia.org/wiki/Infinite-dimensional_Lebesgue_measure).
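To spell out the halving argument a bit more, here is a rough sketch of one way to phrase it. The measure μ, the half-open cubes, and the unit vectors e_i are my own notation for illustration and are not from the post; the linked Wikipedia article states the general result for separable Banach spaces.

```latex
% Assumption: \mu is a translation-invariant measure on \mathbb{R}^{\mathbb{N}}
% (product \sigma-algebra), and C = [0,1)^{\mathbb{N}} is the half-open unit cube.
\begin{align*}
% Halving the first n coordinates splits C into 2^n disjoint translates of B_n:
B_n &= [0,\tfrac12)^n \times [0,1)^{\mathbb{N}},
  & \mu(C) &= 2^n \, \mu(B_n). \\
% The half-size cube sits inside every B_n, so if \mu(C) < \infty it must be null:
[0,\tfrac12)^{\mathbb{N}} &\subseteq B_n \ \text{ for all } n
  & \implies \ \mu\bigl([0,\tfrac12)^{\mathbb{N}}\bigr) &\le \inf_n 2^{-n} \mu(C) = 0. \\
% The double-size cube contains the pairwise disjoint translates C + e_i, i = 1, 2, \dots:
\bigsqcup_{i \ge 1} (C + e_i) &\subseteq [0,2)^{\mathbb{N}}
  & \implies \ \mu\bigl([0,2)^{\mathbb{N}}\bigr) &\ge \sum_{i \ge 1} \mu(C) = \infty
  \ \text{ if } \mu(C) > 0.
\end{align*}
```

So under these assumptions, if the unit cube has finite positive measure, the cube of half the side length is forced to be null and the cube of double the side length is forced to be infinite, which is not what a Lebesgue-like uniform measure should do.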
So the problem is that first, the space is too big, and second, there is too much freedom of cutting the space into pieces and shifting them. The EPIC metric paper I linked to in the post (or some related research) might be helpful in solving both of these issues.
First, we can make the space smaller by dividing it by some equivalence relation. Reward shaping properties in MDPs provide such a relation (although EPIC considers the relation from the original paper by Ng et al., which is too weak; something stronger is needed). To give a concrete (although a bit silly) example: there is no uniform measure on the space of real-valued functions on [0,1]. But suppose we have a (very strong) equivalence relation f∼g iff f(0)=g(0). Then the space collapses to just ℝ, which has the usual Lebesgue measure λ.
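Written out, the silly example looks roughly like this. The names X, ev_0 and μ̄ are just my labels for this sketch, and I am glossing over which σ-algebra to put on the quotient.

```latex
% Assumption: X = \mathbb{R}^{[0,1]} is the set of all real-valued functions on [0,1],
% and \lambda is Lebesgue measure on \mathbb{R}.
\begin{gather*}
f \sim g \iff f(0) = g(0), \qquad
\mathrm{ev}_0 \colon X/{\sim} \to \mathbb{R}, \quad [f] \mapsto f(0)
  \quad \text{(a well-defined bijection)}, \\
% Transport Lebesgue measure to the quotient along ev_0:
\bar\mu(S) = \lambda\bigl(\mathrm{ev}_0(S)\bigr)
  \quad \text{for } S \subseteq X/{\sim} \text{ with } \mathrm{ev}_0(S) \text{ Lebesgue measurable}, \\
% Shifting by a function h only moves the quotient by the number h(0):
\mathrm{ev}_0([f + h]) = \mathrm{ev}_0([f]) + h(0)
  \quad \implies \quad
  \bar\mu(S + [h]) = \lambda\bigl(\mathrm{ev}_0(S) + h(0)\bigr) = \bar\mu(S).
\end{gather*}
```

The point of the sketch is that after taking the quotient, the huge group of shifts by arbitrary functions acts only through the single number h(0), so ordinary translation invariance of λ is enough.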
The second problem is that we had too much freedom in shifting the subsets (or, the shift-invariance was too strong). In our case, “shifting” is applied to the sets of probability distributions of rewards. But individual rewards cannot always be shifted, since this operation doesn’t preserve optimal policies. So maybe this puts some restrictions on the transformations we can apply to the space, and the measures don’t blow up.
So, briefly, I don’t understand those behaviours very well yet, but my intuitive optimism comes from:
first, the fact that the structure of rewards seems to be rich, so if the space of distributions of rewards inherits some of the properties, the induced symmetries would limit the allowed transformations
second, there is another approach which I forgot to write about in the post, which is to consider non-shift-invariant uninformative priors. For example, the Jeffreys prior for a Bernoulli parameter in [0,1] is not the uniform distribution. It seems that the problem we are dealing with here is quite common in math, and people have invented workarounds (like the abstract Wiener spaces mentioned in the Wikipedia article). The issue is checking whether any of those workarounds applies here
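For reference, here is the standard computation behind the Jeffreys prior remark, assuming the parameter is the success probability θ of a single Bernoulli trial (that choice of model is my assumption for the sketch).

```latex
% Assumption: x \in \{0,1\} is one Bernoulli trial with success probability \theta \in (0,1).
\begin{align*}
\log p(x \mid \theta) &= x \log\theta + (1 - x)\log(1 - \theta), \\
% Fisher information: minus the expected second derivative of the log-likelihood.
I(\theta) &= -\mathbb{E}\!\left[\frac{\partial^2}{\partial\theta^2} \log p(x \mid \theta)\right]
           = \frac{1}{\theta} + \frac{1}{1 - \theta}
           = \frac{1}{\theta(1 - \theta)}, \\
% Jeffreys prior: proportional to the square root of the Fisher information.
\pi_J(\theta) &\propto \sqrt{I(\theta)} = \theta^{-1/2} (1 - \theta)^{-1/2},
  \qquad \text{i.e. } \theta \sim \mathrm{Beta}(\tfrac12, \tfrac12).
\end{align*}
```

This density piles up near 0 and 1 rather than being flat, so it is uninformative in the sense of being invariant under reparametrization, not in the sense of being shift-invariant.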
Thank you for writing this, I feel like it makes the core idea you’re expressing much clearer.
My intuition is that abstract Wiener spaces alone won’t get you the sort of measure you’re looking for, based off my experience with measures over big spaces in physics. But, that said, I feel like there should be some such measure over large physical spaces, as presumably power has a definition in terms of physical concepts, or else how the heck can we recover our intuition of power in our world? It should all add up to normality, after all. It seems to me that looking over those physics papers which described single particles as agentic because our distributions over them tend towards max entropy, which we can view as the particle seeking the greatest “option value” it can, would be a good place to build up the latter intuition.
I think I am undecided as to whether you can use the rich structure of reward functions to limit the allowed transformations in a useful way. Partly because I suspect that this rich structure reflects a physical structure (something like the natural abstractions thesis + selection pressure from reality for the sorts of rewards we typically see) or perhaps a simplicity prior of some sort. But maybe it will work out. I don’t know.
My lack of optimism as to the possibility of your agenda is basically why I was willing to accept the strange probability distribution TurnTrout went with, I guess. But on reflection, perhaps I should have used that as an existence proof of a distribution over rewards which allows something like our intuitive picture of power seeking, and tried to see if I could interpret it to be something less weird, use it to find something less weird, or just go look for less weird things because maybe they’d work.
Sorry for the long post, but I just realized I didn’t update based off TurnTrout’s results. It seems more likely to me now that your agenda might work. Though I’d be more optimistic if you were using TurnTrout’s distribution as inspiration for what to look for in some way.