I agree with all your intuition here. The partial functions thing is unsatisfactory because it is discontinuous.
It is trying to be #1, but a little more ambitious. I want the distribution on distributions to be a new type of epistemic state, and the geometric maximization to be the mechanism for converting the new epistemic state to a traditional probability distribution. I think that any decent notion of an embedded epistemic state needs to be closed under both mixing and coarsening, and this is trying to satisfy that as naturally as possible.
I think that the 0s are pretty bad, but I think they are an edge case of the only reasonable thing to do here. The reason it feels like the only reasonable thing to me is something like credit assignment/hypothesis autonomy: if a world gets probability mass, that should be because some hypothesis or collection of hypotheses insisted on putting probability mass there. You gave an edge case example where this didn’t happen. Maybe everything is edge cases. I am not sure.
It might be that the 0s are not as bad as they seem. 0s seem bad because we have cached that “0 means you can’t update,” but maybe you aren’t supposed to be updating in the output distribution anyway; you are supposed to do your updating in the more general epistemic state input object.
I actually prefer a different proposal for the type of “epistemic state that is closed under coarsening and mixture” that is more general than the thing I gesture at in the post:
A generalized epistemic state is a (quasi-?)convex function ΔW→ℝ. A standard probability distribution is converted to an epistemic state via P↦(Q↦D_KL(P||Q)). A generalized epistemic state is converted back to a (convex set of) probability distribution(s) by taking an argmin. Mixture is pointwise mixture of functions, and coarsening is the obvious thing: given a function W→V, we convert a generalized epistemic state over V to a generalized epistemic state over W by precomposing with the pushforward map ΔW→ΔV.
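To make this concrete, here is a toy numeric sketch (the helper names and the numbers are purely illustrative, not from the post): worlds W = {0, 1, 2}, epistemic states represented as plain functions on distributions, and the argmin taken numerically.

```python
# Toy sketch of the proposal above. W = {0, 1, 2}; an epistemic state is a
# function from distributions on W (numpy arrays summing to 1) to reals.
import numpy as np
from scipy.optimize import minimize

EPS = 1e-12

def kl(p, q):
    """D_KL(p || q) for finite distributions as numpy arrays."""
    return float(np.sum(p * (np.log(p + EPS) - np.log(q + EPS))))

def state_from_dist(p):
    """A standard distribution P becomes the state Q -> D_KL(P || Q)."""
    return lambda q: kl(p, q)

def mix(states, weights):
    """Mixture of states is the pointwise weighted sum of the functions."""
    return lambda q: sum(w * s(q) for w, s in zip(weights, states))

def coarsen(state_on_V, f):
    """Given f: W -> V (a list of cell indices), lift a state on ΔV to a
    state on ΔW by precomposing with the pushforward ΔW -> ΔV."""
    n_cells = max(f) + 1
    def pushforward(q):
        out = np.zeros(n_cells)
        for w, v in enumerate(f):
            out[v] += q[w]
        return out
    return lambda q: state_on_V(pushforward(q))

def to_distribution(state, n):
    """One element of the argmin set, found by constrained optimization."""
    res = minimize(
        state,
        x0=np.full(n, 1.0 / n),
        bounds=[(0.0, 1.0)] * n,
        constraints={"type": "eq", "fun": lambda x: np.sum(x) - 1.0},
    )
    return res.x

# Two coarse hypotheses over W = {0, 1, 2}:
#   h1 is uniform on the partition {{0,1},{2}};
#   h2 puts 0.9 on {0} in the partition {{0},{1,2}}.
h1 = coarsen(state_from_dist(np.array([0.5, 0.5])), [0, 0, 1])
h2 = coarsen(state_from_dist(np.array([0.9, 0.1])), [0, 1, 1])
print(to_distribution(mix([h1, h2], [0.5, 0.5]), n=3))  # ~[0.7, 0.0, 0.3]
```

With these numbers the argmin comes out to roughly (0.7, 0, 0.3): world 1 ends up with probability 0 because both hypotheses can have their cell masses satisfied without it; neither one insisted on putting mass there. This is exactly the 0s phenomenon from above.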
The above proposal comes together into the formula we have been talking about, but you can also imagine generalized epistemic states that didn’t come from mixtures of coarse distributions.
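Spelled out: when the generalized state is a mixture of coarse distributions (weights wᵢ, distributions Pᵢ over Vᵢ, coarsening maps fᵢ: W→Vᵢ, writing fᵢ∗Q for the pushforward; this notation is just for this note), the entropy terms H(Pᵢ) don’t depend on Q, so the argmin unpacks as

$$\operatorname*{argmin}_{Q\in\Delta W}\;\sum_i w_i\, D_{\mathrm{KL}}\!\big(P_i \,\big\|\, f_{i*}Q\big) \;=\; \operatorname*{argmax}_{Q\in\Delta W}\;\prod_i \prod_{v\in V_i} \big((f_{i*}Q)(v)\big)^{\,w_i P_i(v)},$$

which is geometric maximization over the pushforward cell masses.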