A distribution over distributions just becomes a distribution. Just use P(x) = integral_q P(x|q) P(q) dq. The distance I’m proposing is max_x abs(log(P1(x) / P2(x))) = max_x abs(log(integral_q P1(x|q) P1(q) dq) - log(integral_q P2(x|q) P2(q) dq)).
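Here is a rough sketch of that distance for the discrete case, where the integral over q becomes a weighted sum. The function names (`marginal`, `max_log_ratio`) and the coin example are made up for illustration, and it assumes both distributions give nonzero probability to every outcome.

```python
import math

def marginal(hyper_prior, conditionals):
    """Collapse a distribution over distributions into a single distribution.

    hyper_prior:  list of weights P(q), assumed to sum to 1
    conditionals: list of dicts, one per q, mapping outcome x -> P(x|q)
    """
    outcomes = conditionals[0].keys()
    return {
        x: sum(w * cond[x] for w, cond in zip(hyper_prior, conditionals))
        for x in outcomes
    }

def max_log_ratio(p1, p2):
    """max_x abs(log(P1(x) / P2(x))); assumes no outcome has probability 0."""
    return max(abs(math.log(p1[x] / p2[x])) for x in p1)

# Hypothetical example: Alice is unsure which of two coins she has,
# Bob is certain about a single coin.
alice = marginal(
    [0.5, 0.5],
    [{"heads": 0.9, "tails": 0.1}, {"heads": 0.5, "tails": 0.5}],
)
bob = marginal([1.0], [{"heads": 0.6, "tails": 0.4}])

distance = max_log_ratio(alice, bob)
print(alice, bob, distance)
```

Alice's mixture collapses to roughly 0.7 heads and 0.3 tails, so the largest gap is on tails, where the ratio is 0.3 / 0.4.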
I think it might be possible to make this better. If Alice and Bob both agree that x is unlikely, then their disagreeing about the exact probability seems like less of a problem. For example, if Alice thinks it’s one-in-a-million and Bob thinks it’s one-in-a-billion, then Alice would need a thousand-to-one evidence ratio to believe what Bob believes, which means that piece of evidence has a one-in-a-thousand chance of occurring. But since it only has a one-in-a-million chance of being needed, that doesn’t matter much. It seems like it would only make a one-in-a-thousand difference. If you do it this way, it would need to be additive, but the distance is still at most the metric I just gave.
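To put numbers on the Alice and Bob example, this small sketch computes the evidence ratio and what the max-log-ratio distance would charge for that one outcome. The variable names are just for illustration, and it uses the ratio of probabilities rather than odds, which is nearly the same thing when both probabilities are this small.

```python
import math

p_alice = 1e-6  # Alice: one-in-a-million
p_bob = 1e-9    # Bob: one-in-a-billion

# Evidence ratio Alice would need to move to Bob's estimate.
evidence_ratio = p_alice / p_bob

# Contribution of this outcome to max_x abs(log(P1(x) / P2(x))).
log_gap = abs(math.log(p_alice / p_bob))

print(evidence_ratio, log_gap)
```

The log gap comes out to log(1000), about 6.9, even though both people agree the outcome almost never happens. That is the behavior the weighted version is meant to soften.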
The metric for this would be:
integral_x log(max(P1(x), P2(x)) * max(P1(x) / P2(x), P2(x) / P1(x))) dx
= integral_x log(max(P1(x)^2 / P2(x), P2(x)^2 / P1(x))) dx
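As a sanity check on the algebra, here is a discrete sketch that replaces the integral with a sum and evaluates both forms of the expression. The function names and the two example distributions are made up. It only checks that the two written forms agree and that the result doesn't depend on the order of the arguments, not whether the expression behaves like a proper metric.

```python
import math

def form_product(p1, p2):
    """sum_x log(max(P1, P2) * max(P1 / P2, P2 / P1)); assumes no zeros."""
    return sum(
        math.log(max(p1[x], p2[x]) * max(p1[x] / p2[x], p2[x] / p1[x]))
        for x in p1
    )

def form_squared(p1, p2):
    """sum_x log(max(P1^2 / P2, P2^2 / P1)); assumes no zeros."""
    return sum(
        math.log(max(p1[x] ** 2 / p2[x], p2[x] ** 2 / p1[x]))
        for x in p1
    )

# Hypothetical distributions over three outcomes.
p1 = {"a": 0.7, "b": 0.2, "c": 0.1}
p2 = {"a": 0.5, "b": 0.3, "c": 0.2}

first = form_product(p1, p2)
second = form_squared(p1, p2)
print(first, second)
```

The two forms match because whichever of P1(x), P2(x) is larger is picked by both max terms, so the product is always the larger one squared divided by the smaller one.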