The maximum of the absolute value of the log of the ratio between the probability of a given hypothesis on each prior. That is the log of the highest possible odds of a piece of evidence that brings you from one prior to the other.
I’m unclear on your terminology. I take a prior to be a distribution over distributions; in practice, usually a distribution over the parameters of a parameterised family. Let P1 and P2 be two priors of this sort, distributions over some parameter space Q. Write P1(q) for the probability density at q, and P1(x|q) for the probability density at x for parameter q. x varies over the data space X.
Is the distance measure you are proposing max_{q in Q} abs log( P1(q) / P2(q) )?
Or is it max_{q in Q,x in X} abs log( P1(x|q) / P2(x|q) )?
Or max_{q in Q,x in X} abs log( (P1(q)P1(x|q)) / (P2(q)P2(x|q)) )?
Or something else?
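For concreteness, here is a minimal numerical sketch of the three candidates, assuming a toy two-point parameter space Q = {q0, q1} and binary data space X = {0, 1} (all numbers are made up for illustration):

```python
import numpy as np

# Toy setup (assumed for illustration): Q = {q0, q1}, X = {0, 1}.
P1_q = np.array([0.7, 0.3])   # prior 1 over Q
P2_q = np.array([0.4, 0.6])   # prior 2 over Q
# Likelihoods P(x|q): rows indexed by q, columns by x.
P1_x_given_q = np.array([[0.9, 0.1],
                         [0.2, 0.8]])
P2_x_given_q = np.array([[0.75, 0.25],
                         [0.35, 0.65]])

# Candidate 1: max_q |log(P1(q) / P2(q))|
d1 = np.max(np.abs(np.log(P1_q / P2_q)))

# Candidate 2: max_{q,x} |log(P1(x|q) / P2(x|q))|
d2 = np.max(np.abs(np.log(P1_x_given_q / P2_x_given_q)))

# Candidate 3: max_{q,x} |log((P1(q)P1(x|q)) / (P2(q)P2(x|q)))|
joint1 = P1_q[:, None] * P1_x_given_q
joint2 = P2_q[:, None] * P2_x_given_q
d3 = np.max(np.abs(np.log(joint1 / joint2)))

print(d1, d2, d3)  # ~0.693, ~0.916, ~1.253 -- three different answers
```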
A distribution over distributions just becomes a distribution: use P(x) = integral_{q in Q} P(x|q) P(q) dq. The distance I’m proposing is max_{x in X} abs log(P1(x) / P2(x)) = max_{x in X} abs(log(integral_Q P1(x|q) P1(q) dq) - log(integral_Q P2(x|q) P2(q) dq)).
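As a minimal sketch of this, assuming the same toy discrete setup as above: marginalise each prior over Q, then take the largest absolute log ratio of the resulting data distributions.

```python
import numpy as np

# Same toy setup as in the sketch above.
P1_q = np.array([0.7, 0.3])
P2_q = np.array([0.4, 0.6])
P1_x_given_q = np.array([[0.9, 0.1], [0.2, 0.8]])
P2_x_given_q = np.array([[0.75, 0.25], [0.35, 0.65]])

# Marginalise out the parameter: P(x) = sum_q P(x|q) P(q)
P1_x = P1_q @ P1_x_given_q   # [0.69, 0.31]
P2_x = P2_q @ P2_x_given_q   # [0.51, 0.49]

# Proposed distance: max_x |log(P1(x) / P2(x))|
d = np.max(np.abs(np.log(P1_x) - np.log(P2_x)))
print(d)  # ~0.458
```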
I think it might be possible to make this better. If Alice and Bob both agree that x is unlikely, then disagreeing about its probability seems like less of a problem. For example, if Alice thinks x is one-in-a-million and Bob thinks it’s one-in-a-billion, then Alice would need a thousand-to-one evidence ratio to believe what Bob believes. That piece of evidence has a one-in-a-thousand chance of occurring, but since it only has a one-in-a-million chance of being needed, that doesn’t matter much: it seems like it would only make a one-in-a-thousand difference. Done this way, the measure would need to be additive rather than a max, but the distance is still at most the metric I just gave.
The metric for this would be:
integral_x log( max(P1(x), P2(x)) * max(P1(x)/P2(x), P2(x)/P1(x)) ) dx
= integral_x log( max(P1(x)^2 / P2(x), P2(x)^2 / P1(x)) ) dx
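A minimal sketch of this expression as written, reusing the toy marginals from the earlier sketch and letting a sum over the discrete X stand in for the integral:

```python
import numpy as np

# Marginal data distributions from the earlier sketch.
P1_x = np.array([0.69, 0.31])
P2_x = np.array([0.51, 0.49])

# integral_x log(max(P1(x)^2 / P2(x), P2(x)^2 / P1(x))) dx,
# computed as a sum over the two points of X.
d = np.sum(np.log(np.maximum(P1_x**2 / P2_x, P2_x**2 / P1_x)))
print(d)
```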