Actually, on further thought, I think the best thing to use here is a log-bilinear distribution over the space of truth-assignments. For these it is easy to compute exact normalizing constants, conditional distributions, marginal distributions, and KL divergences efficiently; there is no impedance mismatch. KL-divergence minimization here is still a convex minimization (in the natural parametrization of the exponential family).
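To make this concrete, here is a minimal sketch (not the construction above, just an illustration of the exponential-family reading) of such a distribution over truth-assignments, with the normalizer, a marginal, a conditional, and KL all computed exactly. The feature set and weights are made up, and the brute-force enumeration over all 2^n assignments stands in for whatever structure would make these computations efficient at scale:

```python
import itertools
import math

# Illustrative exponential-family distribution over truth assignments
# to n propositional variables.  Features and weights are arbitrary.

n = 3  # variables p0, p1, p2
assignments = list(itertools.product([False, True], repeat=n))

def features(a):
    # Indicator features: one per variable, plus one pairwise conjunction.
    return [float(a[0]), float(a[1]), float(a[2]), float(a[0] and a[1])]

theta = [0.5, -1.0, 0.2, 1.5]  # natural parameters (example values)

def log_weight(a, th):
    return sum(t * f for t, f in zip(th, features(a)))

def log_Z(th):
    # Exact log normalizing constant, by enumeration.
    return math.log(sum(math.exp(log_weight(a, th)) for a in assignments))

def prob(a, th):
    return math.exp(log_weight(a, th) - log_Z(th))

# Marginals and conditionals are sums of the same exact probabilities.
marg_p1 = sum(prob(a, theta) for a in assignments if a[1])
cond_p0_given_p1 = (
    sum(prob(a, theta) for a in assignments if a[0] and a[1]) / marg_p1
)

def kl(th_p, th_q):
    # KL(P_{th_p} || P_{th_q}).  As a function of th_q this equals
    # log Z(th_q) - th_q . E_p[f] + const, convex because log Z is convex;
    # that is the convex-minimization point made above.
    return sum(
        prob(a, th_p) * (log_weight(a, th_p) - log_Z(th_p)
                         - log_weight(a, th_q) + log_Z(th_q))
        for a in assignments
    )

print(marg_p1, cond_p0_given_p1, kl(theta, [0.0, 0.0, 0.0, 0.0]))
```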
The only shortcoming is that 0 is not a probability: every truth-assignment gets strictly positive probability, so it won’t let you say, e.g., that Pr(φ1→φ2)=1; but this can be remedied using a real or hyperreal approximation.
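A toy follow-up on that point (again illustrative, with a made-up feature): put weight w on the indicator of p0→p1. As w grows, Pr(p0→p1) approaches 1 from below but never reaches it, since every assignment keeps strictly positive weight; the "real approximation" is to take w large but finite, and a hyperreal w plays the same role with an infinite weight.

```python
import itertools
import math

assignments = list(itertools.product([False, True], repeat=2))

def implies(a):
    return (not a[0]) or a[1]  # truth value of p0 -> p1 under assignment a

def pr_implies(w):
    # Log-linear distribution with a single feature: the indicator of p0 -> p1.
    weights = [math.exp(w * implies(a)) for a in assignments]
    Z = sum(weights)
    return sum(wt for wt, a in zip(weights, assignments) if implies(a)) / Z

for w in [1.0, 5.0, 20.0]:
    print(w, pr_implies(w))  # tends to 1 from below, never equals 1
```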