Actually, on further thought, I think the best thing to use here is a log-bilinear distribution over the space of truth-assignments. For these, it is easy to efficiently compute exact normalizing constants, conditional distributions, marginal distributions, and KL divergences; there is no impedance mismatch. KL divergence minimization here is still a convex minimization (in the natural parametrization of the exponential family).
The only shortcoming is that 0 is not a probability, so it won’t let you eg say that ; but this can be remedied using a real or hyperreal approximation.
Consider the function a(M1,M2)=2−d(M1,M2)−d(M2,M1) where d(M1,M2)=min(|x||x∈{0,1}∗:∀y∈{0,1}∗:M1(xy)=M2(y) unless neither of these halts). The reversible Markov chain with transition probabilities p(M1,M2)=a(M1,M2)∑M′2a(M1,M′2) has a bounded positive invariant measure μ(M)=∑M′a(M,M′). Of course, as the post showed, the total measure is infinite. Also, because the chain is reversible and transient, the invariant measure is far from unique—indeed, for any machine M0, the measure μ(M)=p(0)(M,M0)+2∑∞n=1p(n)(M,M0) will be a bounded positive invariant measure.
It seems tempting (to me) to try to get a probability measure by modding out the output-permutations (that the post uses to show this isn’t possible for the full set of UTMs). To this end, consider the set of UTMs that have no output. (These will be unaffected by the output-permutations.) We can try to use the induced sub-digraph on these to build a probability measure μ. The measure of each UTM should be a function of the rooted edge-labeled digraph GM rooted at that UTM.
The most natural topology on rooted edge-labeled infinite digraphs is the one generated by the sets {G:G′ is isomorphic to an induced rooted edge-labeled subgraph of G} where G′ ranges over finite rooted edge-labeled digraphs—we could hope that μ is continuous according to this topology. Unfortunately, this can’t work: if μ(M)>0 then μ−1((12μ(M),∞)) must be open, and so it must contain some finite intersection of the generating sets; however, every such intersection that’s nonempty (as this one is) contains infinitely many UTMs, so the total measure must be infinite as well.