An easy way to get rid of the probabilities-outside-$[0,1]$ problem in the continuous relaxation is to constrain the “conditional”/updated distribution to have $\operatorname{Var}(1_{\varphi_i} \mid \ldots) \le E(1_{\varphi_i} \mid \ldots)\bigl(1 - E(1_{\varphi_i} \mid \ldots)\bigr)$ (which is a convex constraint; it’s equivalent to $\operatorname{Var}(1_{\varphi_i} \mid \ldots) + \bigl(E(1_{\varphi_i} \mid \ldots) - \tfrac{1}{2}\bigr)^2 \le \tfrac{1}{4}$), and then minimize KL-divergence accordingly.
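Spelled out (my own completion of the square, under the reading that all moments are taken with respect to the candidate updated distribution $q$, and writing $X = 1_{\varphi_i}$):
$$
\operatorname{Var}_q(X) \le E_q(X)\bigl(1 - E_q(X)\bigr)
\;\Longleftrightarrow\;
\operatorname{Var}_q(X) + \Bigl(E_q(X) - \tfrac{1}{2}\Bigr)^2 \le \tfrac{1}{4}
\;\Longleftrightarrow\;
E_q\bigl[X^2 - X\bigr] \le 0.
$$
If $q$ ranges over distributions on (relaxed) valuations with $X$ a fixed function of the valuation, the last form is linear in $q$, which is where the convexity comes from; and since $\operatorname{Var}_q(X) \ge 0$, the constraint forces $E_q(X) \in [0,1]$.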
The two obvious flaws are that the result of updating becomes ordering-dependent (though this may not be a problem in practice), and that the updated distribution will sometimes have $\operatorname{Var}(1_{\varphi_i} \mid \ldots) < E(1_{\varphi_i} \mid \ldots)\bigl(1 - E(1_{\varphi_i} \mid \ldots)\bigr)$, and it’s not clear how to interpret that.
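For concreteness, here is a toy numerical sketch of the proposed update (entirely my own construction: the worlds, prior, relaxed values, and the specific form of the conditioning step, which the comment leaves as “$\ldots$”, are all made up for illustration). It KL-projects a prior over six worlds onto the set of distributions satisfying both a stand-in conditioning constraint and the variance constraint, using cvxpy:

```python
# Toy sketch: update by minimizing KL(q || p) subject to (i) a stand-in
# "conditioning" constraint and (ii) Var_q(x) <= E_q(x)(1 - E_q(x)),
# written in its linear form E_q[x^2 - x] <= 0.
import numpy as np
import cvxpy as cp

# Hypothetical setup: six worlds, a prior p, and a relaxed truth value x of
# phi_i in each world (note some values fall outside [0, 1]).
x = np.array([-0.2, 0.1, 0.4, 0.7, 1.1, 1.3])
p = np.array([0.25, 0.20, 0.15, 0.15, 0.15, 0.10])

q = cp.Variable(6, nonneg=True)
constraints = [
    cp.sum(q) == 1,
    q[1] == 0,                         # stand-in "conditioning": rule out world 1
    (x * (x - 1)) @ q <= 0,            # Var_q(x) <= E_q(x)(1 - E_q(x)), linear in q
]
# sum(kl_div(q, p)) equals KL(q || p) because both q and p sum to 1.
problem = cp.Problem(cp.Minimize(cp.sum(cp.kl_div(q, p))), constraints)
problem.solve()

E = float(x @ q.value)
V = float((x ** 2) @ q.value) - E ** 2
print(f"E_q[x] = {E:.3f}, Var_q[x] = {V:.3f}, bound E(1-E) = {E * (1 - E):.3f}")
```

In a run of this sketch the updated $E_q[x]$ lands in $[0,1]$ by construction, but the variance constraint is generally active or slack rather than tight, which is exactly the strict-inequality case flagged above.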