Observation: If we want to keep everything in product form, then adding a constraint to the argmax can be seen as multiplying by an indicator function. I.e. if $I[\Phi]$ is $1$ when $\Phi$ is true and $0$ when $\Phi$ is false, then

$$\operatorname*{argmax}_{P \in \Delta W,\; P(X) \ge b} \; G_{w \sim P_0}[P(w)] \;=\; \operatorname*{argmax}_{P \in \Delta W} \; I[P(X) \ge b] \cdot G_{w \sim P_0}[P(w)].$$

Notably, we can't really do this with arithmetic maximization, because then we would be taking the logarithm of 0, which is undefined.
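A minimal numerical sketch of the identity (my own illustration; the three-world setup, the grid search, and the event $X = \{\text{world } 0\}$ are assumptions, not from the post). Both argmaxes should pick out the same grid point, since the indicator zeroes out exactly the infeasible $P$ while the geometric expectation is strictly positive on the interior of the simplex:

```python
import itertools
import numpy as np

# Three possible worlds; P0 is the prior over worlds (assumed values).
P0 = np.array([0.5, 0.3, 0.2])

# Geometric expectation of P(w) under w ~ P0:
#   G_{w ~ P0}[P(w)] = exp(E_{w ~ P0}[log P(w)]) = prod_w P(w)^{P0(w)}
def geom_expectation(P):
    return np.exp(np.sum(P0 * np.log(P)))

# Constraint P(X) >= b, taking X = {world 0} and b = 0.6.
b = 0.6
def satisfies_constraint(P):
    return P[0] >= b

# Coarse grid over the interior of the simplex Delta W
# (zero entries are avoided so log P(w) stays finite).
step = 0.01
grid = [np.array([p0, p1, 1.0 - p0 - p1])
        for p0, p1 in itertools.product(np.arange(step, 1.0, step), repeat=2)
        if p0 + p1 < 0.995]

# (1) Constrained argmax: search only over feasible P.
constrained = max((P for P in grid if satisfies_constraint(P)),
                  key=geom_expectation)

# (2) Unconstrained argmax of indicator * geometric expectation.
unconstrained = max(grid,
                    key=lambda P: float(satisfies_constraint(P)) * geom_expectation(P))

print(constrained)    # expected: the same grid point as below,
print(unconstrained)  # roughly (0.6, 0.24, 0.16) for this P0 and b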
I’m not sure how useful this is, since it doesn’t really help with empirical approximation: there you run into problems with multiplying by zero. But it might still be nice; at least it seems to provide a quantitative justification for viewing probability distributions as “soft constraints”.
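One way to make the multiplying-by-zero problem concrete (my reading, not necessarily the intended one): in log space the indicator contributes $\log I[P(X) \ge b]$, which is $0$ when the constraint holds and $\log 0 = -\infty$ when it fails, so a sample-based or gradient-based approximation gets no usable signal from infeasible points. A tiny sketch, reusing the setup above:

```python
# Reusing P0, satisfies_constraint, and b from the sketch above.
P_bad = np.array([0.2, 0.4, 0.4])  # violates P(X) >= 0.6

# log( I[P(X) >= b] * G_{w ~ P0}[P(w)] )
#   = log I[P(X) >= b] + E_{w ~ P0}[log P(w)]
log_objective = (np.log(float(satisfies_constraint(P_bad)))
                 + np.sum(P0 * np.log(P_bad)))
print(log_objective)  # -inf (np.log(0.0) warns and returns -inf)
```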