The question is whether this expression is easy to compute or not, and fortunately the answer is that it’s quite easy! We can evaluate the first term by the simple Monte Carlo method of drawing many independent samples z∼Q(z∣x) and evaluating the empirical average, as we know the distribution Q(z∣x) explicitly and it was presumably chosen to be easy to draw samples from.
My question when reading this was: why can't we say the same thing about P(x) = E_{z∼P(z)}[P(x∣z)]? i.e., draw many independent samples from P(z) and evaluate the empirical average of P(x∣z)? Usually P(z) is also assumed known and simple to sample from (e.g., a standard Gaussian).
So far, my answer is:
Q(z∣x) ≈ P(z∣x) ∝ P(x∣z)P(z), so, assuming x is a point from my data, Q(z∣x) will typically place its mass where P(x∣z) is large. Samples drawn from Q(z∣x) therefore land in regions where P(x∣z) contributes meaningfully to the average, unlike blindly sampling from the prior P(z), where in high dimensions almost every sample yields P(x∣z) ≈ 0 and the estimator has enormous variance. The toy sketch below illustrates this.
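Here is a small numerical sketch of that variance argument (a toy model of my own, not from the tutorial): prior P(z) = N(0, I) and likelihood P(x∣z) = N(x; z, σ²I) in D = 20 dimensions, so the exact marginal P(x) = N(x; 0, (1+σ²)I) is available for comparison. The proposal standing in for Q(z∣x) is hand-picked rather than learned:

```python
import numpy as np

rng = np.random.default_rng(0)

D, sigma2, n = 20, 0.01, 100_000   # latent dim, likelihood variance, sample count
x = rng.normal(size=D)             # a fixed "data point"

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def log_p_x_given_z(z):            # log N(x; z, sigma2*I) for each row of z
    return -0.5 * (D * np.log(2 * np.pi * sigma2)
                   + ((x - z) ** 2).sum(axis=1) / sigma2)

def log_p_z(z):                    # log N(z; 0, I)
    return -0.5 * (D * np.log(2 * np.pi) + (z ** 2).sum(axis=1))

# (1) Naive Monte Carlo: draw z ~ P(z) and average P(x|z).
z_prior = rng.normal(size=(n, D))
log_px_naive = log_mean_exp(log_p_x_given_z(z_prior))

# (2) Importance sampling from a proposal concentrated near the posterior.
# Q(z|x) = N(x, sigma2*I) is a hypothetical stand-in for a learned encoder;
# the unbiased weights are P(x|z) * P(z) / Q(z|x).
z_q = x + np.sqrt(sigma2) * rng.normal(size=(n, D))
log_q = -0.5 * (D * np.log(2 * np.pi * sigma2)
                + ((z_q - x) ** 2).sum(axis=1) / sigma2)
log_px_is = log_mean_exp(log_p_x_given_z(z_q) + log_p_z(z_q) - log_q)

# Exact marginal for reference: P(x) = N(x; 0, (1 + sigma2)*I).
log_px_true = -0.5 * (D * np.log(2 * np.pi * (1 + sigma2))
                      + (x ** 2).sum() / (1 + sigma2))

print(f"true      log P(x) = {log_px_true:8.2f}")
print(f"naive MC estimate  = {log_px_naive:8.2f}")  # wildly off: no prior sample lands near x
print(f"IS estimate        = {log_px_is:8.2f}")     # close to the truth
```

Note that with the exact posterior N(x/(1+σ²), σ²/(1+σ²)·I) as the proposal, the importance weights would all equal P(x) and a single sample would recover it exactly; a learned Q(z∣x) sits somewhere between that ideal and blind prior sampling.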
Another reason is that the expectation can be reduced to a sum of one-dimensional expectations over the individual dimensions of z and x when Q(z∣x) and P(x∣z) factorize nicely, as spelled out below.
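To make that precise (under assumptions of my own, not stated in the tutorial: both distributions factorize across dimensions and each x_d depends only on z_d), the first term splits into D one-dimensional expectations, each cheap to estimate:

$$
\mathbb{E}_{z \sim Q(z \mid x)}\big[\log P(x \mid z)\big]
= \sum_{d=1}^{D} \mathbb{E}_{z_d \sim Q_d(z_d \mid x)}\big[\log P(x_d \mid z_d)\big],
\quad \text{where } Q(z \mid x) = \prod_{d} Q_d(z_d \mid x),\;
P(x \mid z) = \prod_{d} P(x_d \mid z_d).
$$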