You aren’t the only one, z is usually used for the latent, I just followed Paul’s notation to avoid confusion.
P(A|Q) comes just from training on the QA pairs. But I did say “set any reasonable prior over answers”, because I expect P(z|Q,A) to be orders of magnitude higher for the right answer. Like I said in another comment, an image generator (that isn’t terrible) is incredibly unlikely to generate a cat from the input “dog”, so even big changes to the prior probably won’t matter much. That being said, machine learning rests on the IID assumption, regular reporters are no exception, they also incorporate P(A|Q), it’s just that here it is explicit.
The whole point of VAEs is that the estimation of the probability of a sample is efficient (see section 2.1 here: https://arxiv.org/abs/1606.05908v1), so I don’t expect it to be a problem.
You aren’t the only one, z is usually used for the latent, I just followed Paul’s notation to avoid confusion.
P(A|Q) comes just from training on the QA pairs. But I did say “set any reasonable prior over answers”, because I expect P(z|Q,A) to be orders of magnitude higher for the right answer. Like I said in another comment, an image generator (that isn’t terrible) is incredibly unlikely to generate a cat from the input “dog”, so even big changes to the prior probably won’t matter much. That being said, machine learning rests on the IID assumption, regular reporters are no exception, they also incorporate P(A|Q), it’s just that here it is explicit.
The whole point of VAEs is that the estimation of the probability of a sample is efficient (see section 2.1 here: https://arxiv.org/abs/1606.05908v1), so I don’t expect it to be a problem.