Thanks! I spent a bit of time understanding the stochastic inverse paper, though I haven't yet fully grokked it. My understanding is that the goal is to learn the conditional probabilities in a Bayes net from samples. The "non-amortized" way to do this is to choose a (non-unique) maximal inverse factorization that satisfies a d-separation condition, then estimate the conditional probabilities of the latent-generating process by simply observing the frequencies of conditional events. Of course this is very inefficient, in particular because the inverse factorization isn't a general Bayes net: it must satisfy a bunch of consistency conditions. One can then train a NN as a generative model over inverses satisfying these consistency conditions, and perform MCMC sampling on this learned prior.
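To make sure I'm picturing the frequency-counting step right, here's a minimal sketch (the toy two-node model and all names are my own, not from the paper): estimating an inverse conditional P(z | x) in a forward model z → x by tallying conditional events in forward samples.

```python
# Illustrative sketch: frequency-based estimation of an inverse
# conditional P(z | x) for a tiny Bayes net z -> x.
# The forward model here is invented for illustration only.
import random
from collections import Counter, defaultdict

random.seed(0)

def forward_sample():
    # Forward model: z ~ Bernoulli(0.3); x | z ~ Bernoulli(0.8 if z else 0.2)
    z = random.random() < 0.3
    x = random.random() < (0.8 if z else 0.2)
    return int(z), int(x)

# "Non-amortized" estimate: count conditional events and normalize
# to get the empirical inverse conditional P(z | x).
counts = defaultdict(Counter)
for _ in range(100_000):
    z, x = forward_sample()
    counts[x][z] += 1

p_z_given_x = {
    x: {z: c / sum(cnt.values()) for z, c in cnt.items()}
    for x, cnt in counts.items()
}
print(p_z_given_x)  # P(z=1 | x=1) should be near 0.24/0.38 ≈ 0.632 by Bayes' rule
```

This is exactly the step that scales badly: in a larger net each inverse conditional has exponentially many conditioning configurations, which is presumably why one amortizes it with a learned model instead.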
So is the "moral" you want to take away here that, by exploring a diversity of tasks (corresponding to learning this generative prior over inverse Bayes nets), a NN can significantly improve its performance on single-shot prediction tasks?