Finding “Z-best” is not the same as finding the posterior over Z, and the difference is systematic. Because you’re not being a real Bayesian, you don’t get the benefit of the Bayesian Occam’s razor, so you’ll systematically tend toward lower-entropy-than-optimal (that is, more-complex-than-optimal, overfitted) Zs. Adding an entropy-based loss term might help, but I’d expect that H already includes an entropy-based loss term, so this risks double-counting.
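To spell out the gap (in my own notation, not necessarily the post’s): the MAP estimate

$$Z^{\text{best}} = \arg\max_Z \; p(D \mid Z)\, p(Z)$$

just picks the peak of the posterior, whereas real Bayesian prediction averages over it,

$$p(y \mid D) = \int p(y \mid Z)\, p(Z \mid D)\, dZ,$$

and the Occam penalty lives in that averaging: a Z that fits D well only on a thin sliver of Z-space contributes almost nothing to the integral, yet it can still win the argmax.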
The above critique is specific and nitpicky. Separately from that, this whole schema feels intuitively wrong to me. I think there must be ways to do the math that favor a low-entropy, causally-realistic, quasi-symbolic likelihood-like function which can be combined with a predictive, uninterpretably-neural learned Z to give a posterior that is better at intuitive leaps than the former and better at generalizing than the latter. All of this would be intrinsic, and human alignment would be a separate problem. Intuitively, it seems to me that trying to get human alignment and generalizability out of the same trick is the wrong approach.
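To gesture at the shape I have in mind (a sketch under my own assumptions, not a worked proposal): treat the learned, uninterpretable Z as a prior over hypotheses h and keep the likelihood-like term quasi-symbolic and causally structured, combining them only at the end,

$$p(h \mid D) \propto L_{\text{sym}}(D \mid h)\, Z_{\text{neural}}(h),$$

so that the neural piece supplies the intuitive leaps, the symbolic piece supplies the generalization pressure, and human alignment is imposed as a separate constraint rather than folded into the same mechanism.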