I was just trying to clarify the limits of autoregressive vs other learning methods. Autoregressive learning is at an apparent disadvantage if P(Xt|Xt−1) is hard to compute and the reverse is easy and low entropy. It can “make up for this” somewhat if it can do a good job of predicting Xt from Xt−2, but it’s still at a disadvantage if, for example, that’s relatively high entropy compared to Xt−1 from Xt. That’s it, I’m satisfied.
I was just trying to clarify the limits of autoregressive vs other learning methods. Autoregressive learning is at an apparent disadvantage if P(Xt|Xt−1) is hard to compute and the reverse is easy and low entropy. It can “make up for this” somewhat if it can do a good job of predicting Xt from Xt−2, but it’s still at a disadvantage if, for example, that’s relatively high entropy compared to Xt−1 from Xt. That’s it, I’m satisfied.