“Yeah, so my dumb argumentative comment is, prediction does not equal compression. Sequential prediction equals compression. But non-sequential prediction is also important and does not equal compression… And by non-sequential prediction, I mean you have a sequence of bits in the information theory model, but if instead you have this broad array of things that you could think about, and you’re not sure when any one of them will become observed, [then] you want all of your beliefs to be good, but you don’t have any next bit… You can’t express the goodness just in terms of this sequential goodness.”
I think this argument conflates “data which is temporally linear” with “an algorithm which is applied sequentially”. “Prediction” isn’t a distinction between past data and future data; it’s a distinction between information the algorithm has seen so far and information it hasn’t yet processed.
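A toy sketch of this point, under assumptions that are not in the original: a predictor that holds a joint distribution over a “broad array of things” with no natural order can still be run as a sequential predictor in whatever order it chooses to process them. By the chain rule, the total code length (the sum of −log₂ of each conditional prediction) is the same for every processing order and equals −log₂ of the joint probability. The three variables and their weights below are hypothetical, and the “predictor” is just exact conditioning on that made-up distribution.

```python
import itertools
import math

# Hypothetical joint distribution over three binary variables with no
# natural temporal order (e.g. three facts a predictor holds beliefs about).
VARS = ("a", "b", "c")
WEIGHTS = {
    (0, 0, 0): 4, (0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 1): 3,
    (1, 0, 0): 1, (1, 0, 1): 2, (1, 1, 0): 3, (1, 1, 1): 4,
}
TOTAL = sum(WEIGHTS.values())
JOINT = {outcome: w / TOTAL for outcome, w in WEIGHTS.items()}


def marginal(assignment):
    """Probability that the variables in `assignment` (name -> value) take those values."""
    return sum(
        p for outcome, p in JOINT.items()
        if all(outcome[VARS.index(name)] == value for name, value in assignment.items())
    )


def code_length(observation, order):
    """Bits needed to encode `observation` by predicting its variables one at a time in `order`."""
    bits = 0.0
    seen = {}
    for name in order:
        # Conditional probability of the next variable given what has been processed so far.
        before = marginal(seen)
        seen[name] = observation[name]
        bits += -math.log2(marginal(seen) / before)
    return bits


observation = {"a": 1, "b": 0, "c": 1}
joint_bits = -math.log2(JOINT[(1, 0, 1)])
for order in itertools.permutations(VARS):
    print(order, round(code_length(observation, order), 6))
print("joint", round(joint_bits, 6))
```

Each of the six processing orders makes different intermediate predictions, but all of them sum to the same total, −log₂ 0.1 ≈ 3.32 bits. The “sequence” is something the algorithm imposes when it processes the data, not a property the data needs to have.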