Imagine a data stream
$$\ldots, X_{-3}, X_{-2}, X_{-1}, X_0, X_1, X_2, X_3, \ldots$$
assumed infinite in both directions for simplicity. Here $X_0$ represents the current state (the “present”), while $\ldots, X_{-3}, X_{-2}, X_{-1}$ and $X_1, X_2, X_3, \ldots$ represent the past and the future respectively.
Predictable Information versus Predictive Information
Predictable information is the maximal information (in bits) that you can derive about the future given access to the past. Predictive information is the number of bits that you need from the past to make that optimal prediction.
Suppose you are faced with the question of whether to buy, hold or sell Apple stock. There are three options, so at most $\log_2 3$ bits of information. Not all of that information might be contained in the past: there is a certain amount of irreducible uncertainty (entropy) about the future no matter how well you know the past. Think of freak events & black swans like pandemics, wars, unforeseen technological breakthroughs, or just cumulative aggregated noise in consumer preferences, etc. Suppose that irreducible uncertainty is half of $\log_2 3$, leaving us with $\frac{1}{2}\log_2 3$ bits of (theoretically) predictable information.
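For concreteness, here is the arithmetic in a few lines of Python (the 50/50 split between irreducible noise and predictable structure is, of course, just the toy assumption from above):

```python
import math

# Three options (buy / hold / sell): at most log2(3) bits to predict.
total_bits = math.log2(3)             # ~1.585 bits

# Toy assumption from the text: half of that is irreducible uncertainty
# (black swans, aggregated noise, ...), the rest is in principle predictable.
predictable_bits = 0.5 * total_bits   # ~0.792 bits

print(f"total:       {total_bits:.3f} bits")
print(f"predictable: {predictable_bits:.3f} bits")
```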
To a certain degree, it might be predictable in theory whether buying Apple stock is a good idea. To make that prediction, you may need to know many things about the past: Apple’s earnings records, the position of competitors, general trends of the economy, an understanding of the underlying technology & supply chains, etc. The total sum of this information is far larger than $\frac{1}{2}\log_2 3$ bits.
To actually do well on the stock market, you additionally need to do this better than the competition—a difficult task! The predictable information is quite small compared to the predictive information.
Note that the predictive information is always at least as large as the predictable information: you need at least $k$ bits from the past to predict $k$ bits of the future. Often it is much larger.
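In the notation of the next section, this is the standard inequality

$$E \;=\; I(X_{\leq 0};\, X_{>0}) \;\leq\; H[S] \;=\; C_\mu,$$

where $E$ is the excess entropy (predictable information), $S$ is the causal state of the epsilon machine, and $C_\mu$ is the statistical complexity (predictive information).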
Mathematical details
Predictable information is also called ‘apparent stored information’ or, more commonly, ‘excess entropy’.
It is defined as the mutual information $I(X_{\leq 0}; X_{>0})$ between the past (including the present) and the future.
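As a rough illustration (my own sketch, not from the text above), one can estimate this mutual information from a sampled sequence with plug-in block statistics. The example process, the Golden Mean process (binary, with no two 1s in a row), is a standard toy choice, and the function names are made up:

```python
import math
import random
from collections import Counter

# Plug-in estimate of I(past_L; future_L) for the Golden Mean process.
# Finite L and finite samples make this a biased, rough estimate only.

def sample_golden_mean(n, seed=0):
    rng = random.Random(seed)
    out, prev = [], 0
    for _ in range(n):
        x = 0 if prev == 1 else rng.randint(0, 1)  # after a 1, a 0 is forced
        out.append(x)
        prev = x
    return out

def plug_in_mi(seq, L):
    """I(X_{i-L..i-1}; X_{i..i+L-1}) estimated from block frequencies."""
    pasts, futures, joints = Counter(), Counter(), Counter()
    n = 0
    for i in range(L, len(seq) - L + 1):
        p, f = tuple(seq[i - L:i]), tuple(seq[i:i + L])
        pasts[p] += 1; futures[f] += 1; joints[p, f] += 1; n += 1
    return sum(c / n * math.log2((c / n) / ((pasts[p] / n) * (futures[f] / n)))
               for (p, f), c in joints.items())

seq = sample_golden_mean(200_000)
for L in (1, 2, 4, 8):
    print(f"L={L}: {plug_in_mi(seq, L):.3f} bits")  # -> about 0.25 bits
```

For this process the true excess entropy is about $0.252$ bits, and since the process is Markov of order 1, even $L=1$ essentially attains it; larger $L$ mainly adds upward finite-sample bias.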
The predictive information is more difficult to define. It is also called the ‘statistical complexity’ or ‘forecasting complexity’, and is defined as the entropy of the stationary (steady-state) distribution over the states of the ‘epsilon machine’ of the process.
What is the epsilon machine of the process $\{X_i\}_{i \in \mathbb{Z}}$? Define the causal states of the process as the partition of the set of possible pasts $\ldots, x_{-3}, x_{-2}, x_{-1}$, where two pasts $\overleftarrow{x}, \overleftarrow{x}'$ are in the same part / equivalence class when the distribution of the future conditioned on $\overleftarrow{x}$ and on $\overleftarrow{x}'$ respectively is the same.
That is, $P(X_{>0} \mid \overleftarrow{x}) = P(X_{>0} \mid \overleftarrow{x}')$. Without going into too much more detail, the forecasting complexity measures the size of this creature.
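To make this concrete, here is a minimal sketch for the same Golden Mean process used above (again my choice of example, not fixed by the text): its epsilon machine has just two causal states, and the statistical complexity is the entropy of their stationary distribution.

```python
import math

# Epsilon machine of the Golden Mean process (no two 1s in a row).
# Causal states:
#   A: last symbol was 0 -> emit 0 or 1 with probability 1/2 each
#   B: last symbol was 1 -> must emit 0
# Transitions: A --0--> A, A --1--> B, B --0--> A.

# Stationary distribution over causal states, solving pi = pi P
# for P = [[1/2, 1/2], [1, 0]]:
pi = {"A": 2 / 3, "B": 1 / 3}

# Statistical complexity C_mu = entropy of the stationary distribution.
C_mu = -sum(p * math.log2(p) for p in pi.values())
print(f"C_mu = {C_mu:.4f} bits")  # ~0.9183 bits
```

Compare this with the mutual-information estimate above: for the Golden Mean process $E \approx 0.252$ bits while $C_\mu \approx 0.918$ bits, so you must store noticeably more about the past than the past ultimately tells you about the future.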