Edit: This is now obsolete with our NAH distillation.
Making the Telephone Theorem and Its Proof Precise
This short form distills the Telephone Theorem and its proof. It makes no attempt to be “intuitive”; the only goal is to be mathematically precise at every step.
Let M0, M1, … be jointly distributed finite random variables, meaning they are all functions
Mi: Ω → Mi
starting from the same finite sample space Ω with a given probability distribution P and into respective finite value spaces Mi. Additionally, assume that these random variables form a Markov chain M0→M1→⋯.
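To make the setup concrete, here is a minimal sketch in Python (the specific chain is my own illustrative toy example, not from the post): random variables are represented as functions on a finite Ω, and the Markov property of M0→M1→M2 is checked directly.

```python
# Toy instance of the setup: Omega = {0,1,2,3} with uniform P,
# M0(w) = w, M1(w) = w mod 2, M2(w) = w mod 2, giving M0 -> M1 -> M2.
omega = [0, 1, 2, 3]
P = {w: 0.25 for w in omega}
M0 = lambda w: w
M1 = lambda w: w % 2
M2 = lambda w: w % 2

def joint(P, *variables):
    """Joint distribution of random variables, i.e. of functions on Omega."""
    d = {}
    for w, p in P.items():
        key = tuple(X(w) for X in variables)
        d[key] = d.get(key, 0.0) + p
    return d

# Markov property: P(m0,m1,m2) * P(m1) == P(m0,m1) * P(m1,m2) everywhere.
p012 = joint(P, M0, M1, M2)
p01, p12, p1 = joint(P, M0, M1), joint(P, M1, M2), joint(P, M1)
is_markov = all(
    abs(p * p1[(m1,)] - p01[(m0, m1)] * p12[(m1, m2)]) < 1e-12
    for (m0, m1, m2), p in p012.items()
)
```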
Lemma: For a Markov chain M0→M1→M2, the following two statements are equivalent:
(a) I(M1;M0)=I(M2;M0)
(b) For all m1,m2 with P(m1,m2)>0: P(M0∣m1)=P(M0∣m2).
Proof:
Assume (a): Inspecting an information diagram of M0, M1, M2, we see that (a) also gives us the Markov chain M0→M2→M1. Markov chains can be reversed, so we get the two chains
M1→M2→M0, M2→M1→M0.
Factorizing along these two chains, we obtain:
P(M0∣M2)⋅P(M1,M2) = P(M0,M1,M2) = P(M0∣M1)⋅P(M1,M2)
and thus, after dividing by P(m1,m2), for all m1, m2 with P(m1,m2)>0: P(M0∣m1)=P(M0∣m2). That proves (b).
Assume (b): We have
P(M0,M1∣M2) = P(M0∣M1,M2)⋅P(M1∣M2) = P(M0∣M1)⋅P(M1∣M2) = P(M0∣M2)⋅P(M1∣M2),
where, in the second step, we used the Markov chain M0→M1→M2 and, in the third step, we used assumption (b). This conditional independence of M0 and M1 given M2 gives us the vanishing of conditional mutual information:
I(M0;M1∣M2) = 0.
Together with the Markov chain M0→M1→M2, this results, by inspecting an information diagram, in the equality I(M1;M0)=I(M2;M0). □
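The lemma can be sanity-checked numerically. The sketch below (toy chains of my own construction, not from the post) compares a chain where M2 retains everything M1 knows about M0 against one where an independent coin erases M1 half the time; condition (a) holds only in the first case, and condition (b) fails in the second since P(M0 ∣ M1=0) ≠ P(M0 ∣ M2='e').

```python
import math

# M0 uniform on {0,1,2,3}, M1 = M0 mod 2, and an independent coin C in w[1].
omega = [(m0, c) for m0 in range(4) for c in (0, 1)]
P = {w: 1 / 8 for w in omega}
M0 = lambda w: w[0]
M1 = lambda w: w[0] % 2
M2a = lambda w: w[0] % 2                         # chain A: M2 = M1
M2b = lambda w: w[0] % 2 if w[1] == 0 else 'e'   # chain B: erased half the time

def mutual_information(P, X, Y):
    """I(X;Y) in bits, for random variables given as functions on Omega."""
    pxy, px, py = {}, {}, {}
    for w, p in P.items():
        x, y = X(w), Y(w)
        pxy[(x, y)] = pxy.get((x, y), 0.0) + p
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

I_10 = mutual_information(P, M1, M0)    # 1 bit
I_20a = mutual_information(P, M2a, M0)  # 1 bit: (a) holds for chain A
I_20b = mutual_information(P, M2b, M0)  # 0.5 bit: (a) fails for chain B
```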
Theorem: Let n≥1. The following are equivalent:
(a) I(Mn;M0)=I(Mn+1;M0)
(b) There are functions fn, fn+1 defined on Mn, Mn+1, respectively, such that:
1. fn(Mn)=fn+1(Mn+1) with probability 1, i.e., the set of all ω∈Ω for which the equality does not hold has measure zero.
2. For all mn∈Mn, we have the equality P(M0∣Mn=mn)=P(M0∣fn(Mn)=fn(mn)), and the same for n+1.
Proof: The Markov chain M0→M1→⋯ immediately also gives us the Markov chain M0→Mn→Mn+1, so we may assume n=1 without loss of generality. Let us therefore consider the simple Markov chain M0→M1→M2.
Assume (a): By the lemma, this gives us for all m1,m2 with P(m1,m2)>0: P(M0∣m1)=P(M0∣m2).
Define the two functions fi: Mi→Δ(M0), i=1,2 by:
fi(mi) := P(M0∣mi).
Then we have f1(M1)=f2(M2) with probability 1[1], giving us the first condition we wanted to prove.
For the second condition, we use a trick from Probability as Minimal Map: set p:=f1(m1), which is a probability distribution. We get
P(M0∣f1(M1)=f1(m1)) = P(M0∣f1(M1)=p)
= P(M0, f1(M1)=p) / P(f1(M1)=p)
= [∑_{m1′: f1(m1′)=p} P(M0, m1′)] / [∑_{m1′: f1(m1′)=p} P(m1′)]
= [∑_{m1′: f1(m1′)=p} P(M0∣m1′)⋅P(m1′)] / [∑_{m1′: f1(m1′)=p} P(m1′)]
= [∑_{m1′: f1(m1′)=p} p⋅P(m1′)] / [∑_{m1′: f1(m1′)=p} P(m1′)]
= p = f1(m1) = P(M0∣M1=m1).
That proves (b).
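This chain of equalities can be checked numerically. The sketch below (my own toy example, not from the post) takes M0 uniform on {0,1,2,3} and M1 = (M0 mod 2, C) with an independent coin C, so that f1(m1) := P(M0 ∣ m1) genuinely coarse-grains M1, and verifies P(M0 ∣ f1(M1)=p) = p for every attained value p.

```python
# Sample space: pairs (m0, c) with an independent coin c, uniform P.
omega = [(m0, c) for m0 in range(4) for c in (0, 1)]
P = {w: 1 / 8 for w in omega}
M0 = lambda w: w[0]
M1 = lambda w: (w[0] % 2, w[1])

def conditional(P, X, event):
    """P(X | event), with event a predicate on Omega."""
    z = sum(p for w, p in P.items() if event(w))
    d = {}
    for w, p in P.items():
        if event(w):
            d[X(w)] = d.get(X(w), 0.0) + p
    return {x: round(v / z, 12) for x, v in d.items()}

def f1(m1):
    """f1(m1) := P(M0 | M1 = m1), as a hashable sorted tuple of pairs."""
    return tuple(sorted(conditional(P, M0, lambda w: M1(w) == m1).items()))

for m1 in {M1(w) for w in omega}:
    p = f1(m1)
    # P(M0 | f1(M1) = p) equals p itself, as in the derivation above.
    cond = conditional(P, M0, lambda w: f1(M1(w)) == p)
    assert tuple(sorted(cond.items())) == p
```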
Assume (b): For the other direction, let m1, m2 be given with P(m1,m2)>0. Pick ω∈Ω with (M1(ω),M2(ω))=(m1,m2) and P(ω)>0; such an ω exists since P(m1,m2)>0, and since P(ω)>0, the almost-sure equality f1(M1)=f2(M2) holds at ω. We obtain
f1(m1) = [f1(M1)](ω) = [f2(M2)](ω) = f2(m2)
and thus
P(M0∣m1) = P(M0∣f1(M1)=f1(m1)) = P(M0∣f2(M2)=f2(m2)) = P(M0∣m2).
The result follows from the Lemma.[2]
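The direction (b) ⇒ (a) can likewise be checked numerically. In this sketch (again a toy example of my own, not from the post), M1 and M2 each consist of the parity of M0 plus an independent coin, f1 = f2 = “extract the parity bit” satisfies both conditions of (b), and the two mutual informations come out equal.

```python
import math

# M0 uniform on {0,1,2,3}; M1 = (M0 mod 2, C1), M2 = (M0 mod 2, C2)
# with C1, C2 independent coins, so M0 -> M1 -> M2 is a Markov chain.
omega = [(m0, c1, c2) for m0 in range(4) for c1 in (0, 1) for c2 in (0, 1)]
P = {w: 1 / 16 for w in omega}
M0 = lambda w: w[0]
M1 = lambda w: (w[0] % 2, w[1])
M2 = lambda w: (w[0] % 2, w[2])
f = lambda m: m[0]  # candidate f1 = f2: keep only the parity bit

def mutual_information(P, X, Y):
    """I(X;Y) in bits, for random variables given as functions on Omega."""
    pxy, px, py = {}, {}, {}
    for w, p in P.items():
        x, y = X(w), Y(w)
        pxy[(x, y)] = pxy.get((x, y), 0.0) + p
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# First condition of (b): f(M1) = f(M2) with probability 1 (here: everywhere).
# The second condition holds because the coins are independent of M0.
assert all(f(M1(w)) == f(M2(w)) for w in omega)
I_10 = mutual_information(P, M1, M0)
I_20 = mutual_information(P, M2, M0)
# As (a) predicts, both equal I(f(M1); M0) = 1 bit.
```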
[1] Somehow, my brain didn’t find this obvious. Here is an explanation, using that f1(m1)=f2(m2) whenever P(m1,m2)>0:
P({ω ∣ f1(M1(ω)) ≠ f2(M2(ω))}) ≤ P({ω ∣ P(M1(ω),M2(ω)) = 0})
= P((M1,M2)⁻¹({(m1,m2) ∣ P(m1,m2)=0}))
= ∑_{m1,m2: P(m1,m2)=0} P((M1,M2)⁻¹(m1,m2))
= ∑_{m1,m2: P(m1,m2)=0} P(m1,m2)
= 0.
[2] There is some subtlety about whether the random variable f1(M1) can be replaced by f2(M2) in that equation. But given that they are “almost” the same random variables, I think this is valid inside the probability equation.