Non-Shannon-type Inequalities
The first new qualitative thing in Information Theory when you move from two variables to three variables is the presence of negative values: information measures (entropy, conditional entropy, mutual information) are always nonnegative for two variables, but there can be negative triple mutual information I(X;Y;Z).
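To see this concretely, here is a minimal sketch in plain Python of the standard example: X and Y are independent fair bits and Z = X XOR Y, so I(X;Y;Z) = I(X;Y) − I(X;Y∣Z) = 0 − 1 = −1 bit. (The distribution and helper function are just for illustration.)

```python
from itertools import product
from math import log2

# Classic XOR example: X, Y independent fair bits, Z = X XOR Y.
# Every pair is independent, yet I(X;Y;Z) = I(X;Y) - I(X;Y|Z) = 0 - 1 = -1 bit.

# Joint distribution p(x, y, z): four equally likely outcomes.
joint = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}

def H(indices):
    """Joint entropy (in bits) of the coordinates listed in `indices`."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in indices)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

# Triple mutual information via inclusion-exclusion over joint entropies:
# I(X;Y;Z) = H(X)+H(Y)+H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z)
I_xyz = (H([0]) + H([1]) + H([2])
         - H([0, 1]) - H([0, 2]) - H([1, 2])
         + H([0, 1, 2]))
print(I_xyz)  # -1.0
```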
This so far is a relatively well-known fact. But what is the first new qualitative thing when moving from three to four variables? Non-Shannon-type Inequalities.
A fundamental result in Information Theory is that I(X;Y∣Z)≥0 always holds.
Given n random variables X_1,…,X_n and α,β,γ ⊆ [n], from now on we write I(α;β∣γ), where a set of indices stands for the corresponding joint random variable; for example, I({1,2};{3}∣{4}) means I(X_1,X_2;X_3∣X_4).
Since I(α;β|γ)≥0 always holds, a nonnegative linear combination of a bunch of these is always a valid inequality, which we call a Shannon-type Inequality.
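For instance (a standard pair of illustrations, just to make "nonnegative linear combination" concrete): subadditivity H(X,Y) ≤ H(X) + H(Y) is the single term I(X;Y∣∅) ≥ 0, and the three-variable inequality H(X,Y) + H(Y,Z) + H(X,Z) ≥ 2H(X,Y,Z) is exactly the two-term combination I(X;Y∣Z) + I(X,Y;Z) ≥ 0.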
Then the question is whether Shannon-type Inequalities capture all valid information inequalities in n variables. It turns out: yes for n=2, (approximately) yes for n=3, and no for n≥4.
Behold, the glorious Zhang-Yeung inequality, a Non-Shannon-type Inequality for n=4:
I(A;B) ≤ 2I(A;B∣C) + I(A;C∣B) + I(B;C∣A) + I(A;B∣D) + I(C;D)
Explanation of the math, for anyone curious.
Given n random variables and α,β,γ ⊆ [n], it turns out that the full family of inequalities I(α;β∣γ) ≥ 0 is equivalent to the following three conditions on joint entropy: H(α∪β) + H(α∩β) ≤ H(α) + H(β) (submodularity), H(α) ≤ H(β) whenever α ⊆ β (monotonicity), and H(∅) = 0.
This lets us write the inequality involving conditional mutual information in terms of joint entropy instead.
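To spell out the translation (the choice of sets below is mine, but the identity is standard): expanding conditional mutual information in joint entropies gives

I(α;β∣γ) = H(α∪γ) + H(β∪γ) − H(α∪β∪γ) − H(γ),

so submodularity is exactly I(α∖β ; β∖α ∣ α∩β) ≥ 0, and, for α ⊆ β, monotonicity H(α) ≤ H(β) is exactly I(β∖α ; β∖α ∣ α) ≥ 0.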
Let Γ*_n then be the subset of R^{2^n} whose elements are the tuples of joint entropies, one coordinate per subset of [n], induced by some random variables X_1,…,X_n. For example, an element of Γ*_2 would be (H(∅), H(X_1), H(X_2), H(X_1,X_2)) ∈ R^{2^2} for some random variables X_1 and X_2, with a different element being a different tuple induced by a different pair of random variables (X'_1, X'_2).
Now let Γ_n be the set of elements of R^{2^n} satisfying the three aforementioned conditions on joint entropy. For example, an element of Γ_2 would be (h_∅, h_1, h_2, h_{12}) ∈ R^{2^2} satisfying, e.g., h_1 ≤ h_{12} (monotonicity). Γ_n is a polyhedral convex cone cut out by the Shannon-type inequalities, and by duality a linear inequality holds on all of Γ_n exactly when it is a nonnegative linear combination of them.
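To make this concrete, here is a minimal sketch in Python (the helper names are mine, not from any library): it computes the 2^n-tuple of joint entropies induced by a joint distribution, i.e. a point of Γ*_n, and then checks the three Γ_n conditions on it.

```python
from itertools import product, chain, combinations
from math import log2

def powerset(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    idx = range(n)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(idx, r) for r in range(n + 1))]

def entropic_vector(joint, n):
    """Map a joint pmf {outcome tuple: prob} to its 2^n-tuple of joint entropies,
    i.e. the point of Gamma*_n it induces (indexed by subsets of {0,...,n-1})."""
    h = {}
    for alpha in powerset(n):
        marginal = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in sorted(alpha))
            marginal[key] = marginal.get(key, 0.0) + p
        h[alpha] = -sum(p * log2(p) for p in marginal.values() if p > 0)
    return h

def in_shannon_cone(h, n, tol=1e-9):
    """Check the three Gamma_n conditions: H(empty)=0, monotonicity, submodularity."""
    subsets = powerset(n)
    if abs(h[frozenset()]) > tol:
        return False
    for a in subsets:
        for b in subsets:
            if a <= b and h[a] > h[b] + tol:             # monotonicity
                return False
            if h[a | b] + h[a & b] > h[a] + h[b] + tol:  # submodularity
                return False
    return True

# Example: the XOR triple from before gives a point of Gamma*_3,
# and that point also lies in Gamma_3.
joint = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}
h = entropic_vector(joint, 3)
print(sorted((tuple(sorted(a)), round(v, 3)) for a, v in h.items()))
print(in_shannon_cone(h, 3))  # True
```

Since every entropic vector satisfies the Shannon-type inequalities, the check always succeeds on vectors built this way (Γ*_n ⊆ Γ_n); the question below is about the converse inclusion.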
Then, the claim that "nonnegative linear combinations of Shannon-type inequalities span all inequalities on the possible Shannon measures" would correspond to the claim that Γ_n = Γ*_n for all n.
The content of the papers linked above is to show that:
Γ_2 = Γ*_2
Γ_3 ≠ Γ*_3, but Γ_3 = cl(Γ*_3), the closure[1] of Γ*_3
Γ_4 ≠ Γ*_4 and Γ_4 ≠ cl(Γ*_4), and likewise for every n ≥ 4
This implies that, while there exists a 2^3-tuple satisfying all Shannon-type inequalities that can't be constructed or realized by any random variables X_1, X_2, X_3, there does exist a sequence of random variables (X_1^{(k)}, X_2^{(k)}, X_3^{(k)})_{k=1}^∞ whose induced 2^3-tuples of joint entropies converge to that tuple in the limit.
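As a numerical sanity check on the Zhang-Yeung inequality above, here is a small sketch (again plain Python; the sampling scheme and helper names are mine) that evaluates both sides on randomly drawn four-variable joint distributions:

```python
import random
from itertools import product
from math import log2

def random_joint(n_vars=4, alphabet=2):
    """A random joint pmf over {0,...,alphabet-1}^n_vars."""
    outcomes = list(product(range(alphabet), repeat=n_vars))
    weights = [random.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}

def H(joint, indices):
    """Joint entropy (bits) of the coordinates in `indices`."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in indices)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def I(joint, a, b, c=()):
    """Conditional mutual information I(a;b|c), with index tuples a, b, c."""
    ac = tuple(sorted(set(a) | set(c)))
    bc = tuple(sorted(set(b) | set(c)))
    abc = tuple(sorted(set(a) | set(b) | set(c)))
    c = tuple(sorted(set(c)))
    return H(joint, ac) + H(joint, bc) - H(joint, abc) - H(joint, c)

# A, B, C, D are coordinates 0, 1, 2, 3.
A, B, C, D = (0,), (1,), (2,), (3,)
for _ in range(1000):
    p = random_joint()
    lhs = I(p, A, B)
    rhs = (2 * I(p, A, B, C) + I(p, A, C, B) + I(p, B, C, A)
           + I(p, A, B, D) + I(p, C, D))
    assert lhs <= rhs + 1e-9, "Zhang-Yeung violated (it shouldn't be!)"
print("Zhang-Yeung inequality held on all sampled distributions.")
```

Passing on random samples of course proves nothing by itself; the content of the theorem is that the inequality provably holds for every distribution while not being a nonnegative combination of Shannon-type inequalities.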
@Fernando Rosas