Here’s an argument that I think works in the limit where m → ∞. For very large m, each subset S ⊆ [n] of size s will occur many times as an S_i. Moreover, for each set of coordinate values x_S := (x_j)_{j∈S}, there will be many x_i such that S_i = S and (x_i)_S = x_S. Therefore, by the law of large numbers, for any subset S ⊆ [n] of size s and any set of coordinate values x_S, Bob can infer N(x_S, f) := #{x′ ∈ {0,1}^n : x′_S = x_S and f(x′) = 1}, i.e. the number of inputs x′ that agree with x_S at the coordinates indexed by S and satisfy f(x′) = 1. Additionally, the random variable Y contains no information about f beyond this set of statistics.
Given a function f : {0,1}^n → {0,1}, let F(f) denote the set of all functions f′ such that N(x_S, f) = N(x_S, f′) for every subset S ⊆ [n] of size s and every set of coordinate values x_S. The argument of the previous paragraph then implies that, upon observing Y, Bob can deduce the equivalence class F(f), but no other information about f.
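To make the statistics concrete, here is a small brute-force sketch in Python (the helper names stat_signature and same_class are mine, not part of the problem statement); two functions lie in the same class F(f) exactly when their signatures agree:

```python
from itertools import combinations, product

def stat_signature(f, n, s):
    """Compute N(x_S, f) for every size-s subset S of [n] and every assignment
    x_S to those coordinates, by brute force over all 2^n inputs.
    f is a callable taking an n-tuple of bits and returning 0 or 1."""
    sig = {}
    for S in combinations(range(n), s):            # all subsets S of size s
        for x_S in product((0, 1), repeat=s):      # all assignments to the coordinates in S
            count = 0
            for x in product((0, 1), repeat=n):    # all inputs x' in {0,1}^n
                if tuple(x[j] for j in S) == x_S and f(x) == 1:
                    count += 1
            sig[(S, x_S)] = count
    return sig

def same_class(f, g, n, s):
    """f and g are in the same equivalence class F(f) iff their signatures agree."""
    return stat_signature(f, n, s) == stat_signature(g, n, s)
```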
If we have a random variable X and we define another random variable Y as a deterministic function Y(X), then it is easy to show that the distributions of X which maximize the mutual information between Y and X are precisely those for which Y has a uniform distribution. Applying this to the above setup, we see that the distributions over f which achieve the channel capacity are those for which F(f) is uniformly distributed over all the values that it can take.
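To spell out the step: since Y is a deterministic function of X, we have H(Y ∣ X) = 0, so

I(X; Y) = H(Y) − H(Y ∣ X) = H(Y) ≤ log #{values Y can take},

with equality exactly when Y is uniform over its range, and any distribution of X that puts total mass 1/#{values} on each preimage of Y achieves this.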
If we suppose in addition that f has the maximum entropy distribution subject to this constraint, we find that:
P[f] ∝ 1 / #{f′ : N(x_S, f) = N(x_S, f′) for all S ⊆ [n] of size s and all x_S ∈ {0,1}^S} = 1/#F(f).
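To spell out the maximum-entropy step: the capacity-achieving condition fixes the marginal over classes to be uniform, say 1/K where K is the number of distinct classes, and maximising the entropy of f subject to that constraint makes f uniform within its class, so

P[f] = P[F(f)] · P[f ∣ F(f)] = (1/K) · (1/#F(f)) ∝ 1/#F(f),

which is the expression above.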
Intuitively, this expression says that the probability of a given function f is inversely proportional to the number of other functions f′ that have the same number of ones on every hyperplane defined by fixing some set of s coordinates x_S. This seems to correspond roughly to sensitivity: we intuitively expect there to be many such functions f′ when the number of ones f takes on most hyperplanes is approximately half of the total number of points on that hyperplane, and a function whose output is roughly half ones and half zeros on a given hyperplane is, roughly speaking, sensitive to the remaining unspecified coordinates.
It’s not obvious to me that the above expression for P[f] is feasible to compute in practice, but I think it is fairly naturally interpretable.
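For what it’s worth, the expression is at least directly computable by brute force for very small n. Here is a sketch (it reuses the hypothetical stat_signature helper above and enumerates all 2^(2^n) functions, so it is only practical for something like n ≤ 4):

```python
from collections import Counter
from itertools import product

def all_truth_tables(n):
    """Yield every Boolean function on {0,1}^n as a dict mapping input tuple -> output bit."""
    inputs = list(product((0, 1), repeat=n))
    for outputs in product((0, 1), repeat=len(inputs)):
        yield dict(zip(inputs, outputs))

def unnormalised_weights(n, s):
    """For every function f, a weight proportional to P[f] = const / #F(f),
    keyed by the (sorted) truth table of f."""
    sig_of = {}
    for f in all_truth_tables(n):
        # The class F(f) is determined by the signature of statistics N(x_S, f).
        sig = tuple(sorted(stat_signature(f.__getitem__, n, s).items()))
        sig_of[tuple(sorted(f.items()))] = sig
    class_size = Counter(sig_of.values())          # #F(f) for each class
    return {table: 1.0 / class_size[sig] for table, sig in sig_of.items()}

# Example: weights = unnormalised_weights(3, 1); the normalising constant is
# sum(weights.values()), which equals the number of distinct classes.
```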
The key reason the problem is tractable in the case m → ∞ is that the law of large numbers means the probabilities wash out and we get an essentially deterministic problem. In the case where m is finite, this won’t happen, and in particular I expect you’ll run into complications that arise from interaction terms, where the method for combining the information from two observations Y_i and Y_j with S_i ≠ S_j is not clean.
I expect you’re more likely to end up with a tractable solution if you rephrase the problem statement slightly so that the subsets of inputs over which outputs are aggregated in each observation (in the case above, these subsets are the hyperplanes defined by the coordinates x_S) are either disjoint or identical. (For example, taking all the S_i to be random but equal to each other would ensure this.) It strikes me that there might be a formulation like this that still captures the key features of the problem you’re interested in, but I’ve not thought about this in detail.
This answer clears the bar for at least some prize money to be paid out, though the amount will depend on how far other answers go by the deadline.
One thing which would make it stronger would be to provide a human-interpretable function for each equivalence class (so Alice can achieve the channel capacity by choosing among those functions).
The suggested variants of the problem are good ones, and good solutions to those variants would probably also qualify for prize money.