$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions

johnswentworth, 16 May 2023 21:31 UTC



Informal Problem Statement

We have an information channel between Alice and Bob. Alice picks a function. Bob gets to see the value of that function at some randomly chosen input values… but doesn’t know exactly which randomly chosen input values. He does get to see the randomly chosen values of some of the input variables, but not all of them.

The problem is to find which functions Alice should pick with what frequencies, in order to maximize the channel capacity.

Why Am I Interested In This?

I’m interested in characterizing functions which are “insensitive” to subsets of their input variables, especially in high-dimensional spaces. For instance, xor of a bunch of random bits is maximally sensitive: if we have a ⁵⁰⁄₅₀ distribution over any one of the bits but know all the others, then all information about the output is wiped out. On the other end of the spectrum, a majority function of a bunch of random bits is highly insensitive: if we have a ⁵⁰⁄₅₀ distribution over, say, 10% of the bits, but know all the others, then in most cases we can correctly guess the function’s output.

I have an argument here that the vast majority of functions $f : \{0, 1\}^{n} \to \{0, 1\}$ are pretty highly sensitive: as the number of unknown inputs increases, information falls off exponentially quickly. On the other hand, the example of majority functions shows that this is not the case for all functions.
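The contrast between xor, majority, and a typical function can be checked numerically for small $n$. The following is a minimal sketch, not part of the original argument: it computes, by exhaustive enumeration, how many bits of information the known inputs carry about $f(X)$ when $X$ is uniform and the last $k$ inputs are hidden. The helper names (`info_about_output`, `random_function`) and the seed are illustrative assumptions. Hiding the last $k$ bits is without loss of generality for xor and majority, which are symmetric, but is just one arbitrary choice for the random truth table.

```python
import random
from itertools import product
from math import log2

def binary_entropy(p):
    """Entropy in bits of a coin with bias p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def info_about_output(f, n, k):
    """I(f(X); known bits) in bits, with X uniform and the last k of n bits hidden."""
    # Marginal probability that f(X) = 1.
    p_one = sum(f(x) for x in product((0, 1), repeat=n)) / 2**n
    # Average entropy of f(X) given each setting of the known bits.
    conditional = 0.0
    for known in product((0, 1), repeat=n - k):
        q = sum(f(known + hidden) for hidden in product((0, 1), repeat=k)) / 2**k
        conditional += binary_entropy(q) / 2**(n - k)
    return binary_entropy(p_one) - conditional

def xor(x):
    return sum(x) % 2

def majority(x):
    return int(2 * sum(x) > len(x))

def random_function(n, seed=0):
    """A truth table drawn uniformly at random (the seed is an arbitrary choice)."""
    rng = random.Random(seed)
    table = {x: rng.randrange(2) for x in product((0, 1), repeat=n)}
    return table.__getitem__

if __name__ == "__main__":
    n = 9
    for name, f in [("xor", xor), ("majority", majority), ("random", random_function(n))]:
        print(name, [round(info_about_output(f, n, k), 3) for k in range(4)])
```

For xor the result is exactly zero as soon as a single bit is hidden, while 9-bit majority with one hidden bit still retains about 0.73 of its 1 bit of output information.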

Intuitively, in the problem, Alice needs to mostly pick from “insensitive” functions, since Bob mostly can’t distinguish between “sensitive” functions.

… And Why Am I Interested In That?

I expect that natural abstractions have to be insensitive features of the world. After all, different agents don’t all have exactly the same input data. So, a feature has to be fairly insensitive in order for different agents to agree on its value.

In fact, we could view the problem statement itself as a very rough way of formulating the coordination problem of language: Alice has to pick some function f which takes in an image and returns 0/1 representing whether the image contains an apple. (The choice of function defines what “apple” means, for our purposes.) Then Alice wants to teach baby Bob what “apple” means. So, there’s some random stuff around them, and Alice points at the random stuff and says “apple” for some of it, and says something besides “apple” the rest of the time. Baby Bob is effectively observing the value of the function at some randomly-chosen points, and needs to back out which function Alice intended. And Bob doesn’t have perfect access to all the bits Alice is seeing, so the function has to be robust.

Formal Problem Statement

Consider the following information channel between Alice and Bob:

Alice picks a function $f : \{0, 1\}^{n} \to \{0, 1\}$.
Nature generates $m$ possible inputs $x^{1}, \ldots, x^{m}$, each sampled uniformly and independently from $\{0, 1\}^{n}$.
Nature also generates $m$ subsets $S_{1}, \ldots, S_{m}$ of $\{1, \ldots, n\}$, each sampled uniformly and independently from the subsets of size $s$.
Bob observes $Y = (Y_{1}, \ldots, Y_{m})$ where $Y_{i} = (f(x^{i}), x^{i}_{S_{i}}, S_{i})$, i.e. the function value at $x^{i}$, the bits of $x^{i}$ indexed by $S_{i}$, and $S_{i}$ itself.
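The channel described in the steps above can be simulated directly. This is a minimal sketch under stated assumptions: the function name `sample_observations` and the example `majority` are illustrative, and variables are indexed from 0 rather than from 1 as in the text.

```python
import random

def sample_observations(f, n, m, s, rng=random):
    """Draw Bob's observation Y = (Y_1, ..., Y_m) for Alice's function f."""
    observations = []
    for _ in range(m):
        # x^i: uniform on {0,1}^n
        x = tuple(rng.randrange(2) for _ in range(n))
        # S_i: uniform among the size-s subsets of the n variable indices
        subset = tuple(sorted(rng.sample(range(n), s)))
        # x^i restricted to S_i: the only input bits Bob gets to see
        visible = tuple(x[j] for j in subset)
        observations.append((f(x), visible, subset))
    return observations

def majority(x):
    return int(2 * sum(x) > len(x))

if __name__ == "__main__":
    for y in sample_observations(majority, n=5, m=3, s=2, rng=random.Random(0)):
        print(y)
```

Each returned triple is one $Y_{i}$: the function value, the visible bits, and the indices those bits came from. Bob never sees the remaining $n - s$ bits of the input.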

The problem is to compute the distribution over $f$ which achieves the channel capacity, i.e.

$\operatorname{argmax}_{P[f]} \sum_{f, Y} P[f] \, P[Y \mid f] \ln \frac{P[Y \mid f]}{\sum_{f'} P[Y \mid f'] \, P[f']}$
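For very small $n$ the objective can be evaluated by brute force, which may help build intuition. The sketch below assumes $m = 1$ and enumerates all $2^{2^{n}}$ truth tables, so it is only feasible for $n \le 3$ or so. It builds $P[Y \mid f]$, evaluates the mutual information (in nats) for a given $P[f]$, and then improves $P[f]$ with Blahut-Arimoto iteration. Blahut-Arimoto is a standard algorithm for discrete channel capacity that is swapped in here for illustration; the problem statement does not propose it, and all function names are assumptions.

```python
from itertools import combinations, product
from math import exp, log

def channel_matrix(n, s):
    """P[Y | f] for m = 1. Rows index truth tables f, columns index outcomes Y."""
    inputs = list(product((0, 1), repeat=n))
    functions = list(product((0, 1), repeat=len(inputs)))  # all 2^(2^n) truth tables
    subsets = list(combinations(range(n), s))
    weight = 1 / (len(subsets) * 2**s)  # P[S] * P[x_S]
    rows = []
    for table in functions:
        f = dict(zip(inputs, table))
        row = []
        for subset in subsets:
            for visible in product((0, 1), repeat=s):
                # Inputs consistent with what Bob sees.
                matches = [x for x in inputs if tuple(x[j] for j in subset) == visible]
                p_one = sum(f[x] for x in matches) / len(matches)
                row.append(weight * p_one)        # outcome with f(x) = 1
                row.append(weight * (1 - p_one))  # outcome with f(x) = 0
        rows.append(row)
    return functions, rows

def output_marginal(prior, rows):
    return [sum(p * row[y] for p, row in zip(prior, rows)) for y in range(len(rows[0]))]

def mutual_information(prior, rows):
    """The objective above, in nats, for a given prior P[f]."""
    marginal = output_marginal(prior, rows)
    return sum(
        p * r * log(r / q)
        for p, row in zip(prior, rows) if p > 0
        for r, q in zip(row, marginal) if r > 0
    )

def blahut_arimoto(rows, iterations=200):
    """Iteratively reweight P[f] by exp(KL(P[Y|f] || P[Y])), starting from uniform."""
    prior = [1.0 / len(rows)] * len(rows)
    for _ in range(iterations):
        marginal = output_marginal(prior, rows)
        divergences = [
            sum(r * log(r / q) for r, q in zip(row, marginal) if r > 0) for row in rows
        ]
        weights = [p * exp(d) for p, d in zip(prior, divergences)]
        total = sum(weights)
        prior = [w / total for w in weights]
    return prior

if __name__ == "__main__":
    functions, rows = channel_matrix(n=2, s=1)
    prior = blahut_arimoto(rows)
    for table, p in sorted(zip(functions, prior), key=lambda pair: -pair[1]):
        print(table, round(p, 4))
```

As a sanity check, when Bob sees every bit ($s = n$, $m = 1$) the uniform prior yields exactly $\ln 2$ nats, since each observation then reveals one full bit of the truth table. A brute-force computation like this is not the human-interpretable characterization the problem asks for; it is only a way to inspect tiny cases.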

Bounty/Prize Info

The problem is to characterize the distribution $P[f]$ which maximizes channel throughput. The characterization should make clear the answers to questions like:

What functions have the highest probability?
How quickly does the probability fall off as we move “away” from the most probable functions, and what do marginally-less-probable functions look like?
How much probability is assigned to a typical function chosen uniformly at random?
Which functions, if any, are assigned zero probability?

All of these should have human-interpretable answers. No credit will be given for e.g. existence and uniqueness alone (the optimization is convex in the happy direction, so that’s pretty easy anyway), or a program which would compute $P [f]$ given superexponential compute.

I may give partial or even full awards for variations on the above problem, depending on my own judgement of how useful they are. For instance, any of the following seem reasonable and potentially valuable:

Different domain/range for f.
(Nontrivial) big-O size restrictions on S
Asymptotic results in general

Deadline is end of June. If there are multiple qualifying answers, then I will award prize money based on how useful I judge each answer to be.
