Ultimately, though, we are not wedded to our particular formulation. Perhaps there is some clever sampling-based verifier that “trivializes” our conjecture as well, in which case we would want to revise it.
I think your goal should be to show that your abstract conjecture implies the concrete result you’re after, or is even equivalent to it.
At ARC, we are interested in finding explanations of neural network behavior. Concretely, a trained neural net (such as GPT-4) exhibits a really surprising property: it gets low loss on the training set (far lower than a random neural net).
We can formalize this in a similar way as the reversible circuit conjecture. Here’s a rough sketch:
Transformer performance no-coincidence conjecture: Consider a computable process that randomly generates text, whose distribution has significantly lower entropy than the uniform distribution. Consider the property P(T) that says “the transformer T gets low average loss when predicting this process”. There is a deterministic polynomial-time verifier V(T, π) such that:
P(T) implies that there exists a polynomial-length π with V(T, π) = 1.
For 99% of transformers T, there is no π with V(T, π) = 1.
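In symbols, the two conditions amount to the following (our notation, not a precise statement; “random transformer” and the loss threshold are left informal, as in the sketch above):

```latex
% Sketch formalization of the two conditions (informal in the same places as the prose).
\exists\, V \in \mathrm{DTIME}(\mathrm{poly}) \text{ such that}
\begin{align*}
  \text{(completeness)}\quad & P(T) \implies \exists\, \pi,\ |\pi| \le \mathrm{poly}(|T|),\ V(T, \pi) = 1, \\
  \text{(soundness)}\quad & \Pr_{T \sim \mathcal{T}}\bigl[\,\exists\, \pi : V(T, \pi) = 1\,\bigr] \le 0.01,
\end{align*}
\text{where } \mathcal{T} \text{ denotes the (informal) distribution over random transformers.}
```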
Note that “ignore π, and then test T on a small number of inputs” doesn’t work. P only asks whether T has low average loss, so a small number of inputs can’t falsify P: a transformer can have low average loss while still doing badly on any particular small set of inputs.
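This failure mode can be made concrete with a toy example (everything below is hypothetical and not part of the conjecture): a predictor can satisfy P over the whole domain while being wrong on exactly the inputs a naive verifier happens to check, so that verifier would wrongly reject a T for which P(T) holds, violating the first condition.

```python
# Toy, hypothetical setup: a predictor whose average loss is low (so P holds)
# but which errs on precisely the inputs a naive fixed test set would check.

DOMAIN = range(1024)           # stand-in for the exponentially large input space
TEST_SET = {3, 17, 101, 500}   # the "small number of inputs" a naive verifier tests

def target(x):
    return x % 2  # the (toy) process being predicted

def mostly_good_predictor(x):
    # Wrong precisely on TEST_SET, correct everywhere else.
    return (1 - x % 2) if x in TEST_SET else x % 2

def avg_loss(pred, xs):
    # 0/1 loss averaged over the inputs xs.
    return sum(pred(x) != target(x) for x in xs) / len(xs)

print(avg_loss(mostly_good_predictor, DOMAIN))    # 0.00390625 -> P(T) holds
print(avg_loss(mostly_good_predictor, TEST_SET))  # 1.0 -> fails the naive check
```

The gap only widens as the domain grows: the average loss of this predictor shrinks toward 0 while its loss on the fixed test set stays at 1.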