Insights as Islands of Abductive Percolation?

I’ve been fascinated by this beautiful paper by Viteri & DeDeo.
What is a mathematical insight? We feel intuitively that proving a difficult theorem requires discovering one or more key insights. Before we get into what the Viteri-DeDeo paper has to say about (mathematical) insights, let me recall some basic observations on the nature of insights:
(see also my previous shortform)
There might be a unique decomposition, akin to prime factorization. Alternatively, there might be many roads to Rome: some theorems can be proved in many different ways.
There are often many ways to phrase an essentially similar insight. We feel these different ways of naming things are ‘inessential’: different labelings should be easily convertible into one another.
By looping over all possible programs, all proofs can eventually be found, so the notion of an ‘insight’ has to be fundamentally about feasibility (see the toy sketch after this list).
Previously, I suggested a required insight is something like a private key to a trapdoor function. Without the insight you are facing an infeasibly large task; with it, you can suddenly and easily solve a whole host of new tasks/problems.
Insights may be combined in (arbitrarily?) complex ways.
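As a minimal sketch of the brute-force point above (my own toy illustration; the checker is a hypothetical stand-in for a real proof verifier): candidate proofs can be enumerated in length order, which guarantees completeness but says nothing about feasibility, since the number of candidates grows exponentially with length.

```python
# Toy sketch: complete-but-infeasible proof search by enumeration.
# `is_valid_proof` is a hypothetical stand-in for a proof checker.
from itertools import count, product

ALPHABET = "01"  # candidate proofs encoded as bit-strings


def is_valid_proof(candidate: str) -> bool:
    # Hypothetical checker; a fixed target string stands in for "checks out".
    return candidate == "101101"


def brute_force_search(max_len: int = 20):
    for n in count(1):
        if n > max_len:
            return None
        # 2**n candidates at length n: completeness without feasibility.
        for cand in product(ALPHABET, repeat=n):
            s = "".join(cand)
            if is_valid_proof(s):
                return s


print(brute_force_search())  # finds "101101" after roughly a hundred checks
```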
When are two proofs essentially different?
Some theorems can be proved in many different ways, ‘different’ in the informal sense. It isn’t immediately clear how to make this more precise.
We could imagine there is a whole ‘homotopy’ theory of proofs, but before we do so we need to understand when two proofs are essentially the same or essentially different.
On one end of the spectrum, proofs can just be syntactically different but we feel they have ‘the same content’.
We can think type-theoretically, and say two proofs are the same when their denotations (normal forms) are the same. This is obviously better than just asking for syntactic equality or apartness. It does mean we’d like some sort of intuitionistic/type-theoretic foundation, since a naive classical foundation makes all normal forms equivalent (see the small Lean sketch below).
We can also look at what assumptions are made in the proof, e.g. one of the proofs might use the Axiom of Choice while the other does not. An example is the famous nonconstructive proof that an irrational number raised to an irrational power can be rational (take √2^√2: either it is rational, or raising it to the power √2 gives 2), which turns out to have a constructive proof as well.
If we consider proofs as functorial algorithms, we can use mono-anabelian transport to distinguish them in some cases. [LINK!]
We can also think homotopy type-theoretically and ask when two terms of a type are equal in the HoTT sense.
With the exception of the mono-anabelian transport suggestion, all of these don’t go deep enough; they’re too superficial.
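To make the normal-form criterion a little more concrete, here is a minimal Lean 4 sketch (my own toy illustration, not from the paper). Under propositions-as-types, proofs are terms, and two terms count as ‘the same’ when they share a normal form. The hypothetical definitions f and g below are both ‘the identity on Nat’, but only one of them is so by pure computation:

```lean
-- Two syntactically different definitions of "the identity on Nat".
def f : Nat → Nat := fun n => n + 0
def g : Nat → Nat := fun n => 0 + n

-- `n + 0` reduces to `n`, so `f` has the same normal form as the identity:
example : f = fun n => n := rfl

-- `0 + n` does not reduce (recursion is on the second argument), so the
-- analogous `rfl` proof fails; we need the propositional lemma `Nat.zero_add`:
example : g = fun n => n := funext fun n => Nat.zero_add n
```

Asking the same question of genuine proof terms is subtler still, since Lean’s Prop has definitional proof irrelevance, which is one more reason the purely type-theoretic answer feels too superficial here.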
Phase transitions and insights, Hopfield Networks & Ising Models
(See also my shortform on Hopfield Networks/Ising models as mixtures of causal models.)

Modern ML models famously show some sort of phase transitions in understanding. People have been especially fascinated by the phenomenon of ‘grokking’, see e.g. here and here. It suggests we think of insights in terms of phase transitions, critical points etc.
DeDeo & Viteri have an ingenious variation on this idea. They consider a collection of famous theorems and their proofs formalized in a proof assistant.
They then imagine these proofs as a giant directed graph and consider a Boltzmann distribution on it (so we are really dealing with an Ising model/Hopfield network here). We think of this distribution as a measure of ‘trust’, both trust in propositions (nodes) and in inferences (edges).
We show that the epistemic relationship between claims in a mathematical proof has a network structure that enables what we refer to as an epistemic phase transition (EPT): informally, while the truth of any particular path of argument connecting two points decays exponentially in force, the number of distinct paths increases. Depending on the network structure, the number of distinct paths may itself increase exponentially, leading to a balance point where influence can propagate at arbitrary distance (Stanley, 1971). Mathematical proofs have the structure necessary to make this possible. In the presence of bidirectional inference—i.e., both deductive and abductive reasoning—an EPT enables a proof to produce near-unity levels of certainty even in the presence of skepticism about the validity of any particular step. Deductive and abductive reasoning, as we show, must be well-balanced for this to happen. A relative over-confidence in one over the other can frustrate the effect, a phenomenon we refer to as the abductive paradox
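To get a rough feel for the kind of model involved, here is a toy sketch (mine, not the paper’s exact formulation): a tiny hypothetical proof graph with an Ising-style ‘trust’ spin on each claim, sampled from the Boltzmann distribution by Gibbs updates. The node names, edge weights, field strength and temperatures are all made up for illustration; the qualitative point is that at low temperature belief in the claims locks together, while at high temperature it decoheres.

```python
# Toy Ising-style "trust" model on a hypothetical mini proof graph (illustration
# only, not the paper's exact model). Each claim carries a spin: +1 = believed,
# -1 = doubted. Inferences couple the spins; we sample from the Boltzmann
# distribution P(s) ~ exp(-E(s)/T) with single-site Gibbs updates.
import math
import random

# Hypothetical proof DAG: axioms support lemmas, lemmas support the theorem.
edges = {
    ("axiom1", "lemma1"): 1.0,
    ("axiom2", "lemma1"): 1.0,
    ("axiom2", "lemma2"): 1.0,
    ("lemma1", "theorem"): 1.0,
    ("lemma2", "theorem"): 1.0,
}
nodes = sorted({v for pair in edges for v in pair})


def energy(state, field=0.3):
    """Ferromagnetic coupling along inferences plus a weak field towards belief."""
    e = -sum(w * state[a] * state[b] for (a, b), w in edges.items())
    e -= field * sum(state.values())
    return e


def gibbs(T=1.0, steps=20000, seed=0):
    """Return the fraction of samples in which each claim is believed (+1)."""
    rng = random.Random(seed)
    state = {n: rng.choice([-1, 1]) for n in nodes}
    believed = {n: 0 for n in nodes}
    for _ in range(steps):
        n = rng.choice(nodes)
        state[n] = +1
        e_plus = energy(state)
        state[n] = -1
        e_minus = energy(state)
        p_plus = 1.0 / (1.0 + math.exp((e_plus - e_minus) / T))
        state[n] = +1 if rng.random() < p_plus else -1
        for m in nodes:
            believed[m] += state[m] == +1
    return {n: believed[n] / steps for n in nodes}


# Low temperature: beliefs lock together (high trust); high temperature: noise.
for T in (0.5, 2.0, 5.0):
    print(T, gibbs(T=T))
```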
The proofs of these famous theorems break up into ‘abductive islands’. They have a natural modular structure: they decompose into lemmas.
EPTs are a double-edged sword, however, because disbelief can propagate just as easily as truth. A second prediction of the model is that this difficulty—the explosive spread of skepticism—can be ameliorated when the proof is made of modules: groups of claims that are significantly more tightly linked to each other than to the rest of the network.
(...) When modular structure is present, the certainty of any claim within a cluster is reasonably isolated from the failure of nodes outside that cluster.
One could hypothesize that insights might correspond somehow to these islands.
Final thoughts
I like the idea that a mathematical insight might be something like an island of deductively & abductively tightly clustered propositions.
Some questions:
How does this fit into the ‘Natural Abstraction’ picture, especially sufficient statistics?

How does this interact with Schmidhuber’s PowerPlay?
EDIT: The separation property of Ludics, see e.g. here, points towards the view that proofs can be distinguished exactly by suitable (counter)models.