Plausibly Factoring Conjectures

Introduction

In software design, several key principles share a common foundation: the concept of factoring, or breaking complex problems into smaller, manageable parts. Three prominent examples are:

  1. Unix philosophy: Design programs to do one thing well.

  2. Black box abstraction: Isolate functionality to focus on core problems.

  3. Interface-based programming: Build systems from well-defined, interconnected modules.

These principles, though known by different names, all rely on factoring to create efficient, reliable, and maintainable systems. Even when developers aren’t sure how to factor a problem, they often intuitively recognize that it should be factored: this notion of factoring has a fundamental role in effective software design.

In this post, we consider how this idea of factoring applies to mathematical theorem proving. Similar to the Unix philosophy, there is a sort of factoring that takes place in mathematics: breaking down difficult conjectures into simpler lemmas. But just as in software, collaboration is crucial for solving these problems, with mathematicians often building upon one another’s work, even without full awareness of all the details involved. This is reminiscent of Milton Friedman’s Lesson of the Pencil: the wood chopper on one side of the world has nothing in common with the graphite miner on the other, and neither of them knows how to make a pencil by themselves. Yet their incentives align such that pencils get manufactured as a global project.

Just as in software and in the manufacturing of a pencil, success relies on collaboration and breaking problems into manageable parts. To that end, we present a first attempt at an incentive structure to encourage collaborative theorem proving, inspired by a recent concept called Plausible Fiction.

Plausible Fiction

In a recent Topos Institute blog post, Spivak proposed a general theory of change, which he called Plausible Fiction. A work of plausible fiction has four requirements:

  • It begins with the world as it is now, including all relevant details and without distortion;

  • It ends with a future that the author desires;

  • It follows a trajectory that obeys natural laws and social dynamics; and

  • It is memetically fit for collaborators: they readily understand and can spread the idea.

The mechanism that makes plausible fiction work is gap-filling. To collaborate on a work of plausible fiction, one finds a gap in the story and fills it in with more plausible fiction. Eventually, the gaps get small enough that they can be fulfilled easily, e.g. “the bottle of water is carried out of the store and given to Alice” is only fiction until Bob carries the bottle of water out to Alice. Participants with different strengths can collaborate to make the changes they wish to see in the world.

The mathematics community is also one with a great diversity of strengths. Consider the Liquid Tensor Experiment, in which a global community of mathematicians, led by Peter Scholze, worked with the Lean proof assistant to verify many complex and interrelated proofs. Each contributor specialized in different areas, reminiscent of the modular components in software, bringing unique skills to tackle specific lemmas. The experiment’s success showed how modern mathematical proofs can be global projects, with blueprint software helping to organize disparate contributions into one verified whole.

Just as blueprint software organizes collaboration in formalizing conjectures, Plausible Fiction enables participants to envision and construct desired futures by progressively filling gaps. Here, we aim to merge these ideas, using Plausible Fiction as a framework for collaborative mathematical theorem proving, where larger conjectural gaps are filled by simpler ones and eventually proven.

Indeed, plausible fiction and mathematical conjecture have a lot in common. When a mathematician thinks that a statement is likely to be true, or at least plausible, they publicize it as their conjecture. This gives an incentive for other mathematicians to prove it, though sometimes their proofs are based on further conjectures. For example, Bill Thurston proved that Poincaré’s Conjecture about 3-dimensional spheres followed from his own Geometrization Conjecture, which Perelman later proved. And some of Perelman’s lemmas were considered difficult enough that other professional mathematicians had to complete the proofs.

One could say that a mathematician’s skill is in factoring conjectures into simpler ones: “You want to prove A, do you? Well, I can prove A for you, but only if someone can find a proof of B, C, and D”. Once the conjectures get simple enough, they’re just carried out by hand, as Bob did with the water bottle in the story above. In the Mechanism section below, we’ll give a simple data structure and process that could serve as a backend for a platform. We’re calling this the Plausibly Factored platform, and we believe it will encourage theorem proving. In the Outlook section we’ll explain the sense in which it is very much a work in progress.

Previous work

Past work has provided some market designs for incentivizing logical steps and mathematical proofs. Demski previously suggested a market in MathBucks. In this system, mathematicians are rewarded not only for proofs but for contributing to understanding with intermediary explanations. DaemonicSigil rightly points out, “Simply directly declaring a $100 million reward for a solution would probably not work. For one thing, there’s the issue of corollary-sniping where the prize wouldn’t give anyone an incentive to publish solutions to hard intermediate steps of the problem, since the prize as a whole only goes to the one who solves the entire problem as a whole”. DaemonicSigil proposes a mechanism in which traders can buy Truth for a dollar and get Falsum for free, but can trade any logically equivalent expressions with each other on the open market. Ought’s factored cognition research also touched on these ideas, especially the relay programming experiment.

DaemonicSigil’s design relies on a market maker selling Truth and Falsum symbols that parties can use to create logical “goods” to trade with each other. We are concerned that the overall transparency of collaboration between parties is lower in this setting, since they can profit off of each other’s follies, which weakens alignment.

Mechanism of the Plausibly Factored platform

What we propose here is only a first pass at a platform, and we expect it to be refined as time goes on. Indeed, this is how plausible fiction is meant to work. Our job is only to make the ideas more plausible and leave gaps for others to fill. Let’s begin by fixing a proof assistant, like Agda, Coq, Lean, etc., as well as an associated Knowledge Base of proven theorems.

The primary data type that is relevant here is the following record type:

Contract := { BuyerName : String, PleaseProve : Type, Bounty : ℝ>0 }

This is fairly self-explanatory: whoever wants a conjecture to be proven is called a buyer. They formulate a type in the proof assistant, and they offer a certain bounty for it to be proven. To submit their request, they pay the money to the platform, at which point it is in escrow, meaning they cannot rescind it. (Perhaps they can give some time-bound—“if this isn’t proven by 2025, then I rescind my offer”—but we won’t address this issue in the present post.)

The game is played by proposing factorizations of existing conjectures. A factorization works as follows.

Here, T might be a conjecture such as A & B ⇒ C & D. Another party, say Alice, might propose that T1 and T2 imply T, where T1 is A ⇒ C & E and T2 is E & B ⇒ D. Now, T1 and T2 are smaller gaps that could be filled by someone else. The proof that T1 and T2 imply T is Alice’s “alpha”, i.e. her private intellectual property. So contributions of this sort constitute a second kind of contribution we need to include in the market.
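
To make this concrete, here is a small Lean 4 sketch of Alice’s move (our illustration, not part of the original proposal); the theorem name and hypothesis names are ours, and the proof term is exactly the sort of thing she would keep private.

-- Alice’s private contribution: from the smaller conjectures T1 and T2
-- she assembles a proof of the original target T.
theorem alice_factorization (A B C D E : Prop)
    (t1 : A → C ∧ E)        -- conjecture T1
    (t2 : E ∧ B → D)        -- conjecture T2
    : A ∧ B → C ∧ D := by   -- the original target T
  intro hab
  exact ⟨(t1 hab.1).1, t2 ⟨(t1 hab.1).2, hab.2⟩⟩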

The market itself consists of a public market and private ledger:

MarketState := PublicMarket × PrivateLedger

Let’s start with the public market. It is just a time-varying list of contracts:

PublicMarket := List(Contract)

For example, in 1904 the state of the public market might have been as follows:

[...,
(
  Fermat,
  “∀(a,b,c : ℕ≥1). ∀(n : ℕ≥3). (aⁿ + bⁿ = cⁿ) ⇒ False”,
  $357
),
(
  Poincaré,
  “Every simply connected, closed 3-manifold is homeomorphic to a 3-dimensional sphere”,
  $99
),
...]

meaning that someone named Fermat had a conjecture about natural numbers on the books, for which the bounty was $357, and someone named Poincaré had a conjecture about 3-manifolds on the books, for which the bounty was $99. In fact, what we wrote for Poincaré’s contract is not quite valid: it should compile as a type according to a given proof assistant. Let’s refer to the compiling type as PoincareConj.
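
For concreteness, here is a Lean 4 sketch (ours) of what compiling types for the two contracts could look like. The name FermatConj is our own; stating the Poincaré Conjecture properly requires topology from a library such as mathlib, so PoincareConj appears below only as a named placeholder.

-- Fermat’s contract as a type that compiles in plain Lean 4.
def FermatConj : Prop :=
  ∀ a b c n : Nat, 1 ≤ a → 1 ≤ b → 1 ≤ c → 3 ≤ n → a ^ n + b ^ n ≠ c ^ n

-- Poincaré’s contract: we only assume the statement has been formalized
-- elsewhere and given a name, so it is left as an opaque placeholder here.
axiom PoincareConj : Prop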

Similarly, the private ledger is just a list of private entries

PrivateLedger := List(PrivateEntry)

where a private entry is the following data type:

PrivateEntry := {
  ContributorName : String,
  Premises : List(Type),
  Conclusion : Type,
  ⍺ : Premises -> Conclusion }

The market data structure maintains the private ledger, but does not publish it; we’ll discuss reasons for privacy below in the Analysis section. The proof term ⍺ is verified by the theorem prover as a term of the given type, Premises → Conclusion. When the Premises list is empty, we say that this entry is closed, because we have a closed term (in the sense of Type theory) of the Conclusion type.
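
As a concrete illustration, here is a minimal Python sketch (ours, not a reference implementation) collecting the record types defined so far. Statements are carried around as opaque strings that the fixed proof assistant is assumed to check elsewhere, and all class and field names below are our own renderings of the ones above.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Contract:
    buyer_name: str            # BuyerName
    please_prove: str          # PleaseProve, as proof-assistant source text
    bounty: float              # Bounty, escrowed and assumed positive

@dataclass(frozen=True)
class PrivateEntry:
    contributor_name: str      # ContributorName
    premises: tuple[str, ...]  # Premises; the empty tuple makes the entry closed
    conclusion: str            # Conclusion
    alpha: str                 # stands in for the verified proof term ⍺

@dataclass
class MarketState:
    public_market: list[Contract] = field(default_factory=list)
    private_ledger: list[PrivateEntry] = field(default_factory=list)
    knowledge_base: set[str] = field(default_factory=set)  # proven statements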

At any time, a user can either submit a contract or a private entry to the API. Upon submission, the market processes the entry and triggers the appropriate actions. The growing public Knowledge Base also functions like a hint database, as in Coq or Lean. As it grows, the power of trivial mechanistic automations grows with it. So for example, basic inference rules like modus ponens should be in the hint database at initialization, whereas more involved knowledge like Thurston’s GeomConjecture → PoincareConj, seen below, can only enter the lexicon after some market activity.

Each private entry acts as a trigger. Once all its premises are in the associated Knowledge Base, its Conclusion has been proven, and two events are triggered. First, any contract for which PleaseProve = Conclusion is cleared: its Bounty is paid to ContributorName and it is removed from the PublicMarket list. Second, ⍺ is itself added to the Knowledge Base and thereby becomes public. (If multiple contributors have private entries whose last premise is satisfied at the same time, and these entries simultaneously satisfy the contract, the bounty is split equally among all such contributors.)

A very special kind of contribution is one with an empty list of premises (a “thunk”, or a proof from unit). Its ⍺ is then a closed term, i.e. a proof of the theorem Conclusion. As above, any contract with PleaseProve = Conclusion is cleared, and Conclusion is added to the knowledge base. Through this clearing action, truth “bubbles up” through the market, triggering payouts.
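
Continuing the Python sketch above, here is one naive way (ours) that the triggering and bubbling-up could run. The tie rule from the previous paragraph is included, and payouts are simply printed.

def clear(state: MarketState) -> None:
    """Fire private entries until nothing new triggers, letting truth bubble up."""
    while True:
        # Entries whose premises are all proven but whose conclusion is not yet.
        firing = [e for e in state.private_ledger
                  if e.conclusion not in state.knowledge_base
                  and all(p in state.knowledge_base for p in e.premises)]
        if not firing:
            return
        # Group simultaneous provers of one conclusion so ties split the bounty.
        provers: dict[str, list[str]] = {}
        for e in firing:
            provers.setdefault(e.conclusion, []).append(e.contributor_name)
        for conclusion, names in provers.items():
            state.knowledge_base.add(conclusion)  # the proof term becomes public
            for contract in [c for c in state.public_market
                             if c.please_prove == conclusion]:
                share = contract.bounty / len(names)
                for name in names:
                    print(f"{name} is paid ${share:g} from {contract.buyer_name}'s contract")
                state.public_market.remove(contract)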

Worked Example

For example, Thurston could submit a private entry

(Thurston, [GeomConjecture], PoincareConj, 3DMKGHG)

where 3DMKGHG refers to his work in Three-Dimensional Manifolds, Kleinian Groups and Hyperbolic Geometry, or a Lean/Coq compiling version thereof. He could simultaneously offer a contract:

(Thurston, GeomConjecture, $90).

Then Perelman comes along and proves GeomConjecture (let’s assume that Perelman proves it completely, not needing any help from Huaidong Cao, Xiping Zhu, and others) in a private entry

(Perelman, [], GeomConjecture, EFRFGA)

where EFRFGA consists of some compiling theorems from The Entropy Formula for the Ricci Flow and its Geometric Applications and other papers. At this point, this entry is triggered and the Geometrization Conjecture is considered proven. It is added to the Knowledge Base, and Perelman is paid $90 from Thurston’s contract. Immediately following this, Thurston’s private entry is triggered, since its only premise is proven, and the Poincaré Conjecture is also considered proven. It is added to the Knowledge Base and Thurston is paid $99 from Poincaré’s contract.
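
Run through the Python sketch from the Mechanism section, the episode looks like this (again our illustration, with statements as bare strings):

state = MarketState()
state.public_market.append(Contract("Poincaré", "PoincareConj", 99.0))

# Thurston factors the Poincaré contract through his Geometrization Conjecture.
state.public_market.append(Contract("Thurston", "GeomConjecture", 90.0))
state.private_ledger.append(
    PrivateEntry("Thurston", ("GeomConjecture",), "PoincareConj", "3DMKGHG"))

# Perelman closes the remaining gap with an entry that has no premises.
state.private_ledger.append(
    PrivateEntry("Perelman", (), "GeomConjecture", "EFRFGA"))

clear(state)
# Perelman is paid $90 from Thurston's contract
# Thurston is paid $99 from Poincaré's contract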

Analysis

As should be clear, any party can be a counterparty to some contracts and principal to others; that is, they may receive money from some and pay money to others. This is just like in a category: an object can be the domain of some arrows and the codomain of others.

A question we promised to address is why we need privacy in the private entries, where proofs ⍺ : P₁ ∧ ⋯ ∧ Pₙ → Q are contributed, e.g. by Alice. The reason is that without it, someone else could find an equivalent P₁′ ≅ P₁, modify Alice’s proof to ⍺′ : P₁′ ∧ ⋯ ∧ Pₙ → Q, avoid paying a bounty on P₂, …, Pₙ, and, if P₁′ is proven before P₁, collect all the reward. You might say that the system should constantly check for such equivalences, but doing so for every pair of premises in the system would require a huge amount of overhead. Moreover, as David Jaz Myers points out, every two true statements are equivalent.
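
David Jaz Myers’ observation is a one-liner in Lean 4 (our illustration, with our naming): any two true propositions imply each other, so “equivalent premise” is far too coarse a notion for the system to police.

-- Any two true propositions are logically equivalent.
theorem true_props_iff {P Q : Prop} (hp : P) (hq : Q) : P ↔ Q :=
  ⟨fun _ => hq, fun _ => hp⟩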

One last issue that we should mention is that splitting a conjunction is expensive. For example, suppose (Alice, P&Q, $10) is a $10 bounty on P&Q. Bob, that clever rascal, contributes some private entry (Bob, [P, Q], P&Q, auto) and two public contracts (Bob, P, $3) and (Bob, Q, $4). In fact, Bob has taken a significant risk; his $7 is in escrow and he may never see the $10. Even with a time-bound on his bounty, he could be in trouble. Suppose that P is easily provable but conjecture Q turns out to be false. Tomorrow, Carl submits the private entry (Carl, [], P, prf), triggering the payout of Bob’s $3.
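
In the Python sketch above, Bob’s position plays out like this (our illustration):

state = MarketState()
state.public_market.append(Contract("Alice", "P&Q", 10.0))

# Bob splits Alice's conjunction, escrowing $3 + $4 = $7 in the hope of the $10.
state.public_market += [Contract("Bob", "P", 3.0), Contract("Bob", "Q", 4.0)]
state.private_ledger.append(PrivateEntry("Bob", ("P", "Q"), "P&Q", "auto"))

# Carl proves P, so Bob pays out $3; if Q is in fact false, Alice's $10
# never clears and Bob ends up at least $3 out of pocket.
state.private_ledger.append(PrivateEntry("Carl", (), "P", "prf"))
clear(state)
# Carl is paid $3 from Bob's contract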

In general, factoring a conjecture and offering a bounty on the factors is a risk. It remains to be seen if this is bearable by a market—i.e. whether our fictional platform is plausible—or not. If not, we hope others will fill this gap: this is how plausible fiction works.

Where do prices come from?

A crucial design decision is to offload the whole problem of estimating the plausibility of a lemma and its eventual usefulness to each user’s implicit pricing function. The first desideratum users ought to aim for when implementing this function is that the cost should be higher for fictions that are less plausible, and lower for more plausible or “easier” fictions: as a counterparty filling gaps, one should price the filling of a wider gap higher than the filling of a slimmer gap. The second desideratum is that one should feel free to price higher those lemmas that could plausibly pay back in multiple ways: a lemma that is useful for a multiplicity of theorems should have that multiplicity reflected in its price.
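
As a toy illustration (ours, not a prescribed formula), an implicit pricing function obeying both desiderata might look like the following, where base, plausibility, and expected_uses are all the user’s own subjective inputs.

def implicit_price(base: float, plausibility: float, expected_uses: int) -> float:
    """plausibility in (0, 1]: near 1 for slim gaps, small for wide ones;
    expected_uses: open contracts this lemma could plausibly help close."""
    return base * (1.0 / plausibility) * max(1, expected_uses)

# A fairly plausible lemma expected to feed three different theorems:
# implicit_price(10.0, 0.8, 3) == 37.5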

Takeaways

A crucial feature of this game is that it’s agnostic about whether the players are humans or agent architectures built on LLMs. In some contexts, compensating AI agents for their labor may be appropriate, and this system could set a useful precedent.

We think the kinds of proofs we might see coming out of the Plausibly Factored game may be more interpretable and elucidating than those emitted by a pure next-token predictor. Running this market could offer a new way to solve Lean benchmarks like miniF2F, and we could compare its outputs with how LLMs solve miniF2F today to check that hypothesis. Indeed, if a program specification is given as a type, then finding a term of that type (a program that satisfies the spec) is again the sort of endeavor that could be factored.

As DaemonicSigil pointed out, current mathematical incentives may do a bad job of eliciting intermediary work like important lemmas and cruxy (pivotal) conjectures. The Plausibly Factored protocol shares the reward for major theorems with the people who provided intermediary lemmas.

Outlook

Plausible fiction was originally intended to be a public and transparent process, where each collaborator fills gaps openly. In contrast, the current iteration of our Plausibly Factored platform is a bit of a bizarro version, introducing competition and private submissions.

How did these reversals happen? In the original plausible fiction framework, the incentive was an abstract, collective good—a “good future” that would hopefully naturally motivate individuals to participate. In the mathematical context, however, financial incentives may help stimulate participation in this formal and precise process, so that contributions can be transparently verified and rewarded through formal proof systems. That said, financial incentives are just one way to stimulate participation, intended to complement existing models such as academic grants and institutional support. The ethical implications of using financial incentives are still in question for us, and we invite comment along those lines.

One gap in our protocol is the lack of a “royalties” system to incentivize “test of time” lemmas that get reused over and over again. A contributor is paid only once, from whatever bounties their lemma clears at the time it is inserted. This is a downside of the current system, but many naive implementations of royalties would come with their own downsides to mitigate (for example, we’d have to make sure no one could collect royalties on modus ponens).

In conclusion, we want to stress that the Plausibly Factored platform is a work in progress. The system, particularly in its current form, is designed for smaller-scale projects or informal games between friends and colleagues, and is not meant to scale into a market-driven overhaul of mathematical research. To address concerns about wealth concentration, future iterations could include mechanisms like capped bounties or community-driven funding pools. These steps would help keep the system balanced and avoid concentrating power among a few wealthy participants. If readers have suggestions on how to improve the platform, particularly for cultivating more open collaboration, please let us know in the comments.