Infra-Bayesianism can be naturally understood as semantics for a certain non-classical logic. This promises an elegant synthesis between deductive/symbolic reasoning and inductive/intuitive reasoning, with several possible applications. Specifically, here we will explain how this can work for higher-order logic. There might be holes and/or redundancies in the precise definitions given here, but I’m quite confident the overall idea is sound.
We will work with homogenous ultracontributions (HUCs). □X will denote the space of HUCs over X. Given μ∈□X, S(μ)⊆ΔcX will denote the corresponding convex set. Given p∈ΔX and μ∈□X, p:μ will mean p∈S(μ). Given μ,ν∈□X, μ⪯ν will mean S(μ)⊆S(ν).
Syntax
Let Tι denote a set which we interpret as the types of individuals (we allow more than one). We then recursively define the full set of types T by:
0∈T (intended meaning: the uninhabited type)
1∈T (intended meaning: the one element type)
If α∈Tι then α∈T
If α,β∈T then α+β∈T (intended meaning: disjoint union)
If α,β∈T then α×β∈T (intended meaning: Cartesian product)
If α∈T then (α)∈T (intended meaning: predicates with argument of type α)
For each α,β∈T, there is a set F0α→β which we interpret as atomic terms of type α→β. We will denote V0α:=F01→α. Among those we distinguish the logical atomic terms:
prαβ∈F0α×β→α
iαβ∈F0α→α+β
Symbols we will not list explicitly, that correspond to the algebraic properties of + and × (commutativity, associativity, distributivity and the neutrality of 0 and 1). For example, given α,β∈T there is a “commutator” of type α×β→β×α.
∧α∈F0(α)×(α)→(α) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
∃αβ∈F0(α×β)→(β)
∀αβ∈F0(α×β)→(β) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
Assume that for each n∈N there is some Dn⊆□[n]: the set of “describable” ultracontributions [EDIT: it is probably sufficient to only have the fair coin distribution in D2 in order for it to be possible to approximate all ultracontributions on finite sets]. If μ∈Dn then ┌μ┐∈V(∑ni=11)
We recursively define the set of all terms Fα→β. We denote Vα:=F1→α.
If f∈F0α→β then f∈Fα→β
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1×f2∈Fα1×α2→β1×β2
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1+f2∈Fα1+α2→β1+β2
If f∈Fα→β then f−1:F(β)→(α)
If f∈Fα→β and g∈Fβ→γ then g∘f∈Fα→γ
Elements of V(α) are called formulae. Elements of V(1) are called sentences. A subset of V(1) is called a theory.
Semantics
Given T⊆V(1), a modelM of T is the following data. To each α∈T, there must correspond some compact Polish space M(t) s.t.:
M(0)=∅
M(1)=pt (the one point space)
M(α+β)=M(α)⊔M(β)
M(α×β)=M(α)×M(β)
M((α))=□M(α)
To each f∈Fα→β, there must correspond a continuous mapping M(f):M(α)→M(β), under the following constraints:
pr, i, diag and the “algebrators” have to correspond to the obvious mappings.
M(=α)=⊤diagM(α). Here, diagX⊆X×X is the diagonal and ⊤C∈□X is the sharp ultradistribution corresponding to the closed set C⊆X.
Consider α∈T and denote X:=M(α). Then, M(()α)=⊤□X⋉id□X. Here, we use the observation that the identity mapping id□X can be regarded as an infrakernel from □X to X.
M(⊥)=⊥pt
M(⊤)=⊤pt
S(M(∨)(μ,ν)) is the convex hull of S(μ)∪S(ν)
S(M(∧)(μ,ν)) is the intersection of S(μ)∪S(ν)
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, M(∃αβ)(μ)=pr∗μ.
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, p:M(∀αβ)(μ) iff for all q∈Δc(X×Y), if pr∗q=p then q:μ.
M(f1×f2)=M(f1)×M(f2)
M(f1+f2)=M(f1)⊔M(f2)
M(f−1)(μ)=M(f)∗(μ).
M(g∘f)=M(g)∘M(f)
M(┌μ┐)=μ
Finally, for each ϕ∈T, we require M(ϕ)=⊤pt.
Semantic Consequence
Given ϕ∈V(1), we say M⊨ϕ when M(ϕ)=⊤pt. We say T⊨ϕ when for any model M of T, M⊨ϕ. It is now interesting to ask what is the computational complexity of deciding T⊨ϕ. [EDIT: My current best guess is co-RE]
Applications
As usual, let A be a finite set of actions and O be a finite set of observation. Require that for each o∈O there is σo∈Tι which we interpret as the type of states producing observation o. Denote σ∗:=∑o∈Oσo (the type of all states). Moreover, require that our language has the nonlogical symbols s0∈V0(σ∗) (the initial state) and, for each a∈A, Ka∈F0σ∗→(σ∗) (the transition kernel). Then, every model defines a (pseudocausal) infra-POMDP. This way we can use symbolic expressions to define infra-Bayesian RL hypotheses. It is then tempting to study the control theoretic and learning theoretic properties of those hypotheses. Moreover, it is natural to introduce a prior which weights those hypotheses by length, analogical to the Solomonoff prior. This leads to some sort of bounded infra-Bayesian algorithmic information theory and bounded infra-Bayesian analogue of AIXI.
Let’s also explicitly describe 0th order and 1st order infra-Bayesian logic (although they are should be segments of higher-order).
0-th order
Syntax
Let A be the set of propositional variables. We define the language L:
Any a∈A is also in L
⊥∈L
⊤∈L
Given ϕ,ψ∈L, ϕ∧ψ∈L
Given ϕ,ψ∈L, ϕ∨ψ∈L
Notice there’s no negation or implication. We define the set of judgements J:=L×L. We write judgements as ϕ⊢ψ (”ψ in the context of ϕ”). A theory is a subset of J.
Semantics
Given T⊆J, a model of T consists of a compact Polish space X and a mapping M:L→□X. The latter is required to satisfy:
M(⊥)=⊥X
M(⊤)=⊤X
M(ϕ∧ψ)=M(ϕ)∧M(ψ). Here, we define ∧ of infradistributions as intersection of the corresponding sets
M(ϕ∨ψ)=M(ϕ)∨M(ψ). Here, we define ∨ of infradistributions as convex hull of the corresponding sets
For any ϕ⊢ψ∈T, M(ϕ)⪯M(ψ)
1-st order
Syntax
We define the language using the usual syntax of 1-st order logic, where the allowed operators are ∧, ∨ and the quantifiers ∀ and ∃. Variables are labeled by types from some set T. For simplicity, we assume no constants, but it is easy to introduce them. For any sequence of variables (v1…vn), we denote Lv the set of formulae whose free variables are a subset of v1…vn. We define the set of judgements J:=⋃vLv×Lv.
Semantics
Given T⊆J, a model of T consists of
For every t∈T, a compact Polish space M(t)
For every ϕ∈Lv where v1…vn have types t1…tn, an element Mv(ϕ) of □Xv, where Xv:=(∏ni=1M(ti))
It must satisfy the following:
Mv(⊥)=⊥Xv
Mv(⊤)=⊤Xv
Mv(ϕ∧ψ)=Mv(ϕ)∧Mv(ψ)
Mv(ϕ∨ψ)=Mv(ϕ)∨Mv(ψ)
Consider variables u1…un of types t1…tn and variables v1…vm of types s1…sm. Consider also some σ:{1…m}→{1…n} s.t. si=tσi. Given ϕ∈Lv, we can form the substitution ψ:=ϕ[vi=uσ(i)]∈Lu. We also have a mapping fσ:Xu→Xv given by fσ(x1…xn)=(xσ(1)…xσ(m)). We require Mu(ψ)=f∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require Mv∖vi(∃vi:ϕ)=pr∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require that p:Mv∖vi(∀vi:ϕ) if an only if, for all q∈ΔXv s.t pr∗q=p, q:pr∗(Mv(ϕ))
There is a special type of crisp infradistributions that I call “affine infradistributions”: those that, represented as sets, are closed not only under convex linear combinations but also under affine linear combinations. In other words, they are intersections between the space of distributions and some closed affine subspace of the space of signed measures. Conjecture: in 0-th order logic of affine infradistributions, consistency is polynomial-time decidable (whereas for classical logic it is ofc NP-hard).
To produce some evidence for the conjecture, let’s consider a slightly different problem. Specifically, introduce a new semantics in which □X is replaced by the set of linear subspaces of some finite dimensional vector space V. A model M is required to satisfy:
M(⊥)=0
M(⊤)=V
M(ϕ∧ψ)=M(ϕ)∩M(ψ)
M(ϕ∨ψ)=M(ϕ)+M(ψ)
For any ϕ⊢ψ∈T, M(ϕ)⊆M(ψ)
If you wish, this is “non-unitary quantum logic”. In this setting, I have a candidate polynomial-time algorithm for deciding consistency. First, we transform T into an equivalent theory s.t. all judgments are of the following forms:
a=⊥
a=⊤
a⊢b
Pairs of the form c=a∧b, d=a∨b.
Here, a,b,c,d∈A are propositional variables and “ϕ=ψ” is a shorthand for the pair of judgments ϕ⊢ψ and ψ⊢ϕ.
Second, we make sure that our T also satisfies the following “closure” properties:
If a⊢b and b⊢c are in T then so is a⊢c
If c=a∧b is in T then so are c⊢a and c⊢b
If c=a∨b is in T then so are a⊢c and b⊢c
If c=a∧b, d⊢a and d⊢b are in T then so is d⊢c
If c=a∨b, a⊢d and b⊢d are in T then so is c⊢d
Third, we assign to each a∈A a real-valued variable xa. Then we construct a linear program for these variables consisting of the following inequalities:
For any a∈A: 0≤xa≤1
For any a⊢b in T: xa≤xb
For any pair c=a∧b and d=a∨b in T: xc+xd=xa+xb
For any a=⊥: xa=0
For any a=⊤: xa=1
Conjecture: the theory is consistent if and only if the linear program has a solution. To see why it might be so, notice that for any model M we can construct a solution by setting
xa:=dimM(a)dimM(⊤)
I don’t have a full proof for the converse but here are some arguments. If a solution exists, then it can be chosen to be rational. We can then rescale it to get integers which are candidate dimensions of our subspaces. Consider the space of all ways to choose subspaces of these dimensions s.t. the constraints coming from judgments of the form a⊢b are satisfied. This is a moduli space of poset representations. It is easy to see it’s non-empty (just let the subspaces be spans of vectors taken from a fixed basis). By Proposition A.2 in Futorny and Iusenko it is an irreducible algebraic variety. Therefore, to show that we can also satisfy the remaining constraints, it is enough to check that (i) the remaining constraints are open (ii) each of the remaining constraints (considered separately) holds at some point of the variety. The first is highly likely and the second is at least plausible.
The algorithm also seems to have a natural extension to the original infra-Bayesian setting.
When using infra-Bayesian logic to define a simplicity prior, it is natural to use “axiom circuits” rather than plain formulae. That is, when we write the axioms defining our hypothesis, we are allowed to introduce “shorthand” symbols for repeating terms. This doesn’t affect the expressiveness, but it does affect the description length. Indeed, eliminating all the shorthand symbols can increase the length exponentially.
Instead of introducing all the “algebrator” logical symbols, we can define T as the quotient by the equivalence relation defined by the algebraic laws. We then need only two extra logical atomic terms:
For any n∈N and σ∈Sn (permutation), denote n:=∑ni=11 and require σ+∈Fn→n
For any n∈N and σ∈Sn, σ×α∈Fαn→αn
However, if we do this then it’s not clear whether deciding that an expression is a well-formed term can be done in polynomial time. Because, to check that the types match, we need to test the identity of algebraic expressions and opening all parentheses might result in something exponentially long.
Actually the Schwartz–Zippel algorithm can easily be adapted to this case (just imagine that types are variables over Q, and start from testing the identity of the types appearing inside parentheses), so we can validate expressions in randomized polynomial time (and, given standard conjectures, in deterministic polynomial time as well).
Infra-Bayesianism can be naturally understood as semantics for a certain non-classical logic. This promises an elegant synthesis between deductive/symbolic reasoning and inductive/intuitive reasoning, with several possible applications. Specifically, here we will explain how this can work for higher-order logic. There might be holes and/or redundancies in the precise definitions given here, but I’m quite confident the overall idea is sound.
We will work with homogenous ultracontributions (HUCs). □X will denote the space of HUCs over X. Given μ∈□X, S(μ)⊆ΔcX will denote the corresponding convex set. Given p∈ΔX and μ∈□X, p:μ will mean p∈S(μ). Given μ,ν∈□X, μ⪯ν will mean S(μ)⊆S(ν).
Syntax
Let Tι denote a set which we interpret as the types of individuals (we allow more than one). We then recursively define the full set of types T by:
0∈T (intended meaning: the uninhabited type)
1∈T (intended meaning: the one element type)
If α∈Tι then α∈T
If α,β∈T then α+β∈T (intended meaning: disjoint union)
If α,β∈T then α×β∈T (intended meaning: Cartesian product)
If α∈T then (α)∈T (intended meaning: predicates with argument of type α)
For each α,β∈T, there is a set F0α→β which we interpret as atomic terms of type α→β. We will denote V0α:=F01→α. Among those we distinguish the logical atomic terms:
prαβ∈F0α×β→α
iαβ∈F0α→α+β
Symbols we will not list explicitly, that correspond to the algebraic properties of + and × (commutativity, associativity, distributivity and the neutrality of 0 and 1). For example, given α,β∈T there is a “commutator” of type α×β→β×α.
=α∈V0(α×α)
diagα∈F0α→α×α
()α∈V0((α)×α) (intended meaning: predicate evaluation)
⊥∈V0(1)
⊤∈V0(1)
∨α∈F0(α)×(α)→(α)
∧α∈F0(α)×(α)→(α) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
∃αβ∈F0(α×β)→(β)
∀αβ∈F0(α×β)→(β) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
Assume that for each n∈N there is some Dn⊆□[n]: the set of “describable” ultracontributions [EDIT: it is probably sufficient to only have the fair coin distribution in D2 in order for it to be possible to approximate all ultracontributions on finite sets]. If μ∈Dn then ┌μ┐∈V(∑ni=11)
We recursively define the set of all terms Fα→β. We denote Vα:=F1→α.
If f∈F0α→β then f∈Fα→β
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1×f2∈Fα1×α2→β1×β2
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1+f2∈Fα1+α2→β1+β2
If f∈Fα→β then f−1:F(β)→(α)
If f∈Fα→β and g∈Fβ→γ then g∘f∈Fα→γ
Elements of V(α) are called formulae. Elements of V(1) are called sentences. A subset of V(1) is called a theory.
Semantics
Given T⊆V(1), a model M of T is the following data. To each α∈T, there must correspond some compact Polish space M(t) s.t.:
M(0)=∅
M(1)=pt (the one point space)
M(α+β)=M(α)⊔M(β)
M(α×β)=M(α)×M(β)
M((α))=□M(α)
To each f∈Fα→β, there must correspond a continuous mapping M(f):M(α)→M(β), under the following constraints:
pr, i, diag and the “algebrators” have to correspond to the obvious mappings.
M(=α)=⊤diagM(α). Here, diagX⊆X×X is the diagonal and ⊤C∈□X is the sharp ultradistribution corresponding to the closed set C⊆X.
Consider α∈T and denote X:=M(α). Then, M(()α)=⊤□X⋉id□X. Here, we use the observation that the identity mapping id□X can be regarded as an infrakernel from □X to X.
M(⊥)=⊥pt
M(⊤)=⊤pt
S(M(∨)(μ,ν)) is the convex hull of S(μ)∪S(ν)
S(M(∧)(μ,ν)) is the intersection of S(μ)∪S(ν)
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, M(∃αβ)(μ)=pr∗μ.
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, p:M(∀αβ)(μ) iff for all q∈Δc(X×Y), if pr∗q=p then q:μ.
M(f1×f2)=M(f1)×M(f2)
M(f1+f2)=M(f1)⊔M(f2)
M(f−1)(μ)=M(f)∗(μ).
M(g∘f)=M(g)∘M(f)
M(┌μ┐)=μ
Finally, for each ϕ∈T, we require M(ϕ)=⊤pt.
Semantic Consequence
Given ϕ∈V(1), we say M⊨ϕ when M(ϕ)=⊤pt. We say T⊨ϕ when for any model M of T, M⊨ϕ. It is now interesting to ask what is the computational complexity of deciding T⊨ϕ. [EDIT: My current best guess is co-RE]
Applications
As usual, let A be a finite set of actions and O be a finite set of observation. Require that for each o∈O there is σo∈Tι which we interpret as the type of states producing observation o. Denote σ∗:=∑o∈Oσo (the type of all states). Moreover, require that our language has the nonlogical symbols s0∈V0(σ∗) (the initial state) and, for each a∈A, Ka∈F0σ∗→(σ∗) (the transition kernel). Then, every model defines a (pseudocausal) infra-POMDP. This way we can use symbolic expressions to define infra-Bayesian RL hypotheses. It is then tempting to study the control theoretic and learning theoretic properties of those hypotheses. Moreover, it is natural to introduce a prior which weights those hypotheses by length, analogical to the Solomonoff prior. This leads to some sort of bounded infra-Bayesian algorithmic information theory and bounded infra-Bayesian analogue of AIXI.
Let’s also explicitly describe 0th order and 1st order infra-Bayesian logic (although they are should be segments of higher-order).
0-th order
Syntax
Let A be the set of propositional variables. We define the language L:
Any a∈A is also in L
⊥∈L
⊤∈L
Given ϕ,ψ∈L, ϕ∧ψ∈L
Given ϕ,ψ∈L, ϕ∨ψ∈L
Notice there’s no negation or implication. We define the set of judgements J:=L×L. We write judgements as ϕ⊢ψ (”ψ in the context of ϕ”). A theory is a subset of J.
Semantics
Given T⊆J, a model of T consists of a compact Polish space X and a mapping M:L→□X. The latter is required to satisfy:
M(⊥)=⊥X
M(⊤)=⊤X
M(ϕ∧ψ)=M(ϕ)∧M(ψ). Here, we define ∧ of infradistributions as intersection of the corresponding sets
M(ϕ∨ψ)=M(ϕ)∨M(ψ). Here, we define ∨ of infradistributions as convex hull of the corresponding sets
For any ϕ⊢ψ∈T, M(ϕ)⪯M(ψ)
1-st order
Syntax
We define the language using the usual syntax of 1-st order logic, where the allowed operators are ∧, ∨ and the quantifiers ∀ and ∃. Variables are labeled by types from some set T. For simplicity, we assume no constants, but it is easy to introduce them. For any sequence of variables (v1…vn), we denote Lv the set of formulae whose free variables are a subset of v1…vn. We define the set of judgements J:=⋃vLv×Lv.
Semantics
Given T⊆J, a model of T consists of
For every t∈T, a compact Polish space M(t)
For every ϕ∈Lv where v1…vn have types t1…tn, an element Mv(ϕ) of □Xv, where Xv:=(∏ni=1M(ti))
It must satisfy the following:
Mv(⊥)=⊥Xv
Mv(⊤)=⊤Xv
Mv(ϕ∧ψ)=Mv(ϕ)∧Mv(ψ)
Mv(ϕ∨ψ)=Mv(ϕ)∨Mv(ψ)
Consider variables u1…un of types t1…tn and variables v1…vm of types s1…sm. Consider also some σ:{1…m}→{1…n} s.t. si=tσi. Given ϕ∈Lv, we can form the substitution ψ:=ϕ[vi=uσ(i)]∈Lu. We also have a mapping fσ:Xu→Xv given by fσ(x1…xn)=(xσ(1)…xσ(m)). We require Mu(ψ)=f∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require Mv∖vi(∃vi:ϕ)=pr∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require that p:Mv∖vi(∀vi:ϕ) if an only if, for all q∈ΔXv s.t pr∗q=p, q:pr∗(Mv(ϕ))
For any ϕ⊢ψ∈T, Mv(ϕ)⪯Mv(ψ)
There is a special type of crisp infradistributions that I call “affine infradistributions”: those that, represented as sets, are closed not only under convex linear combinations but also under affine linear combinations. In other words, they are intersections between the space of distributions and some closed affine subspace of the space of signed measures. Conjecture: in 0-th order logic of affine infradistributions, consistency is polynomial-time decidable (whereas for classical logic it is ofc NP-hard).
To produce some evidence for the conjecture, let’s consider a slightly different problem. Specifically, introduce a new semantics in which □X is replaced by the set of linear subspaces of some finite dimensional vector space V. A model M is required to satisfy:
M(⊥)=0
M(⊤)=V
M(ϕ∧ψ)=M(ϕ)∩M(ψ)
M(ϕ∨ψ)=M(ϕ)+M(ψ)
For any ϕ⊢ψ∈T, M(ϕ)⊆M(ψ)
If you wish, this is “non-unitary quantum logic”. In this setting, I have a candidate polynomial-time algorithm for deciding consistency. First, we transform T into an equivalent theory s.t. all judgments are of the following forms:
a=⊥
a=⊤
a⊢b
Pairs of the form c=a∧b, d=a∨b.
Here, a,b,c,d∈A are propositional variables and “ϕ=ψ” is a shorthand for the pair of judgments ϕ⊢ψ and ψ⊢ϕ.
Second, we make sure that our T also satisfies the following “closure” properties:
If a⊢b and b⊢c are in T then so is a⊢c
If c=a∧b is in T then so are c⊢a and c⊢b
If c=a∨b is in T then so are a⊢c and b⊢c
If c=a∧b, d⊢a and d⊢b are in T then so is d⊢c
If c=a∨b, a⊢d and b⊢d are in T then so is c⊢d
Third, we assign to each a∈A a real-valued variable xa. Then we construct a linear program for these variables consisting of the following inequalities:
For any a∈A: 0≤xa≤1
For any a⊢b in T: xa≤xb
For any pair c=a∧b and d=a∨b in T: xc+xd=xa+xb
For any a=⊥: xa=0
For any a=⊤: xa=1
Conjecture: the theory is consistent if and only if the linear program has a solution. To see why it might be so, notice that for any model M we can construct a solution by setting
xa:=dimM(a)dimM(⊤)
I don’t have a full proof for the converse but here are some arguments. If a solution exists, then it can be chosen to be rational. We can then rescale it to get integers which are candidate dimensions of our subspaces. Consider the space of all ways to choose subspaces of these dimensions s.t. the constraints coming from judgments of the form a⊢b are satisfied. This is a moduli space of poset representations. It is easy to see it’s non-empty (just let the subspaces be spans of vectors taken from a fixed basis). By Proposition A.2 in Futorny and Iusenko it is an irreducible algebraic variety. Therefore, to show that we can also satisfy the remaining constraints, it is enough to check that (i) the remaining constraints are open (ii) each of the remaining constraints (considered separately) holds at some point of the variety. The first is highly likely and the second is at least plausible.
The algorithm also seems to have a natural extension to the original infra-Bayesian setting.
When using infra-Bayesian logic to define a simplicity prior, it is natural to use “axiom circuits” rather than plain formulae. That is, when we write the axioms defining our hypothesis, we are allowed to introduce “shorthand” symbols for repeating terms. This doesn’t affect the expressiveness, but it does affect the description length. Indeed, eliminating all the shorthand symbols can increase the length exponentially.
Instead of introducing all the “algebrator” logical symbols, we can define T as the quotient by the equivalence relation defined by the algebraic laws. We then need only two extra logical atomic terms:
For any n∈N and σ∈Sn (permutation), denote n:=∑ni=11 and require σ+∈Fn→n
For any n∈N and σ∈Sn, σ×α∈Fαn→αn
However, if we do this then it’s not clear whether deciding that an expression is a well-formed term can be done in polynomial time. Because, to check that the types match, we need to test the identity of algebraic expressions and opening all parentheses might result in something exponentially long.
Actually the Schwartz–Zippel algorithm can easily be adapted to this case (just imagine that types are variables over Q, and start from testing the identity of the types appearing inside parentheses), so we can validate expressions in randomized polynomial time (and, given standard conjectures, in deterministic polynomial time as well).