The underlying model of a morphism
I’ve already talked about generalised models. The aim is not only to have a universal system for modelling any agent’s mental model (universality is pretty easy to get) but a system where it’s easy to recreate these mental models, and then to analyse the transitions between models.
This post will show that if there is a morphism $r$ between two models (say, between ideal gas laws and models of atoms bouncing around), then there is an underlying model for that morphism.
Specifically, if $r$ is a morphism between $M_0=(F_0,Q_0)$ and $M_1=(F_1,Q_1)$, then there is a generalised model $M_r$ defined from $r$. The features of this model are the combination of the features of the two models, $F_0\sqcup F_1$, and there are natural morphisms from this underlying model to $M_0$ and $M_1$, namely $r_0:M_r\to M_0$ and $r_1:M_r\to M_1$.
Now, if $W_0$ and $W_1$ are the sets of possible worlds for $M_0$ and $M_1$, then $W_0\times W_1$ is the set of possible worlds for $M_r$. Since $r$ is a relation between $W_0$ and $W_1$, it can be seen as a subset of $W_0\times W_1$, and $Q_r$ is a probability distribution over this subset $r$.
What this means is that $Q_r$ measures how probability ‘flows’ from worlds in $M_0$ to worlds in $M_1$. If $(w_0,w_1)$ is an element of $r$, then $Q_r(w_0,w_1)$ measures how much probability is flowing from $w_0$ to $w_1$. The actual probability of $w_0$ is the sum of all probability flowing out of it; that of $w_1$, the sum of the probability flowing into it.
See for example this diagram, where the $Q_0$ probabilities are indicated in blue, those of $Q_r$ in black, and those of $Q_1$ in red. The probabilities $Q_0$ and $Q_1$ are the sums of the relevant probabilities $Q_r$ on the “edges” connecting to those points:
The distribution $Q_r$ is non-unique, though. The following two examples show situations with the same $Q_0$ and $Q_1$, but different $Q_r$:
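To make the non-uniqueness concrete in numbers, here is a minimal Python sketch. The worlds `a`, `b`, `x`, `y`, the weights, and the helper `marginals` are illustrative assumptions, not taken from the diagrams. It builds two different flows on the same relation and checks that summing the flow out of each world of the first model, and into each world of the second, gives the same pair of distributions.

```python
from fractions import Fraction

def marginals(Qr):
    """Sum the flow out of each w0 and into each w1, giving (Q0, Q1)."""
    Q0, Q1 = {}, {}
    for (w0, w1), p in Qr.items():
        Q0[w0] = Q0.get(w0, 0) + p
        Q1[w1] = Q1.get(w1, 0) + p
    return Q0, Q1

half, quarter = Fraction(1, 2), Fraction(1, 4)

# Hypothetical example: W0 = {a, b}, W1 = {x, y}, and r relates every pair.
Qr_diagonal = {("a", "x"): half, ("b", "y"): half}
Qr_spread = {(w0, w1): quarter for w0 in "ab" for w1 in "xy"}

# Different flows on the edges, same marginals Q0 and Q1.
print(marginals(Qr_diagonal) == marginals(Qr_spread))
print(Qr_diagonal == Qr_spread)
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.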
The rest of this post is dedicated to proving the existence of the underlying model for the morphism $r$; it can be skipped if you aren’t interested.
Proof of underlying model
Definitions
Previous posts on generalised models defined them as triplets $M=(F,E,Q)$, with $F$ a set of features, $W=2^{\overline{F}}$ the set of possible worlds for those features, $E\subset W$ a subset of environments, and $Q$ a probability distribution on $E$.
But $E$ was mainly superfluous, as $Q$ can be extended to a probability distribution on all of $W$ just by setting it to be zero on $W-E$. Thus $E$ was dropped from the definition, leaving $M=(F,Q)$.
The original definition allowed $Q$ to be a partial probability distribution, but here we’ll assume it’s a total probability distribution (though not necessarily normalised; $Q(W)$ need not be $1$). The sets of features are assumed to be finite.
Then a morphism $r$ between generalised models $M_0=(F_0,Q_0)$ and $M_1=(F_1,Q_1)$ is a binary relation between $W_0$ and $W_1$, such that for all $E_0\subset W_0$ and $E_1\subset W_1$:
$Q_0(E_0)\leq Q_1(r(E_0))$,
$Q_1(E_1)\leq Q_0(r^{-1}(E_1))$.
We might extend the class of morphisms by defining relations that only obey the first inequality as “left-morphisms”, and relations that only obey the second one as “right-morphisms”. Left-morphisms ensure probability isn’t lost ($Q_1(W_1)\geq Q_0(W_0)$), right-morphisms ensure probability isn’t gained ($Q_0(W_0)\geq Q_1(W_1)$). Full morphisms, of course, ensure that probability is neither gained nor lost ($Q_0(W_0)=Q_1(W_1)$).
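For finite sets of worlds, these two inequalities can be checked by brute force over all subsets. The following is a minimal sketch, assuming worlds are hashable labels and each distribution is a dict from worlds to weights; the function names and the small example are illustrative assumptions rather than anything from the earlier posts.

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(worlds):
    """All subsets of a finite collection of worlds, as tuples."""
    ws = list(worlds)
    return chain.from_iterable(combinations(ws, k) for k in range(len(ws) + 1))

def measure(Q, E):
    """Q(E): total weight of the worlds in E."""
    return sum(Q.get(w, 0) for w in E)

def is_left_morphism(W0, r, Q0, Q1):
    """Check Q0(E0) <= Q1(r(E0)) for every subset E0 of W0."""
    for E0 in subsets(W0):
        image = {w1 for (w0, w1) in r if w0 in E0}
        if measure(Q0, E0) > measure(Q1, image):
            return False
    return True

def is_right_morphism(W1, r, Q0, Q1):
    """Check Q1(E1) <= Q0(r^{-1}(E1)) for every subset E1 of W1."""
    inverse = {(w1, w0) for (w0, w1) in r}
    return is_left_morphism(W1, inverse, Q1, Q0)

def is_morphism(W0, W1, r, Q0, Q1):
    return is_left_morphism(W0, r, Q0, Q1) and is_right_morphism(W1, r, Q0, Q1)

# Hypothetical example: a relates only to x, while b relates to both x and y.
W0, W1 = {"a", "b"}, {"x", "y"}
Q0 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
Q1 = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
r = {("a", "x"), ("b", "x"), ("b", "y")}

print(is_morphism(W0, W1, r, Q0, Q1))
# Dropping the pair (b, x) breaks both inequalities.
print(is_morphism(W0, W1, {("a", "x"), ("b", "y")}, Q0, Q1))
```

The check is exponential in the number of worlds, so it is only meant for toy examples like this one.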
Binary relations are not necessarily functions; functions are relations $r$ such that each $w_0$ in $W_0$ is related to exactly one $w_1$ in $W_1$.
Statement of the theorem
Let $r$ be a morphism between $M_0=(F_0,Q_0)$ and $M_1=(F_1,Q_1)$. Then there exists a generalised model $M_r=(F_0\sqcup F_1,Q_r)$, with natural function morphisms $r_0:M_r\to M_0$ and $r_1:M_r\to M_1$.
The $Q_r$ is non-zero only on a set contained in $r\subset W_0\times W_1=2^{\overline{F_0}}\times 2^{\overline{F_1}}=2^{\overline{F_0\sqcup F_1}}$. The $Q_r$ need not be uniquely defined, but the total measure of $Q_r$ is the same as that of $Q_0$ and $Q_1$:
$Q_r(r)=Q_r(W_0\times W_1)=Q_0(W_0)=Q_1(W_1)$.
Main proof
The function $r_0$ is just projection onto the first component: it sends $(w_0,w_1)$ to $w_0$. Likewise, the function $r_1$ sends $(w_0,w_1)$ to $w_1$.
Because $r_0$ and $r_1$ are functions, they can ‘push-forward’ any probability distribution $Q'_r$ on $W_0\times W_1$ to $W_0$ and $W_1$, respectively. This is given by $r_0(Q'_r)(w_0)=\sum_{w_1}Q'_r(w_0,w_1)$, and similarly for $r_1(Q'_r)$.
We aim to construct a $Q'_r$ such that $r_0(Q'_r)=Q_0$ and $r_1(Q'_r)=Q_1$; this will be our $Q_r$, and will make $r_0$ and $r_1$ into morphisms.
Define $Q'_r(w_0,w_1)$ to be zero if $(w_0,w_1)\notin r$, or $Q_0(w_0)=0$, or $Q_1(w_1)=0$. Thus we will ignore any elements of $W_0$ and $W_1$ of measure zero, and any element of $W_0\times W_1$ that is not in $r$.
Let $w_0\in W_0$ be such that it is not related to any element of $W_1$ by $r$. Then $Q_0(w_0)\leq Q_1(r(w_0))=Q_1(\emptyset)=0$. Thus any element of $W_0$ with non-zero measure is related to some $w_1$ via $r$.
Then define a choice function $c$ that maps every element $w_0$ with $Q_0(w_0)>0$ to an element $w_1$ that it is related to by $r$. Define $Q'_r(w_0,c(w_0))=Q_0(w_0)$, and let $Q'_r$ be zero on all other elements of $W_0\times W_1$.
Then, for any $w_0$ with $Q_0(w_0)>0$, $r_0(Q'_r)(w_0)=\sum_{w_1}Q'_r(w_0,w_1)=Q'_r(w_0,c(w_0))=Q_0(w_0)$ (and both sides are zero when $Q_0(w_0)=0$). Hence $r_0(Q'_r)=Q_0$. Consequently, $Q'_r(W_0\times W_1)=Q_0(W_0)$.
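The choice-function construction can be sketched in a few lines of Python. This assumes a dict-based encoding of the distributions (a hypothetical representation), with `min` standing in for an arbitrary choice function; the example numbers are made up. It checks that the resulting distribution pushes forward to the first model's distribution under the first projection, though generally not to the second model's under the second projection.

```python
from fractions import Fraction

def initial_coupling(r, Q0):
    """Put all of Q0(w0) on the single pair (w0, c(w0)), for each w0 with Q0(w0) > 0."""
    Qr = {}
    for w0, p in Q0.items():
        if p > 0:
            # c(w0): any related w1 works; min() is just a deterministic pick.
            # For a morphism, the set of related w1 is non-empty when Q0(w0) > 0.
            c_w0 = min(w1 for (v0, w1) in r if v0 == w0)
            Qr[(w0, c_w0)] = p
    return Qr

def pushforward(Qr, component):
    """Push Qr forward along the projection r0 (component 0) or r1 (component 1)."""
    out = {}
    for pair, p in Qr.items():
        out[pair[component]] = out.get(pair[component], 0) + p
    return out

# Hypothetical example: a relates only to x, while b relates to both x and y.
Q0 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
Q1 = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
r = {("a", "x"), ("b", "x"), ("b", "y")}

Qr = initial_coupling(r, Q0)
print(pushforward(Qr, 0) == Q0)  # the r0 push-forward matches Q0
print(pushforward(Qr, 1) == Q1)  # the r1 push-forward need not match Q1
```

In this example all of the weight lands on `x`, which is exactly the kind of mismatch the remainder of the proof has to repair.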
Define $\mathcal{Q}_0$ as the set of $Q'_r$, probability distributions on $r$ with $r_0(Q'_r)=Q_0$. We’ve shown that $\mathcal{Q}_0$ is non-empty; moreover, any $Q'_r\in\mathcal{Q}_0$ has a total measure equal to $q=Q_0(W_0)$. Since $Q'_r$ is defined on $r$, it is contained in the set $[0,q]^r$.
The set $[0,q]^r$ is compact, and $r_0(Q'_r)=Q_0$ is a closed condition, so $\mathcal{Q}_0$ is compact. The next section will prove that there is an element $Q'_r\in\mathcal{Q}_0$ with $r_1(Q'_r)=Q_1$; that will complete the proof.
Key lemmas
Define $L(Q'_r)=\|r_1(Q'_r)-Q_1\|_1=\sum_{w_1\in W_1}|r_1(Q'_r)(w_1)-Q_1(w_1)|$. Now $L(Q'_r)\geq 0$, and note that $L(Q'_r)=0$ is equivalent to $r_1(Q'_r)=Q_1$.
Thus if $L$ takes the value $0$ on $\mathcal{Q}_0$, we’ve found the desired $Q_r$. We will show that this happens thanks to the following key lemma:
Lemma 1: If there is a $Q'_r\in\mathcal{Q}_0$ with $L(Q'_r)>0$, then there exists a $Q''_r\in\mathcal{Q}_0$ with $L(Q''_r)<L(Q'_r)$.
Now, since $\mathcal{Q}_0$ is compact and $L$ is continuous, $L$ attains its minimum $\mu$ on $\mathcal{Q}_0$. Then lemma 1 shows that $\mu=0$ (otherwise it wouldn’t be a minimum).
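As a small illustration, the mismatch $L$ can be computed directly from its definition. The sketch below assumes a dict-based encoding of the distributions and uses made-up numbers; `L` here is a hypothetical helper mirroring the formula, not part of any library.

```python
from fractions import Fraction

def L(Qr, Q1):
    """L1 distance between the r1 push-forward of Qr and Q1."""
    r1 = {}
    for (_, w1), p in Qr.items():
        r1[w1] = r1.get(w1, 0) + p
    worlds = set(r1) | set(Q1)
    return sum(abs(r1.get(w, 0) - Q1.get(w, 0)) for w in worlds)

Q1 = {"x": Fraction(1, 2), "y": Fraction(1, 2)}

# All of b's weight sent to x: x receives too much, y too little.
lopsided = {("a", "x"): Fraction(1, 4), ("b", "x"): Fraction(3, 4)}
# Part of b's weight redirected to y: the push-forward now equals Q1.
balanced = {
    ("a", "x"): Fraction(1, 4),
    ("b", "x"): Fraction(1, 4),
    ("b", "y"): Fraction(1, 2),
}

print(L(lopsided, Q1))
print(L(balanced, Q1))
```

Both distributions have the same push-forward onto the first model, so they live in the same compact set; only the second one minimises the mismatch.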
Proof of Lemma 1:
Fix a $Q'_r\in\mathcal{Q}_0$ with $L(Q'_r)>0$. Now $r_1(Q'_r)(W_1)=\sum_{(w_0,w_1)}Q'_r(w_0,w_1)=r_0(Q'_r)(W_0)=Q_0(W_0)=Q_1(W_1)$. So, since $L(Q'_r)>0$, there must exist a $w_1$ with $r_1(Q'_r)(w_1)>Q_1(w_1)$.
By lemma 2 (see below), there exists a path $\rho_n=w^0_1w^1_0w^1_1w^2_0\ldots w^n_0w^n_1$, with each $w^i_0\in W_0$ and each $w^i_1\in W_1$, with the following properties:
$w^0_1=w_1$,
$(w^i_0,w^i_1)$ and $(w^{i+1}_0,w^i_1)$ are both elements of $r$,
the $Q'_r(w^{i+1}_0,w^i_1)$ are all greater than $0$,
$w^n_1$ is such that $r_1(Q'_r)(w^n_1)<Q_1(w^n_1)$.
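Then define $\epsilon>0$ to be the minimum of $\{r_1(Q'_r)(w_1)-Q_1(w_1),\ Q'_r(w^{i+1}_0,w^i_1),\ Q_1(w^n_1)-r_1(Q'_r)(w^n_1)\}$, with $i$ ranging over $0\leq i<n$.
We’ll then define $Q''_r$ by $Q''_r(w^{i+1}_0,w^i_1)=Q'_r(w^{i+1}_0,w^i_1)-\epsilon$ for $0\leq i<n$ (which is at least $0$ by the definition of $\epsilon$), $Q''_r(w^i_0,w^i_1)=Q'_r(w^i_0,w^i_1)+\epsilon$ for $1\leq i\leq n$, and $Q''_r=Q'_r$ otherwise. (We can assume the path never visits the same world twice, by cutting out any loops, so this is well defined.)
Then notice that each $w^i_0$ on the path loses $\epsilon$ on one pair and gains it on another, so $r_0(Q''_r)(w^i_0)=\sum_{(w^i_0,w_1)\in r}Q''_r(w^i_0,w_1)=r_0(Q'_r)(w^i_0)-\epsilon+\epsilon=r_0(Q'_r)(w^i_0)$, and hence $Q''_r\in\mathcal{Q}_0$. The same cancellation happens at every $w^i_1$ apart from $w_1=w^0_1$ and $w^n_1$. So $r_1(Q'_r)$ and $r_1(Q''_r)$ differ only on $w_1$ and $w^n_1$; specifically
$r_1(Q''_r)(w_1)=r_1(Q'_r)(w_1)-\epsilon$,
$r_1(Q''_r)(w^n_1)=r_1(Q'_r)(w^n_1)+\epsilon$.
Since $r_1(Q'_r)(w_1)\geq Q_1(w_1)+\epsilon$ and $r_1(Q'_r)(w^n_1)\leq Q_1(w^n_1)-\epsilon$, we have $L(Q''_r)=L(Q'_r)-2\epsilon$. This proves Lemma 1.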
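The step in this proof, moving a small amount of weight along an alternating path, can be sketched as follows. This is only an illustration under stated assumptions: distributions are dicts, the path is passed in as a flat list alternating between worlds of the second and first model (starting and ending in the second), and all names and numbers are hypothetical.

```python
from fractions import Fraction

def r1_pushforward(Qr):
    """Total flow into each w1."""
    out = {}
    for (_, w1), p in Qr.items():
        out[w1] = out.get(w1, 0) + p
    return out

def L(Qr, Q1):
    """L1 distance between the r1 push-forward of Qr and Q1."""
    r1 = r1_pushforward(Qr)
    return sum(abs(r1.get(w, 0) - Q1.get(w, 0)) for w in set(r1) | set(Q1))

def shift_along_path(Qr, Q1, path):
    """path = [w1^0, w0^1, w1^1, ..., w0^n, w1^n]; returns (new Qr, epsilon)."""
    n = (len(path) - 1) // 2
    r1 = r1_pushforward(Qr)
    # Pairs (w0^{i+1}, w1^i) lose epsilon; pairs (w0^{i+1}, w1^{i+1}) gain it.
    losing = [(path[2 * i + 1], path[2 * i]) for i in range(n)]
    gaining = [(path[2 * i + 1], path[2 * i + 2]) for i in range(n)]
    start, end = path[0], path[-1]
    eps = min(
        [r1.get(start, 0) - Q1.get(start, 0), Q1.get(end, 0) - r1.get(end, 0)]
        + [Qr[pair] for pair in losing]
    )
    new = dict(Qr)
    for pair in losing:
        new[pair] -= eps
    for pair in gaining:
        new[pair] = new.get(pair, 0) + eps
    return new, eps

# Hypothetical example: x receives too much weight, y too little.
Q1 = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
Qr = {("a", "x"): Fraction(1, 4), ("b", "x"): Fraction(3, 4)}

new_Qr, eps = shift_along_path(Qr, Q1, ["x", "b", "y"])
print(eps)
print(L(Qr, Q1) - L(new_Qr, Q1) == 2 * eps)
```

Each world of the first model on the path loses and gains the same amount, so its total outgoing weight is unchanged; only the two endpoints of the path see a net change.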
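Lemma 2: Given $Q'_r$ and $w_1$ as in the proof of lemma 1, there exists a path $\rho_n=w^0_1w^1_0w^1_1w^2_0\ldots w^n_0w^n_1$ with the following properties:
$w^0_1=w_1$,
$(w^i_0,w^i_1)$ and $(w^{i+1}_0,w^i_1)$ are both elements of $r$,
the $Q'_r(w^{i+1}_0,w^i_1)$ are all greater than $0$,
$w^n_1$ is such that $r_1(Q'_r)(w^n_1)<Q_1(w^n_1)$.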
Proof of Lemma 2:
Let $\mathcal{W}_1\subset W_1$ be the set of all elements of $W_1$ that can be reached by paths $\rho_n$ (ie are $w^n_1$) that obey the first three properties above. Let $\mathcal{W}_0\subset W_0$ be the set of all elements of $W_0$ that are $w^n_0$ for some path $\rho_n$ that obeys the first three properties above. Then clearly $\mathcal{W}_1=r(\mathcal{W}_0)$, by the second condition above (note that the third condition doesn’t affect $(w^n_0,w^n_1)$, which is only required to be in $r$).
Since $r$ is a morphism, $Q_0(\mathcal{W}_0)\leq Q_1(r(\mathcal{W}_0))=Q_1(\mathcal{W}_1)$.
Note that if $Q'_r(w_0,w'_1)>0$ with $w'_1\in\mathcal{W}_1$, then $w_0$ must be in $\mathcal{W}_0$; this is because we could add $w_0w'_1$ as $w^{n+1}_0w^{n+1}_1$ to any path $\rho_n$ that reaches $w'_1$, getting a slightly longer path that goes via $w_0$ and thus puts it in $\mathcal{W}_0$.
Consequently, $r_1(Q'_r)(\mathcal{W}_1)=\sum_{(w'_0,w'_1)\in r,\,w'_1\in\mathcal{W}_1}Q'_r(w'_0,w'_1)=\sum_{(w'_0,w'_1)\in r,\,w'_0\in\mathcal{W}_0}Q'_r(w'_0,w'_1)=Q_0(\mathcal{W}_0)$.
So $r_1(Q'_r)(\mathcal{W}_1)=Q_0(\mathcal{W}_0)\leq Q_1(\mathcal{W}_1)$. Since $\mathcal{W}_1$ includes $w_1$ with $r_1(Q'_r)(w_1)>Q_1(w_1)$, it must also include at least one $w''_1$ with $r_1(Q'_r)(w''_1)<Q_1(w''_1)$.
The path $\rho_n$ that reaches this $w''_1$ will then satisfy the fourth condition of the lemma, proving it.
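For finite worlds, a path of the kind the lemma describes can be searched for directly. The sketch below is one possible approach, a breadth-first search over the worlds of the second model; it is an illustrative assumption, not the method of the proof, which only needs existence. Distributions are dicts, the relation is a set of pairs, and the names and numbers are hypothetical.

```python
from collections import deque
from fractions import Fraction

def find_path(r, Qr, Q1, start):
    """Breadth-first search for a path [w1^0, w0^1, w1^1, ..., w0^n, w1^n]
    from start to some w1 whose incoming flow is below Q1(w1)."""
    r1 = {}
    for (_, w1), p in Qr.items():
        r1[w1] = r1.get(w1, 0) + p
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        w1 = queue.popleft()
        if r1.get(w1, 0) < Q1.get(w1, 0):
            return paths[w1]  # fourth property holds at this endpoint
        # Step back to any w0 sending positive flow into w1 (third property) ...
        for (w0, v1), p in Qr.items():
            if v1 != w1 or p <= 0:
                continue
            # ... then forward along any pair of r leaving w0 (second property).
            for (u0, u1) in sorted(r):
                if u0 == w0 and u1 not in paths:
                    paths[u1] = paths[w1] + [w0, u1]
                    queue.append(u1)
    return None  # no reachable world with a deficit

# Hypothetical example: x receives too much weight, y too little.
r = {("a", "x"), ("b", "x"), ("b", "y")}
Q1 = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
Qr = {("a", "x"): Fraction(1, 4), ("b", "x"): Fraction(3, 4)}

print(find_path(r, Qr, Q1, "x"))
```

Because the search records each world of the second model only once, the path it returns never revisits a world, which keeps any subsequent reweighting along the path unambiguous.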