ToL: Foundations
(These are the touched up notes from a class I took with CMU’s Kevin Kelly this past semester on the Topology of Learning. Only partially optimized for legibility)
Possible World Semantics
Let W be the set of all possible worlds. The nature of your inquiry is going to shape what sorts of worlds are in W .
Now we consider some true or false proposition A concerning W. A could be “There are more than 15 people in North America”. The key idea in possible world semantics is that every proposition A is represented by the set of all worlds where A is true.
Def: Proposition A={w∈W:A is true in w}
We’ll still refer to propositions by their English sentence description, but when we start doing math with them you should think of them as sets of worlds. Here are some consequences of defining logical propositions as such:
A∨B=A∪B
A∧B=A∩B
¬A=A^c
A→B=A^c∪B (which equals ⊤ exactly when A⊆B, so entailment is containment)
⊤=W
⊥=∅
Convince yourself that the above are true. If you’re wondering, “Hmmm, the logic of set containment and classical logic seem eerily similar” you’re right, and might want to ponder if that has anything to do with how we decided what the rules of classical logic should be in the first place.
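If it helps to see the correspondence executed, here is a minimal Python sketch (the eight worlds and the particular propositions are invented purely for illustration) that represents propositions as sets of worlds and computes the connectives as set operations:

```python
# Toy model: propositions as sets of worlds.
# The worlds and propositions below are invented purely for illustration.

W = frozenset(range(8))          # eight possible worlds, labeled 0..7

# A proposition is the set of worlds in which it is true.
A = frozenset({0, 1, 2, 3})      # "A" holds in worlds 0-3
B = frozenset({2, 3, 4, 5})      # "B" holds in worlds 2-5

A_or_B  = A | B                  # A∨B = A∪B
A_and_B = A & B                  # A∧B = A∩B
not_A   = W - A                  # ¬A  = complement of A relative to W
top     = W                      # ⊤ = W  (true in every world)
bottom  = frozenset()            # ⊥ = ∅  (true in no world)

# Entailment is containment: A entails B iff every A-world is a B-world.
print(A <= B)                    # False: world 0 satisfies A but not B
print(A_and_B <= B)              # True: A∧B entails B
```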
Information Basis
Now that we have our world, the next thing we want is our information basis I. I is made up of information states, and each information state is a proposition. This means that they follow all the same union and intersection rules that other propositions do, and that an info state is a set of possible worlds. However, your info basis can’t just be any old set of propositions. We are trying to capture the set of propositions that you could know about the world. Upfront, I want to acknowledge that this “could” might cause some confusion. How do we know what you could or couldn’t know? Isn’t that what we’re trying to figure out? For now, we will resolve that with the following distinction. When we’re talking about possible worlds and information states, they are not defined by some requirement that a particular person could access them. Later, when we talk about methods, we’ll be talking about, “What conclusions could someone with XYZ method reach?”
The information basis, just like the possible worlds, will be shaped by how we construct the setup to our inquiry. Normally, what the information basis looks like will be a direct result of what “measurement” tools are being used to do inquiry. Separate from what the basis looks like in any given construction, below are some basic axioms that we always have our info basis abide by. (Here I(w) denotes the set of info states that are true in world w, i.e. I(w)={A∈I:w∈A}.)
∀w∈W:I(w)≠∅
∀w∈W:∀A,B∈I(w):∃C∈I(w):C⊆A∩B
I is countable
The first axiom just states, “No matter what world you’re in, there’s something you can know, even if it’s just the tautology”. This is mostly a bookkeeping axiom and doesn’t have profound philosophical consequences.
The third axiom has some intuitive appeal (it’s hard to imagine finite beings interacting with uncountable entities) but again is mostly a bookkeeping axiom to make some proofs slicker down the road.
The second axiom is the interesting one. In plain English it says “For any two info states you could witness, there is another info state that includes the information of both.” This can be thought of as “additivity of evidence”. If there are two propositions A and B, two possible things you could know, then it is “possible” to know both of them. There won’t be a weird branching of your experiments where if you see A you’ll never be able to see B.
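As a sketch of what the axioms demand, here is a small Python checker over a finite toy world set (the finiteness, the helper names, and the example basis are my simplifications, not anything from the course; the countability axiom is vacuous for a finite basis):

```python
from itertools import combinations

def info_states_at(w, basis):
    """I(w): the info states in the basis that are true in world w."""
    return [A for A in basis if w in A]

def satisfies_axioms(W, basis):
    """Check the info-basis axioms on a finite toy example."""
    for w in W:
        Iw = info_states_at(w, basis)
        # Axiom 1: something can be known in every world.
        if not Iw:
            return False
        # Axiom 2: for any two info states at w, some info state at w
        # carries the information of both (it sits inside their intersection).
        for A, B in combinations(Iw, 2):
            if not any(C <= (A & B) for C in Iw):
                return False
    # Axiom 3 (countability) is automatic for a finite basis.
    return True

# Tiny illustration (worlds and basis invented for this example):
W = frozenset({0, 1, 2, 3})
basis = [frozenset(W),                       # the tautology
         frozenset({0, 1}), frozenset({2, 3}),
         frozenset({0}), frozenset({2})]
print(satisfies_axioms(W, basis))            # True for this basis
```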
Okay, that’s the bulk of the setup to the relevant syntax, and it probably wasn’t very insightful or meaningful to you. Let’s hop into some examples and see what it looks like to model problems in this framework.
Example: Hume’s Black Box
I’m going to model a simple inductive problem. Let’s say every day you wake up and check whether or not aliens have made contact with Earth. Every day you put a “0” up on the wall if they haven’t, and a “1” if they have.
In this setup, a possible world is any given infinite sequence of 1s and 0s. Something like:
w_mine=00000000000000000000000000...
This makes the set of all possible worlds the set of all possible infinite binary strings.
W=2^ω
(notation explanation: X^Y is common notation for “all functions from Y into X”. So 2^ω is “all functions from the naturals to 2”, where in many set theory constructions 2 is defined to be the set {0,1}. A function from the naturals to the set {0,1} defines an infinite binary string)
Onto our info states. Since we are observing this infinite sequence day by day, we can only ever have seen a finite amount of it. So we probably want an info state to be something like e∈2^(<ω), where 2^(<ω) is “all finite binary strings”. But remember, an info state has to be a proposition, and a proposition is a set of possible worlds. No world is represented by a finite binary string. So we do the following:
[e]={w∈2^ω:e is an initial segment of w}
I={[e]:e∈2^(<ω)}
And boom, we’ve got our info basis.
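Here is a rough Python sketch of this construction (representing worlds as functions from day numbers to {0,1} and the helper names are my choices, not notation from the course):

```python
# Hume's black box, sketched in code.
# A world is a function from day-number to {0, 1}; an info state [e] is
# determined by a finite binary string e, the observations made so far.

def w_all_zeros(day: int) -> int:
    """The world where aliens never make contact (all 0s)."""
    return 0

def w_contact_on_day_5(day: int) -> int:
    """A world where contact happens exactly on day 5."""
    return 1 if day == 5 else 0

def in_info_state(world, e) -> bool:
    """Is the world in [e]?  True iff e is an initial segment of the world."""
    return all(world(day) == bit for day, bit in enumerate(e))

e = [0, 0, 0, 0, 0, 0]                        # six days of silence so far
print(in_info_state(w_all_zeros, e))          # True
print(in_info_state(w_contact_on_day_5, e))   # False: day 5 contradicts e
```

Note that no [e] picks out the all-zeros world uniquely: every finite run of 0s is also consistent with worlds where contact comes later, which is exactly the induction problem the picture below illustrates.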
Here’s what this looks like as a picture:
The circles represent information states. (The picture is drawn in terms of Hume’s classic version of the problem, where each day you check whether bread nourishes.) There is an information state that confirms “Bread fails to nourish at time t = 3”, and there are information states like “either bread nourishes or it fails to nourish at t > n”, but there is no information state that uniquely picks out the world where bread nourishes, which is why this is an induction scenario.
Example: Function Learning
There exists some function on the real numbers, and we are trying to figure out what sort of function it is. We get to investigate the function by getting arbitrarily small rectangle measurements of it.
W={f:R→R}
[(x1,x2),(y1,y2)]={f∈R^R:∃x s.t. x1≤x≤x2∧y1≤f(x)≤y2}
I={[(x1,x2),(y1,y2)]:x1<x2∧y1<y2}
The motivation for the rectangular measurement is to account for measurement error. Imagine there is some natural law, and investigating the function is us setting one variable and seeing how another variable changes. There’s always some small uncertainty: we never actually check the function at a point and get info like “f(15.4) = −37”. You put in an approximate input and get an approximate answer. You can refine the approximation as much as you want and get the error smaller and smaller, but there is never zero error.
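To make the rectangle info states concrete, here is a hedged Python sketch (the existential over all real x is approximated by sampling a finite grid, so this is illustrative rather than exact; the function and boxes are my examples):

```python
import math

def in_rectangle_state(f, x1, x2, y1, y2, samples=10_000):
    """Approximate membership of f in [(x1,x2),(y1,y2)]: does the graph of f
    pass through the rectangle?  The true condition is an existential over all
    real x in [x1, x2]; here we only sample a finite grid, so a False answer
    is not conclusive."""
    for i in range(samples + 1):
        x = x1 + (x2 - x1) * i / samples
        if y1 <= f(x) <= y2:
            return True
    return False

# Does sin(x) pass through the box [3.0, 3.3] x [-0.1, 0.1]?
print(in_rectangle_state(math.sin, 3.0, 3.3, -0.1, 0.1))   # True: sin crosses 0 near pi
# Does it pass through [0.0, 0.5] x [0.9, 1.0]?
print(in_rectangle_state(math.sin, 0.0, 0.5, 0.9, 1.0))    # False on this grid
```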