Mental Context for Model Theory
I’m reviewing the books on the MIRI course list. After my first four book reviews I took a week off, followed up on some dangling questions, and kept up with other side projects. Then I dove into Model Theory, by Chang and Keisler.
It has been three weeks. I have gained a decent foundation in model theory (by my own assessment), but I have not come close to completing the textbook. There are a number of other topics I want to touch upon before December, so I’m putting Model Theory aside for now. I’ll be revisiting it in either January or March to finish the job.
In the meantime, I do not have a complete book review for you. Instead, this is the first of three posts on my experience with model theory thus far.
This post will give you some framing and context for model theory. I had to clear a number of conceptual hurdles before model theory started making sense, and this post contains some pointers that I wish I’d had three weeks ago. These tips and realizations apply fairly generally to learning any logic or math; hopefully some of you will find them useful.
Shortly, I’ll post a summary of what I’ve learned so far. For the casual reader, this may help demystify some of the more advanced parts of the Highly Advanced Epistemology 101 for Beginners sequence (if you find them mysterious), and it may shed some light on some of the recent MIRI papers. On a personal note, there’s a lot I want to write down and solidify before moving on.
In a follow-up post, I’ll discuss my experience struggling to learn something difficult on my own: model theory has required significantly more cognitive effort than the previous textbooks did.
Between what was meant and what was said
Model theory is an abstract branch of mathematical logic, which itself is already too abstract for most. So allow me to motivate model theory a bit.
At its core, model theory is the study of what you said, as opposed to what you meant. To give some intuition for this, I’ll re-tell an overtold story about an ancient branch of math.
In olden times, Euclid built geometry upon five axioms:
You can draw a straight line segment between two points.
You can extend line segments into infinite straight lines.
You can draw a circle from a straight line segment, with the center at one end and radius the line segment.
All right angles are congruent.
If two lines are drawn which intersect a third in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines inevitably must intersect each other on that side if extended far enough.
One of these things is not like the other. The fifth axiom is the only one which requires some effort to understand. Intuitively, it states that parallel lines do not intersect. This statement irked Euclid for reasons apart from the ugliness of the axiom.
The fact that parallel lines do not intersect seems like it should follow from the definition of lines and angles. It doesn’t seem like something we should have to specify in addition. That we must assume parallel lines do not intersect (rather than proving it) was long seen as a wart on geometry.
This wart bothered mathematicians for two millennia, until it was finally discovered (in the nineteenth century) that the fifth axiom is independent of the other four. You can build consistent systems where parallel lines intersect. You can build consistent systems where they diverge.
This seemed crazy, at the time: parallel straight lines cannot diverge! Surely, a geometry in which they do is absurd!
The problem is that mathematicians were imagining “straight lines” in their heads that did not match the mathematical objects specified by the first four axioms of Euclid.
This mistake was invited by the names Euclid chose. “Straight lines” evoke a mental image more specific than anything the axioms describe. If you detach the provocative words from the axioms, writing
You can make a LUME between any two PTARS
You can extend a LUME into a SLUME
…
and so on, then it’s much easier to understand that the LUMEs which Euclid’s axioms describe may not match up with the image of a “straight line” in your head. It is much easier to understand that there may be interpretations of LUME which do not obey the fifth postulate.
In fact, if you take Euclid’s first four postulates, there are many possible interpretations in which “straight line” takes on a multitude of meanings. This ability to disconnect the intended interpretation from the available interpretations is the bedrock of model theory. Model theory is the study of all interpretations of a theory, not just the ones that the original author intended.
Of course, model theory isn’t really about finding surprising new interpretations — it’s much more general than that. It’s about exploring the breadth of interpretations that a given theory makes available. It’s about discerning properties that hold in all possible interpretations of a theory. It’s about discovering how well (or poorly) a given theory constrains its interpretations. It’s a toolset used to discuss interpretations in general.
At its core, model theory is the study of what a mathematical theory actually says, when you strip the intent from the symbols.
Iron walls
Before you can do model theory, you have to erect iron walls between four different concepts.
Logics
Languages
Theories
Models
Logics
A logic is a formal system for building and manipulating sentences. Traditionally, a logic defines a number of symbols (( ) ∧ ¬ ∀ ∃ ≡ ν ', for example) and rules for building sentences from those symbols.
Note that you cannot generate sentences from a logic alone. Rather, you use a logic to generate sentences from a language.
Also, remember that the rules of a logic are syntactic, such as “if φ is a sentence then (¬φ) is a sentence”.
Finally, remember that logics are just rules for generating sentences. A logic is perfectly happy to generate sentences shaped like x∧(¬x), in spite of all your protests about contradictions.
Languages
A language is a collection of symbols. From those symbols, using a logic, you can start generating sentences.
For example, in propositional logic using the language {x, y}, the string hello is surely not a sentence (it fails to use the appropriate symbols). Nor is the string ¬xy a sentence: it fails to follow the rules of the logic. ((¬x)∧y) is a sentence, for it uses the appropriate symbols and follows the given rules.
Many results in model theory are achieved by holding the logic fixed and varying the language, so it’s essential that these concepts be distinct in your mind.
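To make the logic/language split concrete, here is a minimal sketch of a recognizer for a tiny propositional logic. The rules (the logic) are hard-coded; the atoms (the language) are passed in as a parameter, so you can hold the logic fixed and vary the language. The names `is_sentence` and `_parse` are hypothetical, and the sketch assumes single-character symbols, only the connectives ¬ and ∧, and fully parenthesized sentences. Under that convention the text's x∧(¬x) is written (x∧(¬x)).

```python
def is_sentence(s: str, language: set[str]) -> bool:
    """True if s can be built from the atoms in `language` using the rules:
    an atom is a sentence; if p is a sentence, so is (¬p);
    if p and q are sentences, so is (p∧q)."""
    return _parse(s, 0, language) == len(s)


def _parse(s: str, i: int, language: set[str]) -> int:
    """Try to read one sentence starting at index i.
    Return the index just past it, or -1 if no sentence starts there."""
    if i >= len(s):
        return -1
    if s[i] in language:  # rule 1: an atom of the language
        return i + 1
    if s[i] != "(":
        return -1
    if s[i + 1 : i + 2] == "¬":  # rule 2: (¬p)
        j = _parse(s, i + 2, language)
        return j + 1 if j != -1 and s[j : j + 1] == ")" else -1
    j = _parse(s, i + 1, language)  # rule 3: (p∧q)
    if j == -1 or s[j : j + 1] != "∧":
        return -1
    k = _parse(s, j + 1, language)
    return k + 1 if k != -1 and s[k : k + 1] == ")" else -1


lang = {"x", "y"}
print(is_sentence("hello", lang))     # False: wrong symbols
print(is_sentence("¬xy", lang))       # False: breaks the rules
print(is_sentence("((¬x)∧y)", lang))  # True
print(is_sentence("(x∧(¬x))", lang))  # True: contradictions are still sentences
print(is_sentence("z", lang), is_sentence("z", lang | {"z"}))  # False True
```

The last line shows the same logic producing different sentences once the language changes.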
Theories
A theory is a collection of sentences written in one language. For example, in the language {≤} under first-order logic, we can discuss the theory
(∀x)(x≤x)
(∀xy)(x≤y)∧(y≤x)→(y≡x)
(∀xyz)(x≤y)∧(y≤z)→(x≤z)
which is the theory of order (more precisely, of partial orders). The axioms above are reflexivity, antisymmetry, and transitivity.
Remember that a theory is just a set of sentences drawn from all available sentences. These sentences aren’t particularly special unless you make them special. Sentences like (∃x)¬(x≤x) are fine sentences built from the language {≤}, even though they directly contradict the theory. Theories don’t affect the sentences of a language — they’re just a grab-bag of some sentences that seemed interesting to someone.
Models
A model is an interpretation of the sentences generated by a language. A model is a structure which assigns a truth value to each sentence generated by some language under some logic.
(More specifically, it’s a structure that assigns binary values to sentences in such a way that we’re justified in the name “truth value”: for example, we require that a model says φ is true if and only if it says that ¬φ is false, and so on.)
Only once we start interpreting sentences is it meaningful to talk about valid or refutable sentences. Once you have a model of {≤} that happens to say that the three axioms above are true, then you can start talking about how the theory of order rules out the sentence (∃x)¬(x≤x), because there is no model of the theory of order which is also a model of this sentence.
(You can actually talk about how (∃x)¬(x≤x) is inconsistent with the theory of order without appealing to model theory, but I find it helpful to treat everything as raw symbols until interpreted by a model.)
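As an illustration (not a proof, since models may be infinite), the sketch below treats a finite structure as a domain plus a set of pairs interpreting ≤, evaluates the three axioms and the sentence (∃x)¬(x≤x) in it, and brute-forces every relation on domains of size 1 to 3. All function names are hypothetical. It finds models of the theory of order, but none that also model (∃x)¬(x≤x). The underlying reason is simply that reflexivity says (∀x)(x≤x).

```python
from itertools import product


def satisfies_order(domain, leq):
    """Evaluate the three order axioms in the structure (domain, leq),
    where leq is a set of pairs interpreting the symbol ≤."""
    reflexive = all((x, x) in leq for x in domain)
    antisymmetric = all(
        x == y for x in domain for y in domain if (x, y) in leq and (y, x) in leq
    )
    transitive = all(
        (x, z) in leq
        for x in domain
        for y in domain
        for z in domain
        if (x, y) in leq and (y, z) in leq
    )
    return reflexive and antisymmetric and transitive


def satisfies_witness(domain, leq):
    """Evaluate the sentence (∃x)¬(x≤x) in the same structure."""
    return any((x, x) not in leq for x in domain)


def all_relations(domain):
    """Yield every binary relation on domain, as a frozenset of pairs."""
    pairs = [(x, y) for x in domain for y in domain]
    for keep in product([False, True], repeat=len(pairs)):
        yield frozenset(p for p, k in zip(pairs, keep) if k)


for n in range(1, 4):
    domain = range(n)
    order_models = [r for r in all_relations(domain) if satisfies_order(domain, r)]
    both = [r for r in order_models if satisfies_witness(domain, r)]
    print(n, len(order_models), len(both))  # prints 1 1 0, then 2 3 0, then 3 19 0
```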
To give a concrete example, in first-order logic, using the language {S, +, *, 0}, the theory of arithmetic is the theory laid out by the [Peano axioms](http://en.wikipedia.org/wiki/Peano_axioms#First-order_theory_of_arithmetic). The actual natural numbers zero, one, two, … are a model of this theory (where zero is the interpretation of 0, one is the interpretation of S0, etc.).
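Here is a rough sketch of what "the natural numbers interpret {S, +, *, 0}" means: each symbol gets a concrete function or element, and numerals like SS0 evaluate to numbers. The names are hypothetical. The axiom check only spot-checks a few Peano axioms on a finite range, which is not a proof. It also omits the induction schema, which ranges over all formulas and can't be tested this way.

```python
# Interpretations of the symbols S, +, *, 0 in the natural numbers.
ZERO = 0


def S(n):
    return n + 1


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


def eval_numeral(term):
    """Interpret a numeral such as 'SS0' (k copies of S, then 0) as k."""
    if not (term.endswith("0") and set(term[:-1]) <= {"S"}):
        raise ValueError(f"not a numeral: {term!r}")
    value = ZERO
    for _ in term[:-1]:
        value = S(value)
    return value


def spot_check(bound=50):
    """Check some Peano axioms on 0..bound-1 only (an illustration, not a proof)."""
    xs = range(bound)
    return {
        "Sx ≠ 0": all(S(x) != ZERO for x in xs),
        "Sx = Sy → x = y": all(x == y for x in xs for y in xs if S(x) == S(y)),
        "x + 0 = x": all(add(x, ZERO) == x for x in xs),
        "x + Sy = S(x + y)": all(add(x, S(y)) == S(add(x, y)) for x in xs for y in xs),
        "x * 0 = 0": all(mul(x, ZERO) == ZERO for x in xs),
        "x * Sy = x * y + x": all(mul(x, S(y)) == add(mul(x, y), x) for x in xs for y in xs),
    }


print(eval_numeral("0"), eval_numeral("S0"), eval_numeral("SS0"))  # 0 1 2
print(all(spot_check().values()))  # True
```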
Also, it’s worth noting that any object that interprets sentences and follows the rules of the logic qualifies as a model. There are often many non-isomorphic objects that interpret the same sentences in the same way. For example, the rational numbers and the real numbers, each under addition, are models of group theory that agree on every sentence in the language of groups, despite being different (non-isomorphic) models.
The distinctions between these four concepts seem obvious to me in hindsight, but I distinctly remember expending cognitive effort to separate them mentally, so there you go. Make sure these distinctions are wrought in iron before attempting model theory.
The Right to use a name
There’s something about math education in general that has troubled me for quite some time, and which I’m finally able to articulate. It’s quite possible that this is a personal nit, since nobody else seems to care — but I’ll share it anyway.
Many math textbooks treat the properties that justify a thing’s name as if they were statements about the thing after it has been named.
This is a little abstract, so I’ll give a silly example. Imagine someone is trying to show that, in category theory, composition of arrows is associative. They shouldn’t appeal to visual intuition or to any diagrams of arrows.
The concept that following arrows is an associative operation is so ingrained in the concept of “arrow” that it’s difficult to describe the property in English without sounding dumb.
If you move from A to B, then move B-to-D-through-C in one step, and if I follow the same paths but move A-to-C-through-B in one step and then from C to D, then we will end up at the same place.
This property of arrows is so stupidly obvious that the statement is frustrating. Further, it hides the following fact:
Associative composition between thingies is something we must have before we’re justified in calling the thingies “Arrows”.
Associative composition is what allows you to use the name “arrow” and draw visual diagrams. You can’t appeal to my intuition about “arrows” to show that composition is associative. It’s the other way around! Only after you show that your thingies have associative composition are you allowed to label them as “arrows”.
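In a toy one-object setting, "showing your thingies have associative composition" can be checked by brute force. The sketch below is hypothetical and far from category theory in general. It takes all maps from {0,1,2} to itself, stored as tuples, and verifies that composition is associative, so those thingies earn the name "arrow". A candidate "composition" like integer subtraction fails the same test and doesn't earn the name.

```python
from itertools import product


def compose(g, f):
    """g ∘ f for maps on {0, ..., n-1} stored as tuples: (g ∘ f)(i) = g[f[i]]."""
    return tuple(g[f[i]] for i in range(len(f)))


def is_associative(things, op):
    """Check op(op(h, g), f) == op(h, op(g, f)) for every triple of things."""
    things = list(things)
    return all(
        op(op(h, g), f) == op(h, op(g, f))
        for f, g, h in product(things, repeat=3)
    )


maps = list(product(range(3), repeat=3))  # all 27 maps {0,1,2} -> {0,1,2}
print(is_associative(maps, compose))  # True: these thingies may be called arrows


def subtract(a, b):
    """A candidate 'composition' that fails the test."""
    return a - b


print(is_associative(range(-3, 4), subtract))  # False: (a - b) - c != a - (b - c)
```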
As another example, the axioms of order (above) are what allow us to use the ≤ symbol, which appeals to our intuitive idea of order. Really, it’s more honest to say “We have a binary relation R, satisfying
(∀x)R(xx)
(∀xy)R(xy)∧R(yx)→(y≡x)
(∀xyz)R(xy)∧R(yz)→R(xz)
which justifies our use of the ≤ symbol for R.”
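One way to see this "earning the name" is to check the three axioms for a relation R that doesn't look like ≤ at first glance. The sketch below is an illustration with hypothetical names. Divisibility on {1, …, 12} and subset inclusion on the subsets of {a, b} both pass, so each may be written ≤. Strict < fails reflexivity, so by these axioms it hasn't earned the symbol.

```python
from itertools import combinations


def order_axioms(domain, R):
    """Report which of the three axioms the binary relation R satisfies on domain."""
    domain = list(domain)
    return {
        "reflexive": all(R(x, x) for x in domain),
        "antisymmetric": all(
            x == y for x in domain for y in domain if R(x, y) and R(y, x)
        ),
        "transitive": all(
            R(x, z)
            for x in domain
            for y in domain
            for z in domain
            if R(x, y) and R(y, z)
        ),
    }


def earns_leq(domain, R):
    """True if R satisfies all three axioms, justifying the symbol ≤ for R."""
    return all(order_axioms(domain, R).values())


numbers = range(1, 13)
subsets = [frozenset(c) for k in range(3) for c in combinations("ab", k)]


def divides(a, b):
    return b % a == 0


def subset_of(a, b):
    return a <= b


def less_than(a, b):
    return a < b


print(earns_leq(numbers, divides))       # True
print(earns_leq(subsets, subset_of))     # True
print(order_axioms(numbers, less_than))  # reflexive is False
```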
I imagine this is not a problem for experienced mathematicians, for whom it goes without saying that you must formally specify (or disregard) all intuitive baggage that comes attached to the names. However, I remember distinctly a number of times when I gnashed my teeth with boredom as teachers made obvious statements (of course ≤ is reflexive, why do we even need to say this?), simply because I didn’t understand this idea.
I mention this because the first few sections of the Model Theory textbook make statements that seem quite obvious. It’s easy to grind your teeth and say “duh, hurry up”. It’s a little harder to understand exactly why such things must be said. In that light, I think this is a good piece of advice for learning mathematics in general:
If you find yourself wondering why a statement must be said, check whether the statement is justifying any names.
Binding meaning
The early parts of Model Theory will go down much easier if you realize that they’re binding logical symbols to the appropriate meaning (and thus justifying the name “model”).
For example, when we state “M models φ∧ψ if and only if it models φ and it models ψ”, it’s easy to say “well duh”. It’s a little harder to understand that this is the mechanism by which the symbol ∧ is bound to the interpretation “and”.
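For propositional logic, the binding looks like this in a minimal sketch. Sentences are nested tuples, and a model is a hypothetical dict assigning a truth value to each atom. Each clause of `models` (a hypothetical name) is one of those "well duh" statements. The `"and"` clause is literally where the symbol ∧ gets tied to the meaning "and".

```python
def models(M, phi):
    """Return whether M ⊧ phi.

    M maps each atom of the language to True or False. Sentences are nested
    tuples: ("atom", name), ("not", p), or ("and", p, q).
    """
    tag = phi[0]
    if tag == "atom":
        return M[phi[1]]
    if tag == "not":  # M ⊧ (¬p) iff it is not the case that M ⊧ p
        return not models(M, phi[1])
    if tag == "and":  # M ⊧ (p∧q) iff M ⊧ p and M ⊧ q
        return models(M, phi[1]) and models(M, phi[2])
    raise ValueError(f"not a sentence: {phi!r}")


x, y = ("atom", "x"), ("atom", "y")
M = {"x": True, "y": False}
print(models(M, ("and", x, ("not", y))))  # True
print(models(M, ("and", x, y)))           # False
```

Had the `"and"` clause used `or` instead, the function would still assign a value to every sentence, but it would no longer deserve to be called a model in the intended sense.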
Also, note that the ability to distinguish “the symbol + in the language L” from “the addition function as interpreted by the model M” is absolutely crucial.
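A small sketch of that distinction follows. The same term, built from the symbols S, +, and 0, is evaluated in two hypothetical structures: the natural numbers and the integers mod 5. The symbol + is fixed; what it denotes depends on the structure. Both are structures for the language, but only the first is a model of Peano arithmetic (in the mod-5 structure, S applied to 4 gives 0, violating the axiom that 0 is not a successor).

```python
def evaluate(term, structure):
    """Interpret a term in a structure, i.e. a dict giving the meaning of each symbol.
    Terms are "0", ("S", t), or ("+", t, u)."""
    if term == "0":
        return structure["0"]
    op, *args = term
    values = [evaluate(a, structure) for a in args]
    return structure[op](*values)


naturals = {"0": 0, "S": lambda n: n + 1, "+": lambda a, b: a + b}
mod5 = {"0": 0, "S": lambda n: (n + 1) % 5, "+": lambda a, b: (a + b) % 5}

two = ("S", ("S", "0"))
three = ("S", two)
term = ("+", two, three)  # the symbols for SS0 + SSS0
print(evaluate(term, naturals))  # 5
print(evaluate(term, mod5))      # 0
```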
Totality
Something that kept on biting me was this: models of first-order logic are “total”. They have something to say about every sentence in a language. Even where a theory is incomplete, any individual model is “complete”. A model of first-order logic interprets function symbols by total functions and relation symbols by set-theoretic relations on its universe. The relation ⊧ is total: for every sentence φ, either M⊧φ or M⊧¬φ.
This is a point where my intuitive notion of “models as interpretations” departed from the actual mathematical objects under consideration — functions are firmly partial-by-default in my mind’s eye.
It’s important to hold firm the distinction between “model” and “theory” here. Remember that first-order number theory is incomplete, while the standard model of number theory is the one that picks “true” for all Gödel sentences, has no nonstandard (infinite) numbers, etc. (The difficulty of pinpointing such a model is exactly what the incompleteness theorem is all about.)
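The model/theory contrast can be seen in miniature with a propositional stand-in (a much simpler setting than number theory). In this hypothetical sketch, a model of the language {x, y} is a truth assignment, and every model decides every sentence: exactly one of p and (¬p) holds. The theory {x}, by contrast, has models where y is true and models where y is false, so it decides neither y nor (¬y).

```python
from itertools import product


def sat(M, phi):
    """M ⊧ phi for propositional sentences written as nested tuples."""
    if phi[0] == "atom":
        return M[phi[1]]
    if phi[0] == "not":
        return not sat(M, phi[1])
    if phi[0] == "and":
        return sat(M, phi[1]) and sat(M, phi[2])
    raise ValueError(f"not a sentence: {phi!r}")


def all_models(atoms):
    """Every truth assignment to the atoms, i.e. every model of the language."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def entails(theory, phi, atoms):
    """theory ⊧ phi: phi holds in every model in which all of theory holds."""
    return all(
        sat(M, phi) for M in all_models(atoms) if all(sat(M, t) for t in theory)
    )


atoms = ["x", "y"]
x, y = ("atom", "x"), ("atom", "y")
theory = [x]
samples = [x, y, ("and", x, y), ("not", ("and", x, ("not", y)))]

# Each model is total: for every sample sentence, exactly one of p and (¬p) holds.
assert all(sat(M, p) != sat(M, ("not", p)) for M in all_models(atoms) for p in samples)

# The theory is not: it decides x, but neither y nor (¬y).
print(entails(theory, x, atoms))                                      # True
print(entails(theory, y, atoms), entails(theory, ("not", y), atoms))  # False False
```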
Be aware that the mathematical definition of a model may not match your intuitive idea of “a structure which interprets a theory”, especially if you’re coming from computer science (or other constructive fields).
None of this is particularly novel. Rather, this is a collection of distinctions and clarifications that would have made my life a bit easier when beginning the textbook.
In my case, I didn’t have any of these concepts wrong, per se; rather, I had them fuzzy. The above distinctions were not yet fleshed out in my mind. This post provides a context for model theory: a taste of the kind of thinking you must be ready to do.
I was originally going to use this as context for what I’ve learned in model theory so far, but this post took longer than expected. I’ll follow up tomorrow.