It seems to me like this discussion has gotten too far into either/or “winning”. The prevalance of encodings in mathematics and logic (e.g. encoding the concept of pairing in set theory by defining (a, b) to be the set {{a}, {a, b}}, the double negation encoding of classical logic inside intuitionist logic, et cetera) means that the things we point to as “foundations” such as the ZF axioms for set theory are not actually foundational, in the sense of necessary and supporting. ZF is one possible system which is sufficiently strong to encode the things that we want to do. There are many other systems which are also sufficiently strong, and getting multiple perspectives on a topic by starting from different alternative “foundations” is probably a good thing. You do not have to choose just one!
Another way to look at it is—the things that we call foundations of mathematics are like the Planck length—yes, according to our current best theory, that’s the smallest, most basic element. However, that doesn’t mean it is stable; rather, it’s cutting edge (admittedly, Planck length is more at the cutting edge of physics than foundations of mathematics are at a cutting edge of mathematics). The stable things are more like cubits and inches—central, human-scale “theorems” such that if a new alternative foundation of mathematics contradicted them, we would throw away the foundation rather than the theorem. Some candidate “cubit” or “inch” theorems (really, arguments) might be the infinitude of primes, or the irrationality of the square root of 2.
Friedman and Simpson have a program of “Reverse Mathematics”, studying in what way facts that we normally think of as theorems, like the Heine-Borel theorem, pin down things that we normally think of as logic axioms, such as induction schemas. I really think that Simpson’s paper “Partial Realizations of Hilbert’s Program” is insightful and helpful to this discussion: http://www.math.psu.edu/simpson/papers/hilbert.ps
Lakatos’s “Proofs and Refutations” is also helpful, explaining in what sense a flawed and/or informal argument is helpful—Note that Lakatos calls arguments proofs, even when they’re flawed and/or informal, implicitly suggesting that everyone calls their arguments proofs until the flaws are pointed out. I understand he retreated from that philosophical position later in life, but that pre-retreat philosophical position might still be tenable.
I agree that we should weigh possible foundations against desired results and respect multiple possibilities as you say. However, we need a formalization of this. It might be that 1st order vs 2nd order is not important. I would suggest, however, that the puzzle I presented in the post is important. The proof-theoretic structure of 1st vs 2nd order might not be a big deal. (A learning system which prefers compact first-order theories can learn the desired many-sorted logic.) The structure of reasonable probabilistic beliefs over these two domains, though, is another thing yet! (A learner which prefers compact first-order theories cannot mimic the desired behavior which I described.)
You won’t automatically get the desired behavior by constructing some sort of intuitive learner based on informal principles. So, we need to discuss formalism.
It seems to me like this discussion has gotten too far into either/or “winning”. The prevalance of encodings in mathematics and logic (e.g. encoding the concept of pairing in set theory by defining (a, b) to be the set {{a}, {a, b}}, the double negation encoding of classical logic inside intuitionist logic, et cetera) means that the things we point to as “foundations” such as the ZF axioms for set theory are not actually foundational, in the sense of necessary and supporting. ZF is one possible system which is sufficiently strong to encode the things that we want to do. There are many other systems which are also sufficiently strong, and getting multiple perspectives on a topic by starting from different alternative “foundations” is probably a good thing. You do not have to choose just one!
Another way to look at it is—the things that we call foundations of mathematics are like the Planck length—yes, according to our current best theory, that’s the smallest, most basic element. However, that doesn’t mean it is stable; rather, it’s cutting edge (admittedly, Planck length is more at the cutting edge of physics than foundations of mathematics are at a cutting edge of mathematics). The stable things are more like cubits and inches—central, human-scale “theorems” such that if a new alternative foundation of mathematics contradicted them, we would throw away the foundation rather than the theorem. Some candidate “cubit” or “inch” theorems (really, arguments) might be the infinitude of primes, or the irrationality of the square root of 2.
Friedman and Simpson have a program of “Reverse Mathematics”, studying in what way facts that we normally think of as theorems, like the Heine-Borel theorem, pin down things that we normally think of as logic axioms, such as induction schemas. I really think that Simpson’s paper “Partial Realizations of Hilbert’s Program” is insightful and helpful to this discussion: http://www.math.psu.edu/simpson/papers/hilbert.ps
Lakatos’s “Proofs and Refutations” is also helpful, explaining in what sense a flawed and/or informal argument is helpful—Note that Lakatos calls arguments proofs, even when they’re flawed and/or informal, implicitly suggesting that everyone calls their arguments proofs until the flaws are pointed out. I understand he retreated from that philosophical position later in life, but that pre-retreat philosophical position might still be tenable.
I agree that we should weigh possible foundations against desired results and respect multiple possibilities as you say. However, we need a formalization of this. It might be that 1st order vs 2nd order is not important. I would suggest, however, that the puzzle I presented in the post is important. The proof-theoretic structure of 1st vs 2nd order might not be a big deal. (A learning system which prefers compact first-order theories can learn the desired many-sorted logic.) The structure of reasonable probabilistic beliefs over these two domains, though, is another thing yet! (A learner which prefers compact first-order theories cannot mimic the desired behavior which I described.)
You won’t automatically get the desired behavior by constructing some sort of intuitive learner based on informal principles. So, we need to discuss formalism.