If you start selecting things at random, then you need a probability distribution. Many routinely used probability distributions over the natural numbers give you a nontrivial chance that the selected number will fit on your computer.
There are, of course, corresponding probability distributions over the reals (take a probability distribution over the natural numbers and give zero probability to everything else). However, the routinely used probability distributions on the reals give zero probability to the output being a natural number, a rational number, describable by a finite algebraic equation, or, for that matter, fitting on your computer at all.
One of the problems with real numbers is that someone trying to do Bayesian analysis of a sensor that reads 2.000... or 3.14159…, using one of these real-number distributions as their prior, cannot conclude that the quantity measured is probably 2 or pi, no matter how many digits of precision the sensor goes out to.
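To make that concrete, here is a minimal numerical sketch (the Uniform(0, 10) prior, the point-mass weight, and the Gaussian noise model are all assumptions invented for the illustration, not anything stated above): with a purely continuous prior on the reals, the posterior probability that the quantity is exactly 2 stays at zero no matter how precise the sensor gets, whereas a prior that also puts a point mass on 2 concentrates there as precision improves.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior_mass_on_two(sensor_sigma, atom_weight=0.5):
    """P(value is exactly 2 | reading), under a prior that mixes a point mass at 2
    with a Uniform(0, 10) continuous part.  Setting atom_weight=0 gives a purely
    continuous prior, for which this probability is 0 at every precision."""
    reading = 2.0                                   # the sensor reads 2.000...
    lik_atom = gaussian_pdf(reading, 2.0, sensor_sigma)
    # Marginal likelihood of the reading under the continuous part of the prior
    # (density 0.1 on [0, 10]), approximated by a Riemann sum.
    step = 0.001
    cont = sum(0.1 * gaussian_pdf(i * step, reading, sensor_sigma) * step
               for i in range(10001))
    return atom_weight * lik_atom / (atom_weight * lik_atom + (1 - atom_weight) * cont)

for sigma in [0.1, 0.01, 0.001]:
    print(f"sensor sigma={sigma}: P(exactly 2 | reading) = {posterior_mass_on_two(sigma):.4f}")
```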
I get that the sensor thing was only an example, but still: it doesn’t seem like a real objection. I mean, you’re not going to have (or need) a sensor with infinitely many decimals of precision. (Or perhaps I’m not understanding you?)
In terms of “selecting things at random”, for any practical use I can think of you’ll be selecting things like intervals, not real numbers. I don’t quite see how that’s relevant to the formalism you use to reason about how and what you’re calculating.
I think there’s some big understanding gap here. Could you explain (or just point to some standard text) how one reasons about trivial things like areas of circles and volumes of spheres without using reals?
Perhaps you’ve confused “pi has a decimal expansion that goes on forever without apparent pattern” with “a generic real number has a decimal expansion that goes on forever without pattern”? Pi does have a finite representation, “Pi”. We use it all the time, and it’s just as precise as “2” or “0.5”.
Specifically, you could start with the rationals and complete them by including all solutions to differential equations. Pi would be included, and many other numbers, but you’d still only have a countable set—because every number would have one or more shortest definitions of finite length, and hence finite information content.
If you had a probability distribution over such a set, it would naturally favor hypotheses with short definitions. If it started out including pi as a possibility, and you gathered sufficient sensor data consistent with pi (a finite amount), the probability distribution would give pi as the best hypothesis. This is reasonable behavior. You have to do non-obvious mucking around with your prior to get that sort of reasonableness with standard real-number probabilities.
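A minimal sketch of the behavior just described, in Python (the hypothesis set, the “description lengths”, and the Gaussian sensor-noise model are invented for the example): a prior over a small countable set of named values, weighted toward short definitions, gets pulled onto pi by a finite amount of sensor data.

```python
import math

# Countable hypothesis set: (name, value, rough description length in bits).
# Names, values, and lengths are invented for this illustration.
hypotheses = [
    ("2",      2.0,      1),
    ("3",      3.0,      1),
    ("pi",     math.pi,  2),
    ("e",      math.e,   2),
    ("22/7",   22 / 7,   5),
    ("3.1416", 3.1416,  14),
]

def posterior(readings, sigma):
    """Posterior over the hypothesis set, assuming Gaussian sensor noise."""
    log_post = {}
    for name, value, bits in hypotheses:
        log_prior = -bits * math.log(2)   # shorter definition => higher prior weight
        log_lik = sum(-((r - value) ** 2) / (2 * sigma ** 2) for r in readings)
        log_post[name] = log_prior + log_lik
    m = max(log_post.values())            # normalise in a numerically safe way
    weights = {k: math.exp(v - m) for k, v in log_post.items()}
    z = sum(weights.values())
    return {k: round(w / z, 4) for k, w in weights.items()}

# A finite amount of sensor data consistent with pi.
readings = [3.14157, 3.14161, 3.14159, 3.14160]
print(posterior(readings, sigma=0.0001))   # essentially all of the mass lands on "pi"
```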
As others have pointed out, any specific countable system of numbers (such as the “solutions to differential equations” that I mentioned) is susceptible to diagonalization—but I see no reason to “complete” the process, as if being susceptible to diagonalization were a terrible flaw above all others. All the entities that you’re actually manipulating (finite precision data and finite, stateable hypotheses like “exactly 2” and “exactly pi”) are finite-information-content, and “completing” the reals against diagonalization makes essentially all the reals infinite-information-content—a cure in my mind far worse than the disease.
(Note: I’m not arguing in this particular post, just asking clarifying questions, as you seem to have the issues much clearer in your mind than I do.)
1) It seems one can start with the naturals, extend them to the integers, then to the rationals, then to whatever set results from including solutions to differential equations (does that have a standard name?). I imagine there are countably infinitely many constructions like that, am I right? They seem to “divide” the numbers ever more “finely” (I’d welcome a hint toward a more formal description of this), though they aren’t necessarily totally ordered in terms of how “fine” they are, and the limit of this process after an infinity of extensions seems to be the reals. (Am I missing something important so far? In particular, since we can reach the reals much faster, is there some important property the countable extensions have in general, other than their result set being countable and their individual structure?)
2) Do you have other objections to real numbers that do not involve probabilities, probability distributions, and similar information theory concepts?
3) I don’t quite grok your π example. It seems to me that a finite amount of sensor data will only ever be able to tell you it’s consistent with all values in the interval π±ε; if you’re using a sufficiently “dense” set, even just the rationals, you’ll have an infinity of values in that interval, while using the reals you’ll have an uncountable one. In the countable case you’ll have to assign probabilities to the countable infinity of consistent values, which could result in π being the most probable one, and in the uncountable case you’ll need a probability distribution function, which could just as well have π as the most probable value. (In particular, I can’t see a reason why you couldn’t find a probability distribution function that has exactly the same value as your probability function when applied to the values in your π-containing countably-infinite set and is “well-behaved” in some sense on the reals between them; but I’m likely missing something here.)
I sort-of get that picking π in a countable set can be a finite-information operation and an infinite-information one in an uncountable set (though I’m not quite clear if or why that works on sets at least as “finely divided” as the rationals). But that seems to be a trick of picking the right countable set to contain the value you’re looking for:
If you started estimating π (let’s say, the ratio of circumference to diameter in a Euclidean universe) with, say, just the rationals, you may or may not get a “most likely” hypothesis, but it wouldn’t be π; you’d only estimate that one if you happened to start with a set that contained it. And if you use a set that contains π, there would always be some other number, fitting in a “finer”-but-still-countable set you aren’t using, that you might need to estimate (assuming there are a lot of such sets, as I speculate in point 1 above).
Of course, using the reals doesn’t save you from that: you still need an infinite amount of information to pin down an arbitrary real. But using probability distributions—even if you construct them by picking a probability function on a countable set and then extending it to the reals somehow—forces you to think about the parts outside that countable set (i.e., other, even “finer” countable sets). In a way, this feels like being reminded of things you didn’t think of.
OK, what am I missing?
1) Yes, there are countably many constructions of various kinds of numbers. The constructions can presumably be written down, and strings are finite-information-content entities. Yes, they’re normally understood to form a set-theoretic lattice—the integers are a subset of the Gaussian integers, the integers are a subset of the rationals, and both the Gaussian integers and the rationals are subsets of the complex plane.
However, the reals are not in any well-defined sense “the” limit of that lattice—you could create a contrived argument that they are, but you could also create an argument that the natural limit is something else, either stopping sooner, or continuing further to include infinities and infinitesimals, or (salient to mathematicians) the complex plane.
Defenders of the reals as a natural concept will use the phrase “the complete ordered field”, but when you examine the definition of “complete” they are referencing, it uses a significant amount of set theory (and an ugly Dedekind-cut construction) to include everything it wants to include, and to exclude many other things that might seem to be included.
2) Yes. I think the reals are a history-laden concept; they were built in fear of set-theoretic and calculus paradoxes, and without awareness of the computational approach—information theory and Gödel’s incompleteness. They are useful primarily in the way that C++ is useful to C++ programmers—as a thicket or swamp of complexity that defends the residents from attack. Any mathematician doing useful work in a statistical, calculus-related, or topological field who casually uses the reals will need someone else, a numerical analyst or computer scientist, to laboriously go through their work and take out the reals, replacing them with a computable (and countable) alternative notion—different notions for different results. Often this effort is neglected, and people use IEEE floats where the mathematician said “real”, and get ridiculous results—or worse, dangerously reasonable ones.
3) You’re right that a finite amount of sensor data will only say that the value is consistent with this interval. As you point out, if there is an uncountable set of values within that interval, then it’s entirely possible for there to be no single value that is a maximum of the probability distribution function. (That’s an excellent example of the ridiculous circumlocutions that come from using uncountable sets, when you actually want the system to come up with one or a few best hypotheses, each of which is stateable.)
Pi is always a finite-information entity. Everything nameable is. It doesn’t become infinitely large in information content just because you consider it as an element of the reals.
Yes—if you use a probability distribution over the rationals as your prior, and the actual value is irrational, then you can get bad results. I think this is a serious problem, and we should think hard about what Bayesian updating with misspecified models looks like (I know Cosma Shalizi has done some work on this), so that we have some idea what failure looks like. We should also think carefully about what we would consider a reasonable hypothesis, one that we might eventually come to rest on.
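For a feel of that failure mode, here is a rough sketch (the candidate fractions, the description-length prior, and the noise model are assumptions made up for the illustration): with a prior over rationals only, the best hypothesis keeps hopping to more complicated fractions as the sensor precision improves, and it can never land on pi itself, because pi is not in the hypothesis space.

```python
import math
from fractions import Fraction

# Candidate hypotheses: fractions p/q near pi with denominators up to 100
# (an arbitrary cutoff for the illustration).  Pi itself is deliberately absent.
candidates = sorted({Fraction(p, q) for q in range(1, 101)
                     for p in range(3 * q, 4 * q + 1)})

def description_length(frac):
    """Crude code length: bits needed to write the numerator and denominator."""
    return math.log2(frac.numerator + 1) + math.log2(frac.denominator + 1)

def map_estimate(sigma):
    """Most probable hypothesis after a sensor reading of pi with Gaussian noise sigma."""
    def log_posterior(h):
        log_prior = -description_length(h)                        # prefer simple fractions
        log_lik = -((float(h) - math.pi) ** 2) / (2 * sigma ** 2)
        return log_prior + log_lik
    return max(candidates, key=log_posterior)

for sigma in [1e-1, 1e-2, 1e-4, 1e-6]:
    best = map_estimate(sigma)
    print(f"sigma={sigma:g}: best hypothesis {best} = {float(best):.6f}")
# The favorite drifts (3, then 22/7, then 311/99, ...) but can only ever settle
# on whichever fraction in the set happens to lie closest to pi.
```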
However, it’s a false fork to argue “We wouldn’t use the rationals therefore we should use the reals”. As I’ve been trying to say, the reals are a particular, large, complicated, and deeply historical construction, and we should not expect to encounter them “out in the world”.
Andrej Bauer has implemented actual real-number arithmetic (not IEEE nonsense, or “computable reals”, which are interesting but not the reals). Peano integers, in his (OCaml-based) language RZ, would probably be five or ten lines. (Commutative groups are 13 lines.) In contrast, building from commutative groups up to defining the reals as sequences of nested intervals takes five pages; see the appendix: http://math.andrej.com/wp-content/uploads/2007/04/rzreals.pdf
Regarding “reminding you of things you didn’t think of”: I think Cosma Shalizi and Andrew Gelman have convincingly argued that Bayesian philosophy/methodology is flawed—we don’t just pick a prior, collect data, do an update, and believe the results. If we were magical uncomputable beasties (Solomonoff induction), possibly that is what we would do. In the real world there are other steps, including examining the data and the residual errors to see if they suggest hypotheses that weren’t included in the original prior. http://www.stat.columbia.edu/~gelman/research/unpublished/philosophy.pdf
Hi John! Thank you very much for taking the time to answer at such length. The links you included were also very interesting, thanks.
I think I got a bit of insight into the original issue (way up in the comments, when I interjected in your chat with Patrick).
With respect to the points closer in this thread, it’s become more like teaching than an actual discussion. I’m much too little educated in the subject, so I could contribute mostly with questions (many inevitably naïve) rather than insights. I’ll stop here then; though I am interested, I’m not interested enough right now to educate myself, so I won’t impose on your time any longer.
(That is, not unless you want to. I can continue if for some reason you’d take pleasure in educating me further.)
Thank you again for sharing your thoughts :-)