prime numbers are actually equidistributed in a technical sense
The last digit of the trillionth prime number is 3, which is about as non-equidistributed as you can get. That’s the right answer. Anything else is just a way for the robot to confess its ignorance gracefully.
EDIT: although one trouble might be that this robot isn’t equipped to handle improper distributions. So if you hand it an infinitude of finite simple groups and tell it to choose one, it assigns everything a probability of zero and chooses according to whatever algorithm it uses to choose between things that have identical utility.
Yes but since “trillionth prime” isn’t computable, the question translates to “some high prime” and among the set of odd primes, the ones digits are equidistributed over the set {1,3,7,9}. I agree with the point that this type of question can be attacked as though it were a probabilistic question.
My point is that heuristic reasoning WORKS on primes, this is a theorem. If you say “I don’t know the answer, but I will use heuristic reasoning to make a guess” and you ask for the ones digits of the trillionth, trillion and third, and two-trillion-and-seventeenth primes, you expect to get three numbers picked at random from that set, and guessing this way will serve you well.
There are other questions where you might think to apply heuristic reasoning, such as the ones digit of the number of isomorphism classes of finite groups of a given integer order, which like the question about primes is a function from the natural numbers to numbers 0-9, but I do not believe that it gives you a function which anything reasonable is known about; it doesn’t converge to a distribution on the set of digits, it just acts weird and changes drastically at unpredictable points.
One can make a slightly better guess, though. The number of digits in the trillionth prime can be found via some approximate prime counting rule, then the distribution of sums of all digits but last can be estimated, then the divisibility-by-3 rule over sum of all digits makes the last digits very slightly not equiprobable with regards to divisibility by 3. If you use algebra cleverly and avoid any unnecessary computations (such as computation of actual ‘probabilities’ whenever those are unnecessary, trying to answer question of A>B algebraically rather than arithmetically), you might be able to do that in time.
edit: or much more simply, if you have an upper bound and a lower bound on the value of the prime, then within this range the last digits are not equidistributed. Albeit it feels to me that the choice of the last digit will depend entirely to which method comes to mind of our agent, and in turn, the agent will appear irrational (priming—the choice of the digit will depend to what ideas the agent has available), and a malicious third party that knows the answer and wants our agent deprived of cookies, could talk the agent into making a wrong guess.
edit: this is quite interesting topic, actually. If you know that different approximation methods will yield different last digits, and you assume that the choice of an approximation method is random, then you should further scale down the utility difference to approximate the sum (average) over all methods.
Though, I don’t see anything between a very very slightly better guess, and the correct answer. And if the very very slightly better guess doesn’t return 3, you are guaranteed to lose.
If you try to count primes IN ANY WAY, then you will end up with the answer that YOU HAVE AN EQUAL CHANCE OF GETTING ANY COPRIME RESIDUE MODULO 10; this is a THEOREM. It is a proven fact of nature, or platonic ideals, or whatever it is theorems are facts of. Measure mod 3 will just tell you that the residues are also equidistributed modulo 3; it will reduce to computing residues modulo 30, which will not give you any information whatsoever because there will be an equal number that reduce to 1,3,7, and 9 modulo 10.
If you have an upper and lower bound obviously primes aren’t EXACTLY equidistributed, but in this case your upper and lower bounds will be away by millions, with primes showing up on average every 27 digits. The probabilities will work out, if you think of this as a random question selected from many possible and indistinguishable questions.
In real life if I ask you “how tall is the tallest Sequoia tree?” versus “how tall is the tallest scott’s pine?” Even though both questions have specific answers, if you don’t have wikipedia handy, you’ll need to default to “how tall do evergreens get” and the answer becomes a probability distribution. The same is true of prime numbers, once you blur out the details as much as they already have been blurred out. So the “correct answer” in a Bayesian sense is the well-calibrated answer, where you have a distribution and confidence intervals that are of known accuracy over many similar questions, even if you get the wrong answer sometimes. This sort of reasoning is totally transparent with respect to one specific mathematical phenomenon; the distribution of prime numbers in a range.
It may not be transparent with respect to other mathematical problems.
Equidistribution, of course, doesn’t imply “equal chances” for Nth prime. It’s 3. If you don’t have time to calculate 3, there’s no theorem saying you can’t conclude something (trivially, in edge cases where you have almost enough time you might be able to exclude 1 somehow). Other interesting thing is that 2^n-1 are rarely primes (n has to be prime and then still usually not), and we know that 2^n-1 don’t end in 9 ; impact of this decreases as n grows though. All sorts of small subtle things going on. (And yes, the resulting difference in “probabilities” is very small).
If you were able to conclude additional information about the one trillionth prime OTHER THAN that it is a large prime number which is approximately of size 27 trillion, then that information MAY contain information about that prime modulo 10, I agree.
I would be very surprised (p = .01) if there actually are results of the form “the nth prime has such and such characteristics modulo whatever” because of the strong equidistribution results that do exist, and obviously after someone computes the answer it is known. If you look at prime numbers and apply techniques from probability theory, it does work, and it works beautifully well. Adding information beyond “high prime” would allow you to apply techniques from probability theory to deal with logical uncertainty, and it would work well. The examples you give seem to be examples of this.
My core point is that other problems may be less susceptible to approaches from probability theory. Even if looking at digits of prime numbers is not a probability theory problem, using techniques from probability theory accesses information from theorems you may not have proven. Using techniques from probability theory elsewhere does not do this, because those theorems you haven’t proven aren’t even true.
So I am concerned, when someone purports to solve the problem of logical uncertainty, that their example problem is one whose solution looks like normal uncertainty.
I don’t think we disagree on any matters of fact; I think we may disagree about the definition of the word “probability.”
Yes, I agree. In other problems, your “probabilities” are going to not be statistically independent from basic facts of mathematics. I myself posted a top level comment with regards to ultimate futility of ‘probabilistic’ approach. Probability is not just like logic. If you have graph with loops or cycles, it is incredibly expensive. It isn’t some reals flowing through network of tubes, sides of a loop are not statistically independent. It doesn’t cut down your time at all, except in extreme examples (I once implemented a cryptographic algorithm dependent on Miller–Rabin primality test, which is “probabilistic”, and my understanding is that this is common in cryptography and is used by your browser any time you establish a SSL connection)
I think you may be mixing probability with frequency here.
If you say “I don’t know the answer, but I will use heuristic reasoning to make a guess” and you ask for the ones digits of the trillionth, trillion and third, and two-trillion-and-seventeenth primes, you expect to get three numbers picked at random from that set, and guessing this way will serve you well.
You should rather say that they’re independent if you’re computationally limited. But that independence doesn’t even matter for this post. I think you’re assigning too many things to the word “random.” Probability is not about random, it’s about ignorance.
EDIT: speaking of ignorance, I probably made this comment in ignorance of what you actually intended, see more recent comment.
Whoops. Well, I may have figured out what you mean. But just in case, would you be willing to try and explain what sort of extra reasoning the robot should be doing in certain cases as if I had no idea what you meant?
What I’m saying is that the robot will actually do well in the given example, because when you use Bayesian inference on primes it works for some reason.
Other questions in math have weird answers; for example, for every natural number n, the number of smooth structures on R^n is exactly 1… except for n=4, in which case there are uncountably many.
Probability distributions can continuously approximate this, but only with a lot of difficulty and inaccuracy. It’s an answer that isn’t going to nicely appear out of doing Bayesian thinking. Once an agent has seen one dimension with an infinite number, they’ll be hard pressed to accept that no other dimension has more than 1… or seeing an infinite number with 1, how will they accept one with infinitely many? Basically your priors have to be weirdly formalized. This may be a problem that has been addressed.
So my point is that just “throw Bayes at it” is a totally reasonable way of making probabilistic estimates for theorems about countable and compact sets. It’s just that that method of reasoning will vary wildly in how useful it is, because some mathematical results look like probabilities and some don’t.
I don’t have idea for extra reasoning processes I just want to urge caution that the problem isn’t solved because it can handle some situations.
Hm. I think you think the robot works differently than it actually does.
My guess for what you were going to say was that you wanted the robot to work in this different way, but nopes.
A robot that predicts the next number in a series is something like a minimum message-length predictor. The robot outlined here actually can’t do that.
Instead, the robot attacks each problem based on theorems that require or rule out different combinations of relevant possibilities. So, for example, in the question “how many smooth structures are there in n dimensions” (thank you for making that example clear btw), the robot would for each separate case try to prove what was going on, and if it failed, it would likely do something dumb (like run out of time, or pick the first thing it hadn’t proven a theorem about) because it doesn’t handle infinity-option problems very well. What it wouldn’t do is try and predict the answer based on the answer for smaller dimensions, unless it could prove theorems connecting the two.
Okay, but say the robot in ten seconds manages to prove the theorem: “For all dimensions n not equal to 4, there is exactly one smooth structure on R^n.” But is unable to produce a result regarding n=4? Or more realistically, say it is able to construct 1 smooth structure on R^n for any n, and can prove that no others exist unless n=4. How does it make guesses in the case n=4?
If we want to live in the least convenient possible world assume that in second 1 it constructs the smooth structures; it takes three seconds to prove that there are no more for n>5, three seconds to prove no more for n=1,2, and three more seconds to prove there are no more for n=3 and runs out of time. These results are obtained incidentally from inequalities that arise when pursuing a proof for n=4, which is the central value of some equation at the core of the proofs. (so the proofs really say “if another smooth structure exists, it exists for n<5, 2<n<5, 3<n<5.”)
If it really can’t prove any theorems that directly include the translation of “the number of smooth structures for n=4 is,” it simply won’t ever update that.
one trouble might be that this robot isn’t equipped to handle improper distributions. So if you hand it an infinitude of finite simple groups and tell it to choose one, it assigns everything a probability of zero and chooses according to whatever algorithm it uses to choose between things that have identical utility.
I don’t think this is too hard to fix. The robot could have a unit of improper prior (epsilon) that it remembers is larger than zero but smaller than any positive real.
Of course, this doesn’t tell you what to do when asked to guess whether the order of the correct finite simple group is even, which might be a pretty big drawback.
For the robot as described, this will actually happen (sort of like Wei Dai’s comment—I’m learning a lot from discussing with you guys :D ) - it only actually lowers something’s probability once it proves something about it specifically, so it just lowers the probability of most of its infinite options by some big exponential, and then, er, runs out of time trying to pick the option with highest utility. Okay, so there might be a small flaw.
The last digit of the trillionth prime number is 3, which is about as non-equidistributed as you can get. That’s the right answer. Anything else is just a way for the robot to confess its ignorance gracefully.
EDIT: although one trouble might be that this robot isn’t equipped to handle improper distributions. So if you hand it an infinitude of finite simple groups and tell it to choose one, it assigns everything a probability of zero and chooses according to whatever algorithm it uses to choose between things that have identical utility.
Yes but since “trillionth prime” isn’t computable, the question translates to “some high prime” and among the set of odd primes, the ones digits are equidistributed over the set {1,3,7,9}. I agree with the point that this type of question can be attacked as though it were a probabilistic question.
My point is that heuristic reasoning WORKS on primes, this is a theorem. If you say “I don’t know the answer, but I will use heuristic reasoning to make a guess” and you ask for the ones digits of the trillionth, trillion and third, and two-trillion-and-seventeenth primes, you expect to get three numbers picked at random from that set, and guessing this way will serve you well.
There are other questions where you might think to apply heuristic reasoning, such as the ones digit of the number of isomorphism classes of finite groups of a given integer order, which like the question about primes is a function from the natural numbers to numbers 0-9, but I do not believe that it gives you a function which anything reasonable is known about; it doesn’t converge to a distribution on the set of digits, it just acts weird and changes drastically at unpredictable points.
One can make a slightly better guess, though. The number of digits in the trillionth prime can be found via some approximate prime counting rule, then the distribution of sums of all digits but last can be estimated, then the divisibility-by-3 rule over sum of all digits makes the last digits very slightly not equiprobable with regards to divisibility by 3. If you use algebra cleverly and avoid any unnecessary computations (such as computation of actual ‘probabilities’ whenever those are unnecessary, trying to answer question of A>B algebraically rather than arithmetically), you might be able to do that in time.
edit: or much more simply, if you have an upper bound and a lower bound on the value of the prime, then within this range the last digits are not equidistributed. Albeit it feels to me that the choice of the last digit will depend entirely to which method comes to mind of our agent, and in turn, the agent will appear irrational (priming—the choice of the digit will depend to what ideas the agent has available), and a malicious third party that knows the answer and wants our agent deprived of cookies, could talk the agent into making a wrong guess.
edit: this is quite interesting topic, actually. If you know that different approximation methods will yield different last digits, and you assume that the choice of an approximation method is random, then you should further scale down the utility difference to approximate the sum (average) over all methods.
Though, I don’t see anything between a very very slightly better guess, and the correct answer. And if the very very slightly better guess doesn’t return 3, you are guaranteed to lose.
If you try to count primes IN ANY WAY, then you will end up with the answer that YOU HAVE AN EQUAL CHANCE OF GETTING ANY COPRIME RESIDUE MODULO 10; this is a THEOREM. It is a proven fact of nature, or platonic ideals, or whatever it is theorems are facts of. Measure mod 3 will just tell you that the residues are also equidistributed modulo 3; it will reduce to computing residues modulo 30, which will not give you any information whatsoever because there will be an equal number that reduce to 1,3,7, and 9 modulo 10.
If you have an upper and lower bound obviously primes aren’t EXACTLY equidistributed, but in this case your upper and lower bounds will be away by millions, with primes showing up on average every 27 digits. The probabilities will work out, if you think of this as a random question selected from many possible and indistinguishable questions.
In real life if I ask you “how tall is the tallest Sequoia tree?” versus “how tall is the tallest scott’s pine?” Even though both questions have specific answers, if you don’t have wikipedia handy, you’ll need to default to “how tall do evergreens get” and the answer becomes a probability distribution. The same is true of prime numbers, once you blur out the details as much as they already have been blurred out. So the “correct answer” in a Bayesian sense is the well-calibrated answer, where you have a distribution and confidence intervals that are of known accuracy over many similar questions, even if you get the wrong answer sometimes. This sort of reasoning is totally transparent with respect to one specific mathematical phenomenon; the distribution of prime numbers in a range.
It may not be transparent with respect to other mathematical problems.
Equidistribution, of course, doesn’t imply “equal chances” for Nth prime. It’s 3. If you don’t have time to calculate 3, there’s no theorem saying you can’t conclude something (trivially, in edge cases where you have almost enough time you might be able to exclude 1 somehow). Other interesting thing is that 2^n-1 are rarely primes (n has to be prime and then still usually not), and we know that 2^n-1 don’t end in 9 ; impact of this decreases as n grows though. All sorts of small subtle things going on. (And yes, the resulting difference in “probabilities” is very small).
If you were able to conclude additional information about the one trillionth prime OTHER THAN that it is a large prime number which is approximately of size 27 trillion, then that information MAY contain information about that prime modulo 10, I agree.
I would be very surprised (p = .01) if there actually are results of the form “the nth prime has such and such characteristics modulo whatever” because of the strong equidistribution results that do exist, and obviously after someone computes the answer it is known. If you look at prime numbers and apply techniques from probability theory, it does work, and it works beautifully well. Adding information beyond “high prime” would allow you to apply techniques from probability theory to deal with logical uncertainty, and it would work well. The examples you give seem to be examples of this.
My core point is that other problems may be less susceptible to approaches from probability theory. Even if looking at digits of prime numbers is not a probability theory problem, using techniques from probability theory accesses information from theorems you may not have proven. Using techniques from probability theory elsewhere does not do this, because those theorems you haven’t proven aren’t even true. So I am concerned, when someone purports to solve the problem of logical uncertainty, that their example problem is one whose solution looks like normal uncertainty.
I don’t think we disagree on any matters of fact; I think we may disagree about the definition of the word “probability.”
Yes, I agree. In other problems, your “probabilities” are going to not be statistically independent from basic facts of mathematics. I myself posted a top level comment with regards to ultimate futility of ‘probabilistic’ approach. Probability is not just like logic. If you have graph with loops or cycles, it is incredibly expensive. It isn’t some reals flowing through network of tubes, sides of a loop are not statistically independent. It doesn’t cut down your time at all, except in extreme examples (I once implemented a cryptographic algorithm dependent on Miller–Rabin primality test, which is “probabilistic”, and my understanding is that this is common in cryptography and is used by your browser any time you establish a SSL connection)
Okay yes we are in agreement.
I think you may be mixing probability with frequency here.
You should rather say that they’re independent if you’re computationally limited. But that independence doesn’t even matter for this post. I think you’re assigning too many things to the word “random.” Probability is not about random, it’s about ignorance.
EDIT: speaking of ignorance, I probably made this comment in ignorance of what you actually intended, see more recent comment.
Whoops. Well, I may have figured out what you mean. But just in case, would you be willing to try and explain what sort of extra reasoning the robot should be doing in certain cases as if I had no idea what you meant?
What I’m saying is that the robot will actually do well in the given example, because when you use Bayesian inference on primes it works for some reason.
Other questions in math have weird answers; for example, for every natural number n, the number of smooth structures on R^n is exactly 1… except for n=4, in which case there are uncountably many.
Probability distributions can continuously approximate this, but only with a lot of difficulty and inaccuracy. It’s an answer that isn’t going to nicely appear out of doing Bayesian thinking. Once an agent has seen one dimension with an infinite number, they’ll be hard pressed to accept that no other dimension has more than 1… or seeing an infinite number with 1, how will they accept one with infinitely many? Basically your priors have to be weirdly formalized. This may be a problem that has been addressed.
So my point is that just “throw Bayes at it” is a totally reasonable way of making probabilistic estimates for theorems about countable and compact sets. It’s just that that method of reasoning will vary wildly in how useful it is, because some mathematical results look like probabilities and some don’t.
I don’t have idea for extra reasoning processes I just want to urge caution that the problem isn’t solved because it can handle some situations.
Hm. I think you think the robot works differently than it actually does.
My guess for what you were going to say was that you wanted the robot to work in this different way, but nopes.
A robot that predicts the next number in a series is something like a minimum message-length predictor. The robot outlined here actually can’t do that.
Instead, the robot attacks each problem based on theorems that require or rule out different combinations of relevant possibilities. So, for example, in the question “how many smooth structures are there in n dimensions” (thank you for making that example clear btw), the robot would for each separate case try to prove what was going on, and if it failed, it would likely do something dumb (like run out of time, or pick the first thing it hadn’t proven a theorem about) because it doesn’t handle infinity-option problems very well. What it wouldn’t do is try and predict the answer based on the answer for smaller dimensions, unless it could prove theorems connecting the two.
Okay, but say the robot in ten seconds manages to prove the theorem: “For all dimensions n not equal to 4, there is exactly one smooth structure on R^n.” But is unable to produce a result regarding n=4? Or more realistically, say it is able to construct 1 smooth structure on R^n for any n, and can prove that no others exist unless n=4. How does it make guesses in the case n=4?
If we want to live in the least convenient possible world assume that in second 1 it constructs the smooth structures; it takes three seconds to prove that there are no more for n>5, three seconds to prove no more for n=1,2, and three more seconds to prove there are no more for n=3 and runs out of time. These results are obtained incidentally from inequalities that arise when pursuing a proof for n=4, which is the central value of some equation at the core of the proofs. (so the proofs really say “if another smooth structure exists, it exists for n<5, 2<n<5, 3<n<5.”)
If it really can’t prove any theorems that directly include the translation of “the number of smooth structures for n=4 is,” it simply won’t ever update that.
Well it can prove “the number of structures for n=4 is at least 1.”
I don’t think this is too hard to fix. The robot could have a unit of improper prior (epsilon) that it remembers is larger than zero but smaller than any positive real.
Of course, this doesn’t tell you what to do when asked to guess whether the order of the correct finite simple group is even, which might be a pretty big drawback.
For the robot as described, this will actually happen (sort of like Wei Dai’s comment—I’m learning a lot from discussing with you guys :D ) - it only actually lowers something’s probability once it proves something about it specifically, so it just lowers the probability of most of its infinite options by some big exponential, and then, er, runs out of time trying to pick the option with highest utility. Okay, so there might be a small flaw.