Decimal digit computations as a testing ground for reasoning about probabilities
The points in this article emerged from a conversation with Anna Salamon.
I think that thinking about decimal expansions of real numbers provides a good testing ground for one’s intuition about probabilities. The context of computation is very different from most of the contexts that humans deal with; in particular it’s much cleaner. As such, this testing ground should not be used in isolation; the understanding that one reaps from it needs to be integrated with knowledge from other contexts. Despite its limitations I think that it has something to add.
Given a computable real number x, a priori the probability that any string of n decimal digits comprises the first n decimal digits of x is 10^-n. For concreteness, we’ll take x to be pi. It has long been conjectured that pi is a normal number. This is consistent with the notion that the digits of pi are “random” in some sense, and in this respect pi contrasts with (say) rational numbers and Liouville’s constant.
According to the Northwestern University homepage, pi has been computed to five trillion digits. So, to the extent that one trusts the result of the computation, there exists an example of a statement which had an a priori probability of 10^-n, with n > 5 × 10^12, of being true and which we now know to be true with high confidence. How much should we trust the computation? Well, I don’t know whether it’s been verified independently, and there are a variety of relevant issues about which I know almost nothing (coding issues; hardware issues; the degree of rigor with which the algorithm used has been proven to be correct, etc.). One would have more confidence if one knew that several independent teams had succeeded in verifying the result using different algorithms & hardware. One would have still more confidence if one were personally involved in such a team and became convinced of the solidity of the methods used. Regardless:
(a) As early as 1962 mathematicians had computed pi to 10^5 digits. Presumably since then their computation has been checked many times over by a diversity of people and methods. Trusting a single source is still problematic as there may have been a typo or whatever, but it seems uncontroversial to think that if one uses the nearest apparently reliable computational package (say, Wolfram Alpha) then the chance that the output is correct is > 10%. Thus we see how an initial probability estimate of 10^-100000 can rise to a probability over 10^-1 in practice.
(b) If one were determined, one could probably develop ~90% confidence in the accuracy of over a billion digits of pi. I say this because computational power and algorithms have permitted such a vast computation for over 20 years; presumably by studying, testing and tweaking all the programs written since then, one could do many checks on the accuracy of each of the first billion digits. Assuming that this is possible, an initial probability estimate of 10^-1000000000 can in practice grow to over 0.9.
This shows that probabilities which are apparently very small can rapidly shift to being quite large with the influx of new information. There’s more that I could say about this, but I think that the chunk that I’ve written so far is enough to warrant posting, and that the rest of my thoughts are sufficiently ill-formed that I shouldn’t try to say more right now. I welcome thoughts and comments.
Isn’t there a simpler way of making this point, without going into the properties of pi and the history of its computation? For example, I could write a small program that will fill my screen with a random n-digit number at time X. Before time X, the probability of me seeing any particular number is 10^-n, and after time X it’s 1 for one number and 0 for the others.
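A minimal sketch of the kind of program described above (an illustration, not anyone’s actual code):

```python
# Print one random n-digit string. Before this runs, each particular string
# has probability 10**-n of being the one displayed; afterwards, one string
# has probability 1 and all the others have probability 0.
import random

n = 10
print("".join(random.choice("0123456789") for _ in range(n)))
```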
But I’m having trouble seeing the practical significance of this. It would be interesting to see Anna’s side of this conversation, to get a better idea of what you guys were talking about...
It may be that my exposition could be improved. I’m not sure whether or not the use of a particular real number is central to the line of thinking that I’m interested in exploring.
If any two independent people compute the first thousand digits of pi then the results should be identical whereas if two people generate 1000 random digits then their results will not be identical. In this way “the first 1000 digits of pi” is well defined in a way that “1000 random digits” is not.
Note that it’s possible to think that a string of n digits is probably the first n digits of pi and then learn that there was an error in the coding/algorithm used, which causes a dramatic drop in the probability that the string of digits is correct, whereas it’s not possible to learn that a string of random digits is “wrong.” One may learn that the algorithm used was not a good random number generator, but that’s not quite the same thing...
Some readers may find the properties of pi and the history of its computation to be of independent interest.
It was just a casual conversation, a tangent to an unrelated subject. But my interest in the topic is that I think that reasoning about small probabilities may be important to x-risk reduction, and I want to get a feel for the types of errors & the types of valid heuristics attached to the domain, with a view toward thinking about x-risk reduction.
Once I found out that you can compute any digit of pi without computing any other digit (albeit in hexadecimal), my life just wasn’t the same anymore.
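For readers curious what this looks like in practice, here is a minimal sketch (my own illustration of the standard Bailey–Borwein–Plouffe digit-extraction approach, not code from this thread):

```python
# Compute the hex digit of pi at position d after the point without
# computing any of the earlier digits, via the BBP formula
# pi = sum_k 16**-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)).
def pi_hex_digit(d):
    n = d - 1  # number of hex digits to skip

    def frac_sum(j):
        # Fractional part of sum_k 16**(n-k) / (8k + j).
        s = 0.0
        for k in range(n + 1):
            # Large powers stay small thanks to modular exponentiation.
            s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
        k = n + 1
        while True:  # rapidly shrinking tail terms
            term = 16.0 ** (n - k) / (8 * k + j)
            if term < 1e-17:
                break
            s = (s + term) % 1.0
            k += 1
        return s

    x = (4 * frac_sum(1) - 2 * frac_sum(4) - frac_sum(5) - frac_sum(6)) % 1.0
    return "0123456789abcdef"[int(16 * x)]

# pi = 3.243f6a88... in hex, so this should print 243f6a88.
print("".join(pi_hex_digit(d) for d in range(1, 9)))
```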
It surprised me tremendously, but I don’t think it’s affected me otherwise. If you were being literal, what changed for you?
It isn’t limited to hexadecimal.
I like the way the author formatted their summation formulas in ascii.
I think it showed me that the Kolmogorov complexity of pi’s digits is really low (this was before I even understood what Kolmogorov complexity was). Looking at it now, however, it’s not at all surprising, given that pi’s definition is already very low complexity.
!!!
Unless I misunderstand something, that could be used as an RNG with properties I’ve been looking for for ages! Sadly, I don’t understand half the math involved. Example code would be nice...
Is there something wrong with the Mersenne Twister?
Yes, lots of things, although I don’t know a quick or non-technical way of summing up what.
Looking at my response to Emile below will probably help.
Is there something wrong with AES in counter mode?
what is that?
Encrypting successive integers with AES gives you a pretty fast and very high quality PRNG.
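A minimal sketch of that approach (assuming the third-party pycryptodome package, since the Python standard library has no AES):

```python
# AES in counter mode as a PRNG: encrypt 0, 1, 2, ... under a fixed key
# and use the ciphertext blocks as the pseudorandom byte stream.
from Crypto.Cipher import AES  # pip install pycryptodome

def aes_ctr_stream(key):
    cipher = AES.new(key, AES.MODE_ECB)  # the counter is managed by hand below
    counter = 0
    while True:
        yield from cipher.encrypt(counter.to_bytes(16, "big"))
        counter += 1

stream = aes_ctr_stream(b"\x00" * 16)  # any 16-, 24- or 32-byte key
print(bytes(next(stream) for _ in range(16)).hex())
```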
Apparently I’m getting too tired to cognite any compsci right now; it might work though.
I’m curious, what properties are you looking for?
(I’m pretty interested in roguelikes and procedural content generation in games, and have played around with various algorithms for infinite consistent worlds etc. - if your name’s anything to go by, I expect you might be interested in that stuff too)
Something that can act like a lookup table of random binary data of infinite size and dimensionality, which is as fast as possible but has very low demands on the quality of the randomness.
This should, if I’ve got things right, be both faster and more random than XORing a load of repeating bitstrings with different prime lengths.
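For comparison, a minimal sketch of the XOR-of-prime-length-bitstrings baseline mentioned above (illustrative only; the table lengths are arbitrarily chosen primes):

```python
# XOR several repeating byte tables whose lengths are distinct primes;
# the combined sequence only repeats after the product of the lengths.
import random

PERIODS = [251, 241, 239]    # arbitrary prime lengths
_rng = random.Random(12345)  # fixed seed so the tables are reproducible
TABLES = [bytes(_rng.randrange(256) for _ in range(p)) for p in PERIODS]

def cheap_byte(i):
    """Low-quality pseudorandom byte for index i (period 251*241*239)."""
    out = 0
    for table in TABLES:
        out ^= table[i % len(table)]
    return out
```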
I don’t think it’s fast. The any-base one that NancyLebovitz linked to claims to be O((n log n)^3). The original hex one, I don’t know exactly but from the description on wikipedia it’s definitely worse than O(n) (I’d guess significantly worse).
Oh, yeah, then it’s probably useless.
That sounds like a hash function, or are you looking for something else? (something faster?)
What applications do you have in mind? Procedural generation in games, or something else?
reads up on hash functions
Oooh, interesting. While not exactly right, some could probably be re-purposed as an RNG, yes… Especially cryptographic hash functions seem interesting, but they are probably far too slow. Something that looks like a cryptographic hash function to someone who really sucks at cryptography is probably what I’m looking for. Also, there are probably some things to gain from collisions being a non-issue and the output needed being extremely short.
I don’t actually know exactly what I need this for; it’s just something I for some reason tend to stumble upon repeatedly when I think about compsci things and have thus concluded would be generally useful. Procedural generation, either for games, art, or math experiments, probably constitutes most of it though.
In Python, you can do things like
… internally, he’s just using the hash of your input as a random seed, but then you can get random floats, etc.
I used something like that to get boundless fields of random temperatures, elevation and humidity, resulting in a tile-based world with mountains, oceans, swamps, forests etc. that you could scroll as far as you wanted to in any direction, and always get the same thing for a given position.
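A minimal sketch of the same idea (my own reconstruction, not the code from the comment above):

```python
# Hash the tile coordinates into a seed; the same position always yields
# the same value, which is what makes boundless consistent terrain possible.
import random

def value_at(x, y, world_seed="demo"):
    """Deterministic pseudorandom float in [0, 1) for tile (x, y)."""
    # Seeding with a string hashes the input, so results are reproducible.
    return random.Random(f"{world_seed}:{x}:{y}").random()

print(value_at(3, 7) == value_at(3, 7))  # True: same tile, same value
print(value_at(3, 7) == value_at(3, 8))  # False (almost surely): different tile
```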
Nice trick, thanks, but isn’t it awfully slow?
How fast do you need it to be? My laptop can do SHA1 about 2^20 times with no noticeable delay.
Ok, that’s pretty fast for Python. Should be fast enough for anything with speed requirements that make Python a viable language in the first place. Unless I did the math wrong somewhere.
Thanks.
Sorry, that’s not actually Python. I was minting a 20-bit Hashcash stamp.
Edit: In Python, I get 2^15 hashes in similar time.
Edit2: and if for some reason you need insane speed, I believe graphics cards can be made to perform hashing.
Edit3: with the specific code listed in the 4-parent, rather than using hashlib.sha1(), about 2^12. The point is that computers are fast.
4000 per second is probably way too slow for any application I might use, and I don’t think I have quite the kind of wizardry needed to employ the graphics card for anything.
I don’t think Pavitra was talking about seconds, but in “no noticeable delay”: on my laptop, in Python, I get about 2^22 hashes per second.
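A rough way to check this on one’s own machine (the exact numbers will vary with hardware and Python version):

```python
# Time 2**20 SHA-1 hashes and report the rate per second.
import hashlib
import time

N = 1 << 20
start = time.perf_counter()
for i in range(N):
    hashlib.sha1(str(i).encode()).digest()
elapsed = time.perf_counter() - start
print(f"{N / elapsed:,.0f} SHA-1 hashes per second")
```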
Oh. Well then I don’t know, but it’ll be worth trying if I ever need to do something like this in Python.
Not awfully: I haven’t counted, but for a screenful of terrain tiles there can’t be more than a thousand calls, which could be a bit slow if you’re recalculating everything every frame, but is fast enough as soon as you cache your results, so that you only need to calculate new, undiscovered terrain.
If performance is really a problem, it’s always possible to rewrite the slow bits in C, using simpler algorithms than the Python hash function.
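A minimal sketch of the caching idea (illustrative only): each tile is computed once and then reused on every later frame.

```python
# Memoize tile values so only newly discovered terrain costs anything.
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def tile_value(x, y):
    return random.Random(f"world:{x}:{y}").random()
```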
I don’t think this statement is correct since it contradicts the law of conservation of expected evidence, i.e., if you anticipate that knowing some piece of information would increase your probability estimate of something, then simply knowing that that piece of information exists should correspondingly increase your estimate unless you also suspect that knowing the information might actually decrease your estimate by a corresponding amount.
If one had the inside info given by being part of the team, and this info decreased one’s confidence, one wouldn’t be part of the team.
This is true. I meant assuming that the computations are sound, being personally involved would increase one’s confidence relative to what it would otherwise be. I’ll change the post accordingly.