Evaluation vs Symbolism
TLDR
Thinking about the Busy Beaver numbers has led me to believe that a theorem holding true for a massive number of evaluated examples is only weak evidence that it is actually true. Can we go meta on this?
Main
After reading a post by Scott Aaronson, the prime number theorem (https://en.wikipedia.org/wiki/Prime_number_theorem) and Littlewood's theorem came to my attention:
“Li(𝑥) overestimates the number of primes below x more often than not, especially as x grows large. However, there are known low-lying values (like around x = 10^316, discovered by Littlewood) where 𝜋(𝑥) exceeds Li(x), contradicting the general trend.”
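To make the claim concrete, here is a small, self-contained sketch of my own (not from the quoted source): a sieve for 𝜋(x) and a trapezoid-rule integral for the offset logarithmic integral Li(x). At every size we can comfortably check, Li(x) stays ahead of 𝜋(x), even though we know it eventually falls behind.

```python
# A minimal illustration (my own sketch): compare pi(x), the prime-counting
# function, with the offset logarithmic integral Li(x) for small x.
import math

def prime_pi(limit):
    """Count primes <= limit with a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return sum(sieve)

def Li(x, steps=100_000):
    """Offset logarithmic integral: integral of 1/ln(t) from 2 to x, trapezoid rule."""
    h = (x - 2) / steps
    ys = [1 / math.log(2 + i * h) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

for x in (10**3, 10**4, 10**5):
    print(x, prime_pi(x), round(Li(x), 1))  # Li(x) exceeds pi(x) at every x shown
```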
This got me thinking about how common this kind of thing is, and why. Why does a formula hold all the way up to 10^316 and then fail?
The essence of the Busy Beaver numbers is that there are sequences based on a simple rule or a small amount of data that run for a very long time and then just stop, unpredictably. You can imagine replacing the simple rule with a simple theorem that appears to be true. Instead of actually being true, it is instead a way of encoding its very large counterexample in a short amount of data.
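For a concrete miniature of "simple rule, runs, then stops", here is my own sketch (not from the post) of the standard 2-state Busy Beaver champion: a two-line rule table whose halting time you only learn by running it.

```python
# The standard 2-state Busy Beaver champion machine, simulated directly.
# Table format: (state, symbol read) -> (symbol to write, head move, next state).
# "H" is the halt state.
TABLE = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}

tape, head, state, steps = {}, 0, "A", 0
while state != "H":
    write, move, state = TABLE[(state, tape.get(head, 0))]
    tape[head] = write
    head += move
    steps += 1

print(steps, sum(tape.values()))  # 6 steps, 4 ones written: BB(2) = 6
```

The 5-state champion does the same thing for 47,176,870 steps; nothing in its rule table warns you it will stop there.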
If you think of it this way, a theorem that appears to be true and has been evaluated over trillions of numbers is also a candidate for encoding an exception at some very large number. In other words, trillions of correct examples are only weak evidence of its correctness.
How much weight should we give evaluation? We can't evaluate to infinity, and it's obvious that a theorem being true up to 2 million is not twice the evidence of it being true up to 1 million. Should we use log(n)? A natural scale is the BB numbers themselves: e.g. if your theorem is true up to BB(5), that is 5 data points rather than 47 million. Unlimited evaluation can never get to BB(6), so that is the limit of evidence from evaluation. (i.e. 5-6 evidence points, with it being unclear how to weigh theory: https://www.lesswrong.com/posts/MwQRucYo6BZZwjKE7/einstein-s-arrogance)
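As a toy version of this weighting (my own sketch; the values are the known maximum step counts BB(1) through BB(5), with BB(5) = 47,176,870), count how many Busy Beaver milestones a verification actually passes:

```python
# Known values of the Busy Beaver step-count function; BB(6) is only known to be
# astronomically large, so no finite verification ever reaches a 6th point.
BB_STEPS = {1: 1, 2: 6, 3: 21, 4: 107, 5: 47_176_870}

def bb_evidence_points(verified_up_to):
    """Largest n with BB(n) <= verified_up_to: 'data points' on the BB scale."""
    return max((n for n, s in BB_STEPS.items() if s <= verified_up_to), default=0)

print(bb_evidence_points(10**6))   # 4 -- a million verified cases is 4 points
print(bb_evidence_points(10**12))  # 5 -- a trillion verified cases is still only 5
```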
Now can we go meta?
Is some maths so much more powerful than other maths that it outweighs it in the way formal proof outweighs evaluation? Certainly some maths is more general than other maths. How does this affect famous problems such as the Riemann Hypothesis? Proving or disproving it affects a lot of maths; showing it is correct for a trillion zeros, however, is little evidence.
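For a sense of what that numerical evidence looks like, here is a tiny sketch (assuming the mpmath library is available) that lists the first few nontrivial zeros of the zeta function; checking a trillion of them is this same exercise at scale.

```python
# Print the first few nontrivial zeros of the Riemann zeta function.
# The Riemann Hypothesis says every such zero has real part exactly 1/2.
from mpmath import mp, zetazero

mp.dps = 20  # working precision in decimal places
for n in range(1, 6):
    rho = zetazero(n)  # nth zero in the upper half-plane, ordered by height
    print(n, rho.real, rho.imag)
```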
“Most mathematicians tend to believe that the Riemann Hypothesis is true, based on the weight of numerical evidence and its deep integration into existing mathematical frameworks.”
Is “deep integration” actually that deep, or is it the symbolic equivalent of evaluating up to 1 million? Perhaps, just as you can find countless evaluated examples supporting a false theorem, you can find plenty of “deep integration” in favor of a famous theorem that could also be incorrect.
Further thoughts and links
Most people think P != NP, but what if P = NP where N ~ BB(10)?
A proof that turned out to be wrong: https://www.quantamagazine.org/mathematicians-prove-hawking-wrong-about-extremal-black-holes-20240821/
Related thoughts
Conservation of energy is a more general rule that rules out perpetual motion machines.
The 2nd law of thermodynamics likewise rules them out; HOWEVER, that law must somehow have been broken to get the low-entropy initial state of the Big Bang.
AI examples
1 The Polya Conjecture
Proposed by George Pólya in 1919, this conjecture relates to the distribution of prime numbers. It posited that for any number x, the majority of the numbers less than x have an odd number of prime factors (counted with multiplicity). It was verified for numbers up to 1,500,000, but a counterexample was eventually found with x around 906 million. This is a fascinating case where numerical verification up to a large number was still not sufficient.
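The statement is easy to check mechanically. Here is a short sketch of my own (not the historical verification): L(n) sums the Liouville function λ(k) = (−1)^Ω(k), where Ω(k) counts prime factors with multiplicity, and the conjecture says L(n) ≤ 0 for every n ≥ 2.

```python
# Check the Polya conjecture L(n) <= 0 for small n, where
# L(n) = sum over k <= n of lambda(k) and lambda(k) = (-1)**Omega(k).
def liouville_partial_sums(limit):
    # Sieve Omega(k), the number of prime factors counted with multiplicity.
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # p is prime: no smaller prime has touched it yet
            pk = p
            while pk <= limit:
                for m in range(pk, limit + 1, pk):
                    omega[m] += 1
                pk *= p
    total, sums = 0, []
    for k in range(1, limit + 1):
        total += -1 if omega[k] % 2 else 1
        sums.append(total)
    return sums

sums = liouville_partial_sums(10_000)
print(all(s <= 0 for s in sums[1:]))  # True here, and up to ~906 million -- then it fails
```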
2 Mertens Conjecture
The Mertens conjecture suggested that the absolute value of the Mertens function M(x), the running sum of the Möbius function μ(n) for n ≤ x, is always less than sqrt(x). It was proven false by Andrew Odlyzko and Herman te Riele in 1985 using computational methods; no explicit counterexample was exhibited, but any counterexample must lie above 10^14.
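Again the statement is easy to test numerically; here is my own short sketch (not the method of the 1985 disproof):

```python
# Compute the Mertens function M(x) = sum of mu(n) for n <= x and check
# the conjectured bound |M(x)| < sqrt(x) over a small range.
def mertens_partial_sums(limit):
    # Sieve the Moebius function mu(n) for n <= limit.
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    total, sums = 0, []
    for n in range(1, limit + 1):
        total += mu[n]
        sums.append(total)
    return sums

limit = 10_000
M = mertens_partial_sums(limit)
print(all(abs(M[x - 1]) < x**0.5 for x in range(2, limit + 1)))  # True in any feasible range
```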
On the claim that unlimited evaluation can never get to BB(6): the value of BB(6) is not currently known, but it could in principle be discovered. There is no general algorithm for calculating BB numbers, but any particular BB(n) could be determined by enumerating all n-state Turing machines and proving whether each one halts.
According to Scott, “Pavel Kropitz discovered, a couple years ago, that BB(6) is at least 10^10^10^10^10^10^10^10^10^10^10^10^10^10^10 (i.e., 10 raised to itself 15 times).”
So we can never evaluate up to BB(6), as it is at least this large: even the third floor of that tower, 10^(10^10), already has about ten billion digits.