An anti-inductive sequence
I was thinking about what would it mean for a sequence of bits to be “anti-inductive”. It probably is a concept that is already known (as a rule of thumb, if I can think about it, someone probably already wrote a paper on it 50 years ago), but I haven’t heard about it.
*
Some sequences are predictable and can be compressed. These two concepts are deeply related, because if you can successfully predict the next part of the sequence, you don’t need to actually write it down; hence compression. A completely random sequence of bits cannot be compressed or predicted.
There is a simple mathematical proof that some sequences cannot be compressed, although it doesn’t say which ones. For any natural number N, there are more sequences of size exactly N, than sequences of size smaller than N. Therefore no program can generate a unique sequence shorter than N for any input sequence of size N.
*
Things get more complicated if we consider the caveat that although random sequences in general cannot be compressed, true randomness means that sometimes we accidentally get a sequence that can be compressed—for example, with probability 1/2ᴺ we get a sequence of N zeroes, and it would sound silly to argue that we can’t compress that!
The solution to this paradox is that if we decide to compress only some selected sequences, then we need to add an extra bit of information specifying whether this sequence was compressed or not. Otherwise, if we see a sequence of bits saying (in binary) “a sequence of thousand zeroes”, we wouldn’t know whether the intended value is this very sequence of bits taken literally, or the sequence of thousand zeroes. One bit doesn’t seem like much, but actually most sequences cannot be compressed, so the cost of adding one extra bit to each of them outweighs the occasional space we would save by compressing the ones we can.
But still, if I needed a random sequence of bits to use e.g. as a password for something important… and by a miracle I generated a sequence of N zeroes… I would probably feel quite uncomfortable to use it, and would simply generate a new one. Is this a good security practice, or not? Because from certain perspective, by removing the “insufficiently random” sequences from the pool of possible choices, I am reducing the size of the pool, which… kinda makes it easier to guess the password?
Something similar actually happened to me once. I used a mobile application that asked me to create a 4-digit PIN. So I typed 4 random digits, and the application said: “nope, you cannot use the same digit multiple times in the PIN, that is not secure”. So I typed 4 random digits again, and the application said: “nope, you cannot use an ascending sequence of digits, that is not secure”. So I typed 4 random digits yet again, and this time the application was finally satisfied. But it felt funny that the more “security checks” the application uses, the more limited is the choice of possible PINs. There are only 10000 possible combinations of 4 digits to start with; I wonder how far an overzealous security department could reduce that number. In a hypothetical extreme case, we would be left with only one possible choice of PIN—certainly the most secure one that no one could possibly guess! The holy grail of information security.
*
Okay, back to the sequences of bits. Imagine that we are trying to create not just any random sequence, but the single most random, most unpredictable sequence ever! Suppose the length of the sequence is not specified in advance, so we just keep generating bits one by one, for as long as necessary.
What we could do is create a predictor—an algorithm that looks at the previously generated bits, tries to find all possible patterns in them and predict the most likely following bit—and then actually output the opposite. Keep doing this for every bit.
(To make the output fully deterministic, we add an extra rule that if the predictor assigns exactly 50% probabilities of the next bit being 0 or 1, we output 0.)
What would the generated sequence look like?
The first bit… we have no information yet; it could be anything. Therefore, the first bit is 0.
The second bit… all we know so far is that there was a 0 in the sequence, so it is more likely that the next bit will be a 0, too. Therefore, the second bit is 1.
Now we have 01; it could be an alternating sequence, so the next bit would be 0? Therefore the third bit is 1.
After 011… well, I am not sure; I think I could argue in both ways?
We need to make a more formal description of the algorithm, because the reasoning is going to get more complicated at each step, since we are trying to defy expectations.
The standard solution for the predictor would be to use the Solomonoff prior (take all possible algorithms that generate sequences of bits, and assign to each of them an initial probability 1/2^ the length of the algorithm), after each observed bit remove the algorithms that were falsified by the observed data, and then calculate the total probability of the remaining algorithms that predict that 0 would be next, versus the total probability of the remaining algorithms that predict that 1 would be next.
This solution is (1) uncomputable, and (2) depends on the exact details of the language used to write the algorithms. But, from the mathematical perspective (once we specify the language for writing the algorithms) it is a formal definition of the most unpredictable sequence.
Yes, it sounds ironic that we have a formal definition of something “unpredictable”, but the definition is uncomputable, which means that no algorithm can actually predict the sequence, we just have its mathematical definition.
So I guess we won’t find the following bits of the sequence. At least, not arbitrarily many. Perhaps we could still find two or three more, but each of them would require more complicated reasoning.
A simple version of this is the sequence 1,2,4,8,16,.… Each term is 1 greater than what you would predict by fitting a polynomial to the preceding terms.
The polynomial restriction is an interesting twist, are you asserting that the space of considered algorithms is ALWAYS smaller than the space of actual generative algorithms one might encounter? I think that’s a restatement of the uncomputability of the “standard solution”, but I have to think more.
The non-restricted proposed method (take all algorithms, inverse weighted by program length, and remove ones that get disproven by observed sequence) likely gets 32 for the next prediction (2^n), but considers 31 as only slightly less likely (regions of a circle created by the chords of n points on the circumference), and others as less likely still.
Yes, it seems that the generated sequence will always be outside the space of considered predictors. When the predictors are polynomial, the sequence can be exponential. When the predictors are algorithms in general, the sequence is uncomputable.
Would it be possible to make a space of all possible predictors? No, because the way the sequence is generated is… kinda like a probabilistic version of Cantor’s diagonal argument—only instead of contradicting the n-th prediction in n-th step, it tries to contradict the greatest mass of remaining predictions at each step. But that still means that each individual prediction will be contradicted at some moment, because if its prior probability is p, then at each step either this specific prediction is contradicted, or some other set of predictions with total probability at least p is contradicted, so we arrive at this specific prediction after at most 1/p steps.
Also, if we want to assign a nonzero prior probability to each prediction, we can only have a countable number of predictions. But there is an uncountable number of possible sequences. Therefore some of them (actually most of them) are outside the set of considered predictions.
I think the idea you’re looking for is Martin-Löf random sequences.
possibly an easier entry point to the topic is here
https://en.wikipedia.org/wiki/Chaitin%27s_constant
which is a specific construction that has some relation to the ideas OP has for a construction
You’ll be happy to know that standards bodies have noticed the “entropy reduction from excessive rules” problem. The latest version of NIST Special Publication 800-63B says to disallow four password categories like “already in a breach database” and “aaaaa,” but goes on to direct verifiers to not impose any other rules on password composition.
As for me, I just choose the first four digits of the busy beaver numbers--1621--as my PIN. As a noncomputable number, it’s guaranteed to be the most random choice possible.
If you have some limited computational model (e.g. n-th order Markov Chains), you can diagonalize over it in an expanded computational model (e.g. general computation) to get a sequence that is unpredictable in the narrower model.
Found an article by Scott Aaronson on a similar topic (except for the last part)
The Quest for Randomness
You might want to download the Online Encyclopedia of Integer Sequences, e.g., as in here, and play around with it, e.g., look at the least likely completion for a given sequence & so on.
i think a (simplicity-biased) predictor would narrow in on the situation described: that {the rule generating the sequence} contains {a copy of the predictor}, making them irresolvably mutually-dependent, similar to the mutual dependence in the classical halting problem.
in such a case, the predictor is not predicting a 1 or a 0, but a situation where neither can be yielded. so, to be a true implementation of said predictor, it would need to be able to output some third option representing irresolvable situations.
you’d get some string of bits before the predictor considered [irresolvable-mutual-dependance exception] most probable though! what that string is (for some prediction-narrowing procedure) sounds like a fun question