I have a random mathematical idea, not sure what it means, whether it is somehow useful, or whether anyone has explored this before. So I guess I’ll just write it here.
Imagine the most unexpected sequence of bits. What would it look like? Well, probably not what you’d expect, by definition, right? But let’s be more specific.
By “expecting” I mean this: You have a prediction machine, similar to AIXI. You show the first N bits of the sequence to the machine, and the machine tries to predict the following bit. And the most unexpected sequence is one where the machine makes the most guesses wrong; preferably all of them.
More precisely: The prediction machine starts by imagining all possible algorithms that could generate sequences of bits, and assigns them probabilities according to the Solomonoff prior. (Which is impossible to do in real life, because of the infinities involved, etc.) Then it receives the first N bits of the sequence and removes all algorithms that would not generate a sequence starting with these N bits. Now it normalizes the probabilities of the remaining algorithms and lets them vote on whether the next bit will be 0 or 1.
However, our sequence is generated in defiance of the prediction machine. We actually don’t have any sequence in advance. We just ask the prediction machine what the next bit is (starting with the empty initial sequence), and then do the exact opposite. (There is some analogy with Cantor’s diagonal proof.) Then we send the sequence with this new bit to the machine, ask it to predict the next bit, and again do the opposite. Etc.
There is a technical detail: the prediction machine may answer “I don’t know” if exactly half of the remaining algorithms predict that the next bit will be 0, and the other half predict that it will be 1. Let’s say that if we receive this specific answer, we will always add 0 to the end of the sequence. (But if the machine thinks it’s 0 with probability 50.000001%, and 1 with probability 49.999999%, it will output “0”, and we will add 1 to the end of the sequence.)
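The construction can be sketched in code. Real Solomonoff induction is uncomputable, so the “prediction machine” below is a stand-in of my own invention: its hypotheses are just the periodic bit patterns of length 1 to 3, weighted 2^-length so that shorter patterns count as “simpler”. The voting and the tie rule (“I don’t know” → append 0) follow the description above.

```python
from itertools import product

# Toy hypothesis class (an assumption, not real Solomonoff induction):
# all periodic bit patterns of length 1..3.
HYPOTHESES = [p for k in (1, 2, 3) for p in product((0, 1), repeat=k)]

def weight(pattern):
    # Crude simplicity prior: shorter pattern = heavier weight.
    return 2.0 ** (-len(pattern))

def predict(prefix):
    """Weighted vote among hypotheses consistent with the prefix.
    Returns 0, 1, or None for an exact tie ("I don't know")."""
    votes = {0: 0.0, 1: 0.0}
    n = len(prefix)
    for p in HYPOTHESES:
        if all(p[i % len(p)] == b for i, b in enumerate(prefix)):
            votes[p[n % len(p)]] += weight(p)
    if votes[0] == votes[1]:
        return None
    return 0 if votes[0] > votes[1] else 1

def anti_sequence(length):
    """Diagonalize: emit the opposite of the predicted bit;
    on a tie, emit 0 (the convention from the post)."""
    seq = []
    for _ in range(length):
        guess = predict(seq)
        seq.append(0 if guess is None else 1 - guess)
    return seq

print(anti_sequence(8))  # -> [0, 1, 1, 1, 0, 0, 0, 0]
```

With this toy prior the sequence begins 0, 1, 1, matching the hand reasoning below; after four bits the adversary has stepped outside the finite hypothesis class entirely, so every later vote is an empty tie. A real Solomonoff inductor never runs out of hypotheses, which is why the genuine construction needs a halting oracle.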
So… at the beginning, there is no way to predict the first bit, so the machine says “I don’t know” and the first bit is 0. At that moment, the prediction of the following bit is 0 (because the “only 0s” hypothesis is very simple), so the first two bits are 01. I am not sure here, but my next prediction (though I am predicting this with naive human reasoning, no math) would be 0 (as in “010101...”), so the first three bits are 011. -- And I don’t dare to speculate about the following bits.
The exact sequence depends on how exactly the prediction machine defines the “algorithms that generate the sequence of bits” (the technical details of the language these algorithms are written in), but can something still be said about these “most unexpected” sequences in general? My guess is that to a human observer they would seem like random noise. -- Which contradicts my initial words that the sequence would not be what you’d expect… but I guess the answer is that the generation process is trying to surprise the prediction machine, not me as a human.
In order to capture your intuition that a random sequence is “unsurprising”, you want the predictor to output a distribution over {0,1} — or equivalently, a subjective probability p of the next bit being 1. The predictor tries to maximize the expectation of a proper scoring rule. In that case, the maximally unexpected sequence will be random, and the probability of the sequence will approach 2^{-n}.
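A small numeric sketch of this point, with log loss as the (assumed) proper scoring rule: the adversary takes whichever bit the predictor considers less likely, so the predictor’s loss per bit is -log2(min(p, 1-p)), which is minimized at p = 0.5. Against the diagonal adversary its best defence is to treat the sequence as fair coin flips, assigning the whole sequence probability 2^-n.

```python
import math

def adversarial_log_loss(probs):
    """probs[i] = predictor's probability that bit i is 1.
    The adversary picks the less likely bit each round (ties -> 0),
    so the predictor pays -log2 of the probability it gave that bit."""
    total = 0.0
    for p in probs:
        bit = 0 if p >= 0.5 else 1
        p_bit = (1 - p) if bit == 0 else p
        total += -math.log2(p_bit)
    return total

print(adversarial_log_loss([0.5] * 10))  # 10.0 bits: the minimum possible
print(adversarial_log_loss([0.9] * 10))  # ~33.2 bits: confidence is punished
```

Any deviation from p = 0.5 only increases the worst-case loss, which is the formal sense in which the maximally unexpected sequence looks like noise to a well-calibrated predictor.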
Allowing the predictor to output {0, 1, ?} is kind of like restricting its outputs to {0%, 50%, 100%}.
In a random sequence, AIXI would guess on average half of the bits. My goal was to create a specific sequence where it couldn’t guess any. Not just a random sequence, but specifically… uhm… an “anti-inductive” one? The exact opposite of lawful, where random is merely halfway opposed. I don’t care about other possible predictors, only about AIXI.
Imagine playing rock-paper-scissors against someone who beats you all the time, whatever you do. That’s worse than random. This sequence would bring the mighty AIXI to tears… but I suspect to a human observer it would merely seem pseudo-random. And is probably not very useful for other goals than making fun of AIXI.
Ok. I still think the sequence is random in the algorithmic information theory sense; i.e., it’s incompressible. But I understand you’re interested in the adversarial aspect of the scenario.
You only need a halting oracle to compute your adversarial sequence (because that’s what it takes to run AIXI). A super-Solomonoff inductor that inducts over all Turing machines with access to halting oracles would be able to learn the sequence, I think. The adversarial sequence for that inductor would require a higher oracle to compute, and so on up the ordinal hierarchy.
My guess is that to a human observer they would seem like a random noise. -- Which contradicts my initial words that the sequence would not be what you’d expect… but I guess the answer is that the generation process is trying to surprise the prediction machine, not me as a human.
“What is the specific pattern of bits?” and “Give a vague description that applies to both this pattern and asymptotically 100% of possible patterns of bits” are very different questions. You’re asking the machine the first question and the human the second question, so I’m not surprised the answers are different.
Does “most unexpected” differ from “least predictable” in any way? Seems like a random number generator would match any predictor around 50% of the time, so making a sequence less predictable than that is impossible, no?
My prediction machine can maximize its expected minimum score by outputting random guesses. Then your bitstring is precisely the complement of my random string, and is therefore drawn from the uniform random distribution.
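A quick simulation of this point (a sanity check, not a proof): against a predictor that guesses uniformly at random, the “do the opposite” sequence is the bitwise complement of a uniform random string, which is itself uniform.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def adversarial_bits(n):
    # Predictor guesses each bit uniformly at random;
    # the adversary always emits the opposite bit.
    guesses = [random.randint(0, 1) for _ in range(n)]
    return [1 - g for g in guesses]

bits = adversarial_bits(100_000)
print(sum(bits) / len(bits))  # close to 0.5: looks like fair coin flips
```

The complement map is a bijection on bitstrings, so it carries the uniform distribution to itself; the empirical frequency near 0.5 is just that fact made visible.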
I have briefly thought about this idea in the context of password selection and password crackers: the “most unexpected” string (of some maximum length) is a good password. No deep reasoning here though.
I think adding a little meta-probability will help.
Since the description “the most surprising m-bit sequence” is itself short, several sequences must end up tied for most surprising, with essentially the same probability. For example, if it takes n bits of data to define “the most surprising m-bit sequence”, then the prior assigns that description a probability of at least 2^-n. But there are 2^m sequences, and a most surprising sequence must have a probability of at most 2^-m, so that 2^-n of probability mass must be spread over at least 2^(m-n) sequences tied for most surprising.
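The counting step above can be written out in symbols (hedged: “surprising” is read here as “low prior probability”, and a description of length n is assumed to carry prior mass at least 2^{-n}, which it spreads over the set S of tied sequences):

```latex
P(S) \ge 2^{-n}, \qquad
P(s) \le 2^{-m} \quad \text{for each } s \in S
\;\Longrightarrow\;
|S| \;\ge\; \frac{2^{-n}}{2^{-m}} \;=\; 2^{\,m-n}.
```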
Shouldn’t AIXI include itself (for all inputs) recursively? If so I don’t think your sequence is well defined.
No, AIXI isn’t computable and so does not include itself as a hypothesis.
Oh, I see.
Just in case anyone wants pointers to existing mathematical work on “unpredictable” sequences: Algorithmically random sequences (wikipedia)