Something about this discussion reminds me of a hilarious text:
Now having no reason to otherwise, I decided to assign each of the 64 sequences a prior probability of 1⁄64 of occurring. Now, of course, You may think otherwise but that is Your business and not My concern. (I, as a Bayesian, have a tendency to capitalise pronouns but I don’t care what You think. Strictly speaking, as a new convert to subjectivist philosophy, I don’t even care whether you are a Bayesian. In fact it is a bit of mystery as to why we Bayesians want to convert anybody. But then “We” is in any case a meaningless concept. There is only I and I don’t care whether this digression has confused You.) I then set about acquiring some experience with the coin. Now as De Finetti (vol 1 p141) points out, “experience, since experience is nothing more than the acquisition of further information—acts always and only in the way we have just described: suppressing the alternatives that turn out to be no longer possible...” (His italics)
Now of the 64 sequences, 32 end in a head. Therefore, before tossing the coin my prevision of the 6th toss was 32⁄64. I tossed the coin once and it came up heads. I thus immediately suppressed 32 alternative sequences beginning with a tail (which clearly hadn’t occurred) leaving 32 beginning with a head of which 16 ended with a head. Thus my prevision for the 6th toss was now 16⁄32. (Of course, for a single toss the number of heads can only be 0 or 1 but THINK prevision is not prediction anymore than perversion is predilection.) I then tossed the coin and it came up heads. This immediately eliminated 16 sequences, leaving 16 beginning with 2 heads, 8 of which ended in a head. My prevision of the 6th toss was thus 8⁄16. I carried on like this, obtaining a head on each of the next three goes and amending my prevision to 4⁄8, 2⁄4 and 1⁄2 which is where I then was after the 5th toss having obtained 5 heads in a row.
The moral of this story seems to be: assume priors over generators, not over sequences. A noninformative prior over the reals will never learn that the digit after 0100 is more likely to be 1, no matter how much data you feed it.
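To make the contrast concrete, here is a minimal sketch. The first function reproduces the quoted bookkeeping (uniform prior over all 64 sequences of six tosses). The second is my own choice of "generator" prior for illustration, not something from the quoted text: a uniform Beta(1,1) prior on the coin's bias, which gives Laplace's rule of succession. The function names are made up for this example.

```python
from fractions import Fraction
from itertools import product

def sequence_prior_prevision(observed, n=6):
    """Prevision that toss n is heads, under a uniform prior over all 2**n sequences."""
    # Suppress every sequence that disagrees with the tosses seen so far.
    consistent = [s for s in product("HT", repeat=n)
                  if s[:len(observed)] == tuple(observed)]
    heads_last = sum(1 for s in consistent if s[-1] == "H")
    return Fraction(heads_last, len(consistent))

def generator_prior_prevision(observed):
    """Probability that a future toss is heads, under a uniform prior on the bias.

    Assumption for illustration: tosses are i.i.d. given an unknown bias with a
    Beta(1,1) prior, so the answer is (heads + 1) / (tosses + 2).
    """
    heads = observed.count("H")
    return Fraction(heads + 1, len(observed) + 2)

# Feed both priors a growing run of heads and compare their previsions.
for k in range(6):
    seen = "H" * k
    print(k, sequence_prior_prevision(seen), generator_prior_prevision(seen))
```

The sequence prior stays at 1/2 after every toss, exactly as in the story, while the generator prior moves from 1/2 toward 6/7 after five heads.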
Right, that is a good piece. But I’m afraid I was unclear. (Sorry if I was.) I’m looking for a prior over stationary sequences of digits, not just sequences. I guess the adjective “stationary” can be interpreted in two compatible ways. One is that I’m talking about sequences such that, for every possible string w, the proportion of substrings of length |w| that are equal to w, among all substrings of length |w|, tends to a limit as you consider more and more substrings (extending either forward or backward in the sequence). This would not quite be a prior over generators, and isn’t what I meant.
The cleaner thing I could have meant (and did) is the collection of stationary sequence-valued random variables, each of which (up to isomorphism) is completely described by the probabilities p_w of a string of length |w| coming up as w. These, then, are generators.
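As a rough illustration of what "described by the probabilities p_w" involves: for a stationary process the p_w cannot be arbitrary, since summing over the next digit or the previous digit must give back p_w. The sketch below checks this for one hypothetical generator, a two-state Markov chain over the digits 0 and 1 started in its stationary distribution. The transition numbers and the helper names are assumptions chosen only for the example.

```python
from itertools import product

# Hypothetical two-state Markov chain (an assumption for illustration).
# T[a][b] is the probability that digit b follows digit a.
T = {"0": {"0": 0.9, "1": 0.1}, "1": {"0": 0.5, "1": 0.5}}

# For two states, the stationary distribution is proportional to
# (P(1 -> 0), P(0 -> 1)).
z = T["1"]["0"] + T["0"]["1"]
PI = {"0": T["1"]["0"] / z, "1": T["0"]["1"] / z}

def p(w):
    """Probability p_w that a window of length |w| reads exactly w."""
    if not w:
        return 1.0
    prob = PI[w[0]]
    for a, b in zip(w, w[1:]):
        prob *= T[a][b]
    return prob

def is_consistent(max_len=4, tol=1e-12):
    """Check p_w == sum_a p_{wa} == sum_a p_{aw} for all w shorter than max_len."""
    for n in range(max_len):
        for w in map("".join, product("01", repeat=n)):
            right = sum(p(w + a) for a in "01")  # extend by one digit on the right
            left = sum(p(a + w) for a in "01")   # extend by one digit on the left
            if abs(right - p(w)) > tol or abs(left - p(w)) > tol:
                return False
    return True

print(is_consistent())
```

A prior over generators in this sense would be a distribution over families of p_w satisfying these two conditions; the Markov chain is just one point in that space.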
Janos, I spent some days parsing your request and it’s quite complex. Cosma Shalizi’s thesis and algorithm seem to address your problem in a frequentist manner, but I can’t yet work out any good Bayesian solution.