Huh? The reference set Ω is the set of possible world histories, out of which one element is the actual world history. I don’t see what’s wrong with this.
Nope; it’s the limit of the increasing sequence I(w), J(I(w)), I(J(I(w))), J(I(J(I(w)))), …, where I(S) for a set S is the union of the elements of I that have nonempty intersections with S, i.e. the union of I(x) over all x in S, and J(S) is defined the same way.
Alternatively, if instead of I and J you think about the sigma-algebras they generate (let’s call them sigma(I) and sigma(J)), then sigma(I meet J) is the intersection of sigma(I) and sigma(J). I somewhat prefer this formulation because the machinery for conditional expectation is usually defined in terms of sigma-algebras, not partitions.
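To make this concrete, here’s a minimal sketch on a finite set of worlds (the two partitions are made up for illustration); it checks that sigma(I meet J) consists of exactly the events lying in both sigma(I) and sigma(J):

```python
# Minimal sketch on the worlds {0,...,5}; the partitions are made up.
from itertools import combinations

I = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]  # agent 1's partition
J = [frozenset({0}), frozenset({1, 2, 3}), frozenset({4, 5})]  # agent 2's partition

def sigma(partition):
    """The (finite) sigma-algebra generated by a partition: all unions of its cells."""
    cells = list(partition)
    return {frozenset().union(*combo)
            for r in range(len(cells) + 1)
            for combo in combinations(cells, r)}

def meet(P, Q):
    """Coarsest common coarsening of P and Q: keep merging cells that overlap."""
    cells = [set(c) for c in list(P) + list(Q)]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(cells, 2):
            if a & b and a != b:
                a |= b
                cells.remove(b)
                merged = True
                break
    return {frozenset(c) for c in cells}

# sigma(I meet J) == sigma(I) ∩ sigma(J)
assert sigma(meet(I, J)) == sigma(I) & sigma(J)
print(meet(I, J))  # two cells: {0, 1, 2, 3} and {4, 5}
```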
Right, that is a good piece. But I’m afraid I was unclear. (Sorry if I was.) I’m looking for a prior over stationary sequences of digits, not just sequences. I guess the adjective “stationary” can be interpreted in two compatible ways. One reading is that I’m talking about individual sequences such that, for every possible string w, the proportion of substrings of length |w| that are equal to w, among all substrings of length |w|, tends to a limit as you consider more and more substrings (extending either forward or backward in the sequence); a prior over such sequences would not quite be a prior over generators, and isn’t what I meant.
The cleaner thing I could have meant (and did) is the collection of stationary sequence-valued random variables, each of which (up to isomorphism) is completely described by the probabilities p_w of a string of length |w| coming up as w. These, then, are generators.
Each element of the set is characterized by a bunch of probabilities; for example there is p_01101, which is the probability that elements x_{i+1} through x_{i+5} are 01101, for any i. I was thinking of using the topology induced by these maps (i.e. generated by preimages of open sets under them).
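To make the p_w maps concrete, here’s a rough sketch that estimates block probabilities from a long sample; the order-1 chain generating the sample is made up, just standing in for some stationary source:

```python
# Rough sketch: empirical block probabilities p_w of a stationary binary source.
# The generating chain below is made up purely for illustration.
import random
from collections import Counter

def sample_chain(n, p_stay=0.8, seed=0):
    """n bits from a simple stationary order-1 chain that repeats the last bit with prob p_stay."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1)]
    for _ in range(n - 1):
        bits.append(bits[-1] if rng.random() < p_stay else 1 - bits[-1])
    return bits

def block_probs(bits, length):
    """Empirical p_w for every block w of the given length."""
    counts = Counter(tuple(bits[i:i + length]) for i in range(len(bits) - length + 1))
    total = sum(counts.values())
    return {"".join(map(str, w)): c / total for w, c in counts.items()}

bits = sample_chain(100_000)
print(block_probs(bits, 2))  # p_00, p_01, p_10, p_11; by stationarity these converge to the true values
```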
How is putting a noninformative prior on the reals hard? With the usual required invariance, the uniform (improper) prior does the job. I don’t mind having the prior be improper here either, and as I said I don’t know what invariance I should want; I can’t think of many interesting group actions that apply. Of course 0 and 1 should be treated symmetrically, but that’s trivial to arrange.
I guess you’re right that regularities can be described more generally with computational models; but I expect them to be harder to deal with than this (relatively) simple, noncomputational (though stochastic) model. I’m not looking for regularities among the models, so I’m not sure how a computational model would help me.
The purpose would be to predict regularities in a “language”, e.g. to try to achieve decent data compression in a way similar to other Markov-chain-based approaches. In terms of properties, I can’t think of any nontrivial ones, except the usual important one that the prior assign nonzero probability to every open set; mainly I’m just trying to find something that I can imagine computing with.
It’s true that there exists a bijection between this space and the real numbers, but it doesn’t seem like a very natural one, though it does work (it’s measurable, etc). I’ll have to think about that one.
Since we’re discussing (among other things) noninformative priors, I’d like to ask: does anyone know of a decent (noninformative) prior for the space of stationary, bidirectionally infinite sequences of 0s and 1s?
Of course in any practical inference problem it would be pointless to consider the infinite joint distribution, and you’d only need to consider what happens for a finite chunk of bits, i.e. a higher-order Markov process, described by a bunch of parameters (probabilities) which would need to satisfy some linear inequalities. So it’s easy to find a prior for the space of mth-order Markov processes on {0,1}; but these obvious (uniform) priors aren’t coherent with each other.
I suppose it’s possible to normalize these priors so that they’re coherent, but that seems to result in much ugliness. I just wonder if there’s a more elegant solution.
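One way to see the kind of incoherence I mean, as a rough Monte Carlo sketch: compare the prior-predictive probability of the string 01 under the uniform prior on the order-0 (Bernoulli) parameter and under a uniform prior on the order-1 conditional probabilities (just one natural choice of parameterization). If the uniform priors at the different orders all came from a single prior over stationary processes, these two numbers would have to agree; they come out noticeably different:

```python
# Rough Monte Carlo sketch: prior-predictive probability of the string 01 under
# a uniform prior at order 0 vs. a uniform prior at order 1 (parameterized here
# by the conditional probabilities a = P(1|0), b = P(1|1); other notions of
# "uniform" are possible, this is just one natural choice).
import random

rng = random.Random(0)
N = 200_000

def p01_order0(p):
    # i.i.d. Bernoulli(p) source: P(x1 = 0, x2 = 1) = (1 - p) * p
    return (1 - p) * p

def p01_order1(a, b):
    # stationary order-1 chain: P(x1 = 0) = (1 - b) / (1 + a - b), then P(1 | 0) = a
    return (1 - b) / (1 + a - b) * a

order0 = sum(p01_order0(rng.random()) for _ in range(N)) / N                 # averages to 1/6
order1 = sum(p01_order1(rng.random(), rng.random()) for _ in range(N)) / N   # noticeably larger

print(order0, order1)  # the two "uniform" priors disagree about a plain observable
```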
Updated, eh? Where did your prior come from? :)
I am trying to understand the examples on that page, but they seem strange; shouldn’t there be a model with parameters, and a prior distribution for those parameters? I don’t understand the inferences. Can someone explain?
I think you’re confusing the act of receiving information/understanding about an experience with the experience itself.
Re: the joke example, I think that one would get tired of a joke after hearing it too many times, and that’s what the dissection amounts to, because you keep hearing the joke in your head; but if you already get the joke, the dissection doesn’t really add to your understanding. If you didn’t get the joke, you will probably get a twinge of enjoyment at the moment when you finally do understand it. If you don’t understand a joke, I don’t think you can get warm fuzzies from it.
With hormones, again I think that being explicitly reminded of the role of hormones in physical attraction while experiencing physical attraction reduces warm fuzzies only because it’s distracting you from the source of the warm fuzzies and making you feel self-conscious. On the other hand, knowing more about the role of hormones should not generally distract you from your physical attraction; instead you could use it to, ta-da, get more warm fuzzies.
Interesting. My internal experience of programming is quite different; I don’t see boxes and lines. Data structures for me are more like people who answer questions, although of course with no personality or voice; the voice is mine as I ask them a question, and they respond in a “written” form, i.e. with a silent indication. So the diagrams people like to draw for databases and such don’t make direct sense to me per se; they’re just a way of organizing written information.
I’m finding it quite difficult to describe such things coherently and correctly; I’m not certain of any of this, except that I know I don’t imagine black-and-white box diagrams.
Do you have some good examples of abuse of Bayes’ theorem?
Bayes’ Theorem never returns “undefined”. In the absence of any evidence it returns the prior.
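In symbols: if P(E|H) = P(E|not-H), the evidence washes out and P(H|E) = P(H). A toy sketch with made-up numbers:

```python
# Toy sketch, made-up numbers: an evidence term that doesn't discriminate
# between hypotheses leaves the posterior equal to the prior.
prior = {"H": 0.25, "not H": 0.75}
likelihood = {"H": 0.5, "not H": 0.5}  # "no evidence": same likelihood everywhere

normalizer = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / normalizer for h in prior}
print(posterior)  # {'H': 0.25, 'not H': 0.75} -- the prior, not "undefined"
```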
At least not when they’re already in the bucket.
I’m mostly a lurker, but I’m in Toronto.
That simplification is a situation in which there is no common knowledge. In world-state w, agent 1 knows A1 (meaning knows that the correct world is in A1), and agent 2 knows A2. They both know A1 union A2, but that’s still not common knowledge, because agent 1 doesn’t know that agent 2 knows A1 union A2.
I(w) is what agent 1 knows, if w is correct. If all you know is S, then the only thing you know agent 1 knows is I(S), and the only thing that you know agent 1 knows agent 2 knows is J(I(S)), and so forth. This is why the usual “everyone knows that everyone knows that …” definition of common knowledge translates to the limit of the increasing sequence I(w), J(I(w)), I(J(I(w))), and so on.
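To make the iteration concrete, here’s a small sketch (the partitions are made up): start from {w} and keep applying I(.) and J(.) until the set stops growing. The limit is the cell of the meet containing w, i.e. the smallest event that is common knowledge at w:

```python
# Small sketch of the iteration above; the partitions are made up for illustration.
I = [{0, 1}, {2, 3}, {4, 5}]   # agent 1's information partition
J = [{0}, {1, 2, 3}, {4, 5}]   # agent 2's information partition

def expand(partition, S):
    """The I(S)/J(S) of the comment: union of the partition's cells that intersect S."""
    return set().union(*(cell for cell in partition if cell & S))

def common_knowledge_cell(w):
    """Limit of I(w), J(I(w)), I(J(I(w))), ...: the cell of the meet containing w."""
    S = {w}
    while True:
        bigger = expand(J, expand(I, S))
        if bigger == S:
            return S
        S = bigger

print(common_knowledge_cell(0))  # {0, 1, 2, 3}
print(common_knowledge_cell(4))  # {4, 5}
```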