Thanks. I think I had the law of large numbers and CLT in the same bucket in my head, so pointing out they’re different is helpful. Your point #5, and the attractor bit, are especially interesting, and I’ve seen similar arguments in Jaynes’s book around Gaussians, so this is starting to get into places I can relate to. And knowing that convergence in distribution is called weak convergence should help when I’m searching for stuff. Helpful!
CLT applies to a family of random variables, not to distributions.
I guess I consider a family of random variables to be the same thing as a family of distributions? Is there a difference?
Answering the last question: if you deal with any random variable, formally you are specifying a probability space, and the variable is a measurable function on it. So, to say anything useful about a family of random variables, they all have to live on the same space (otherwise you can’t, for example, add them: it does not make sense to add functions defined on different spaces). This shared probability space can be very complicated in itself, even when the marginal distributions are all the same, because it encodes the (non-)independence among the variables (in the case of independent variables, it’s just a product space with a product measure).
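To make that concrete, here’s a minimal sketch in plain Python (the two-coin setup and every name in it are illustrative assumptions, not anything from this discussion): two random variables defined as functions on one shared product space, so that their sum is ordinary pointwise addition of functions.

```python
from itertools import product
from fractions import Fraction

# Two fair coins modeled on a single product space Omega = {H,T} x {H,T}
# with the product measure (each outcome gets probability 1/4).
omega = list(product("HT", repeat=2))
mu = {w: Fraction(1, 4) for w in omega}

def X(w):  # measurable function of the first coordinate: 1 if heads
    return 1 if w[0] == "H" else 0

def Y(w):  # measurable function of the second coordinate
    return 1 if w[1] == "H" else 0

# X + Y makes sense only because X and Y are functions on the same Omega:
def S(w):
    return X(w) + Y(w)

# Push mu forward through S to get the distribution of the sum:
dist_S = {}
for w in omega:
    dist_S[S(w)] = dist_S.get(S(w), Fraction(0)) + mu[w]

print(dist_S)  # {2: 1/4, 1: 1/2, 0: 1/4}, i.e. Binomial(2, 1/2)
```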
Your comment made me realize that I didn’t actually know what it meant to add random variables! I looked it up and found that, according to Wikipedia, this corresponds (if the RVs are independent) to what my main source (Jaynes) has been talking about in terms of convolutions of probability distributions. So I’m gonna go back and re-read the parts on convolution.
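In case it helps anyone else, here’s a quick numerical sanity check of that convolution fact as I understand it (it assumes numpy, and the fair-die pmf is just an illustrative choice): the pmf of the sum of two independent integer-valued RVs, computed once as a convolution and once by brute force over the product space.

```python
import numpy as np

pmf_die = np.full(6, 1 / 6)              # pmf of one fair die, values 1..6
pmf_sum = np.convolve(pmf_die, pmf_die)  # pmf of the sum, values 2..12

# Brute-force check over the product space {1..6} x {1..6}:
brute = np.zeros(11)
for i in range(1, 7):
    for j in range(1, 7):
        brute[i + j - 2] += 1 / 36

assert np.allclose(pmf_sum, brute)
print(dict(zip(range(2, 13), np.round(pmf_sum, 4))))
```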
But I still want to go out on a limb here and say that
So, to say anything useful about a family of random variables, they all have to live on the same space
sounds to me like too strong a statement. Since I can take the AND of just about any two propositions and get a probability, can’t I talk about the chance of a person being 6 feet tall, and about the probability that it is raining in Los Angeles today, even though those event spaces are really different, and therefore their probability spaces are different? And if I can do that, what is special about the addition of random variables that makes it not applicable in the way AND is?
If you don’t have a given joint probability space, you implicitly construct it (for example, by saying the RVs are independent, you implicitly construct a product space). Generally, the fact that you sometimes talk about X living on one space (on its own) and at other times on another (jointly with some Y) doesn’t really matter, because in most situations probability theory is specifically about the properties of random variables that are independent of the underlying spaces (although sometimes it does matter).
In your example, by definition, P = Prob(X = 6ft AND Y = raining) = mu({t : X(t) = 6ft and Y(t) = raining}). You have to assume a joint probability space for them. For example, maybe they are independent, and then P = Prob(X = 6ft) \* Prob(Y = raining); or maybe Y = (if X = 6ft then raining else not raining), and then P = Prob(X = 6ft).
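A toy sketch of those two couplings (the probabilities here are made-up numbers, chosen equal so the second coupling can have the same marginals):

```python
# Made-up marginals; both couplings below leave them unchanged.
p_tall = 0.1   # assumed P(X = 6ft)
p_rain = 0.1   # assumed P(Y = raining)

# Coupling 1: independence, i.e. the product space with the product measure.
p_indep = p_tall * p_rain   # 0.01

# Coupling 2: "raining iff 6ft tall", a deterministic coupling that is a
# perfectly legal joint space with the very same marginals.
p_coupled = p_tall          # 0.1

print(p_indep, p_coupled)   # same marginals, very different joint answer
```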