I’m (re-)reading up on absolutely continuous probability spaces right now. The definition for the expected value I find everywhere is this:
(1): $E(X) := \int_{-\infty}^{\infty} x \cdot f(x)\,dx$
The way to interpret this formula is that we’re integrating over the target space of $X$ rather than the domain, and $f$ is a probability density function over the target space of $X$. But this formula seems highly confusing if that is left unsaid ($X$ doesn’t even appear in it – what the heck?). If one begins with a probability density function $f$ over a probability space $\Omega = \mathbb{R}$ and then wants to compute the expected value of a random variable $X : \Omega \to \mathbb{R}$, I think the formula is:
(2): $E(X) := \int_{-\infty}^{\infty} X(\omega) \cdot f(\omega)\,d\omega$
It seems utterly daft to me to present (1) without first presenting (2) if the idea is to teach the material in an easily understandable way, even if one never uses (2) in practice. But this is what seems to be done everywhere: I googled a bunch, checked Wikipedia, and dug out an old lecture script, and I haven’t found (2) anywhere (I hope it’s even correct). Worse, none of them even explicitly mention that $x$ ranges over $X(\Omega)$ rather than over $\Omega$ after presenting (1). I get that each random variable itself defines a probability space where the distribution is automatically over $X(\Omega)$, but I don’t think that’s a good argument for not presenting (2). This concept is obviously not going to be trivial to understand.
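For what it’s worth, here is the sketch that convinces me the two agree (my own reconstruction, so treat with care; I’m assuming $X$ is measurable and nice enough that its distribution has a density on the target space, which I’ll call $f_X$):

$$E(X) = \int_{-\infty}^{\infty} X(\omega) \cdot f(\omega)\,d\omega = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx.$$

The first expression is (2); the second follows by change of variables through $X$ (pushing the measure forward onto the target space), and it is exactly (1) with $f_X$ in the role that $f$ plays there. Part of the confusion is that the same letter $f$ names two different densities depending on which space one integrates over.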
Stuff like this makes me feel like almost no-one thinks for themselves unless they have to, even in math. I’m interested in whether or not fellow LW-ians share my intuition here.
There seems to be a similar thing going on in linear algebra, where everyone teaches concepts based on the determinant, even though doing it differently makes them far easier. But there it feels more understandable, since you need to be quite good to see the alternative. This case just feels like people aren’t even trying to optimize for readability.
When I learned probability, we were basically presented with a random variable $X$, told that it could take a bunch of different values, and asked to calculate the average/expected value based on the frequencies with which those values occur. So you start with a question like “we roll a die. Here are all the values it could be, and they each happen one-sixth of the time. Add up the values, each multiplied by one-sixth, to get the expected value.” This framing naturally leads to definition (1) when you expand to continuous random variables.
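Concretely, for the die that framing gives

$$E(X) = \sum_{k=1}^{6} k \cdot \frac{1}{6} = \frac{21}{6} = 3.5,$$

and going continuous just swaps the sum over values for an integral over values – which is exactly (1).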
On one hand, this makes definition (1) really intuitive and easy to learn. After all, if you frame the questions around the target space, you’ll frame your understanding around the target space. Frankly, when I read your comment, my immediate reaction was “what on earth is a probability space? We’re just summing up the ways the target variable can happen, and claiming that it’s a map from some other space to the target variable is just excessive!” When you’re taught in terms of the target space, you don’t think about the probability space.
On the other hand, definition (2) is really useful in a lot of (usually more niche) areas. If you don’t contextualize $X$ as a map from a space of possible outcomes to a real number, things like integrals using Maxwell–Boltzmann statistics won’t make any sense. To someone who does, you’re just adding up all the possibilities, weighted by a given value.
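To make that concrete with a standard textbook example (not something from this thread): the mean speed of a gas molecule under the Maxwell–Boltzmann speed density,

$$\langle v \rangle = \int_0^{\infty} v \cdot f(v)\,dv = \sqrt{\frac{8 k_B T}{\pi m}}, \qquad f(v) = 4\pi \left(\frac{m}{2\pi k_B T}\right)^{3/2} v^2\, e^{-m v^2 / 2 k_B T},$$

is formula (1) applied on the target space (speeds), even though the underlying outcomes are points in phase space – that’s the picture from (2) doing silent work in the background.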
That’s a strong steelman of the status quo in cases where random variables are introduced as you describe. I’ll concede that (1) is fine in this case. I’m not sure it applies to cases (lectures) where probability spaces are formally introduced – but maybe it does; maybe other people still don’t think of RVs as functions, even if that’s what they technically are.