Projecting onto any 1-dimensional subspace orthogonal to this (there is a unique one through the origin) will thus yield a ‘variable’ which cleanly separates the two points into the red and blue categories. So in the illustrated example, it looks just like a problem of bad coordinate choice.
Thanks, this is a really important point! Indeed, for freely-reparametrizable abstract points in an abstract vector space, this is just a bad choice of coordinates. The reason this objection doesn’t make the post completely useless is that for some applications (you know, if you’re one of those weird people who cares about “applications”), we do want to regard some bases as more “fundamental” if the variables represent real-world measurements.
For example, you might be able to successfully classify two different species of flower using both “stem length” and “petal color” measurements, even if the distributions overlap for either stem length or petal color considered individually. Mathematically, we could view the distributions as not overlapping with respect to some variable that corresponds to some weighted function of stem length and petal color, but that variable seems “artificial”, less “interpretable.”
Another way to succinctly say this is that two distributions may be cleanly separable via a single immeasurable variable, but overlap when measured on any given measurable variable, such that a representation of the separation achieved by a single immeasurable variable is only achievable through multiple measurable variables.
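To make the flower example concrete, here is a minimal sketch in Python. The measurements are invented for illustration (they are not real data), and the names `species_a`, `species_b`, and `combined` are hypothetical. It checks that the two groups overlap on each measured variable taken alone, but not on one particular weighted combination of the two.

```python
# Made-up measurements for illustration: (stem_length, petal_color) per flower.
species_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
species_b = [(2.0, 1.0), (3.0, 2.0), (4.0, 3.0)]


def value_range(points, variable):
    """Return (min, max) of a variable evaluated over a list of points."""
    values = [variable(p) for p in points]
    return min(values), max(values)


def ranges_overlap(r1, r2):
    """True if two closed intervals (lo, hi) share at least one value."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]


def stem_length(p):
    return p[0]


def petal_color(p):
    return p[1]


def combined(p):
    # A weighted combination of the two measurements (weights -1 and +1),
    # chosen by hand for this toy data set.
    return p[1] - p[0]


for name, variable in [
    ("stem_length", stem_length),
    ("petal_color", petal_color),
    ("petal_color - stem_length", combined),
]:
    overlap = ranges_overlap(
        value_range(species_a, variable),
        value_range(species_b, variable),
    )
    print(f"{name}: overlap = {overlap}")
```

With this toy data, each raw measurement leaves the two species overlapping, while the combined variable is 1 for every flower in one group and -1 for every flower in the other. Whether a variable like "petal color minus stem length" counts as natural is exactly the interpretability question raised above.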
The reason this objection doesn’t make the post completely useless...
Sorry, I hope I didn’t suggest I thought that! You make a good point about some variables being more natural in given applications. I think it’s good to keep in mind that sometimes it’s just a matter of coordinate choice, and other times the points may be separated but not in a linear way.
I mean, it doesn’t matter whether you think it, right? It matters whether it’s true. Like, if I were to write a completely useless blog post on account of failing to understand the concept of a change of basis, then someone should tell me, because that would be helping me stop being deceived about the quality of my blogging.
Thanks for the reply, Zack.