Hi Zack,
Can you clarify something? In the picture you draw, there is a codimension-1 linear subspace separating the parameter space into two halves, with all red points to one side, and all blue points to the other. Projecting onto any 1-dimensional subspace orthogonal to this (there is a unique one through the origin) will thus yield a ‘variable’ which cleanly separates the points into the red and blue categories. So in the illustrated example, it looks just like a problem of bad coordinate choice.
On the other hand, one can easily have much more pathological situations; for example, the red points could all lie inside a certain sphere, and the blue points outside it. Then no choice of linear coordinates will illustrate this, and one has to use more advanced analysis techniques to pick up on it (e.g. persistent homology).
So, to my vague question: do you have only the first situation in mind, or are you also considering the general case, but made the illustrated example extra-simple?
Perhaps this is clarified by your numerical example; I’m afraid I’ve not checked.
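To make the sphere case concrete, here is a minimal two-dimensional sketch. The point sets (`red` inside a circle, `blue` on a ring around it) are made up for illustration, and the sweep over directions is only a sampled check, not a proof. Every projection onto a line through the origin leaves the two classes overlapping, while the nonlinear variable "distance from the origin" separates them.

```python
import math

def ring(radius, count):
    """Return `count` evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count))
            for k in range(count)]

# Made-up data: red points near the origin, blue points on a surrounding ring.
red = ring(0.5, 12) + [(0.0, 0.0)]
blue = ring(2.0, 12)

def project(points, angle):
    """Project each point onto the unit vector at `angle` (a linear coordinate)."""
    ux, uy = math.cos(angle), math.sin(angle)
    return [x * ux + y * uy for x, y in points]

def intervals_overlap(a, b):
    """True if the ranges [min(a), max(a)] and [min(b), max(b)] intersect."""
    return min(a) <= max(b) and min(b) <= max(a)

# Sample directions in one-degree steps over a half turn.
directions = [math.pi * k / 180 for k in range(180)]
linear_separates = any(
    not intervals_overlap(project(red, t), project(blue, t))
    for t in directions
)

def radius(point):
    """Distance from the origin: a nonlinear function of the coordinates."""
    return math.hypot(point[0], point[1])

radial_separates = max(map(radius, red)) < min(map(radius, blue))

print(linear_separates, radial_separates)
```

In this toy setup the red points sit inside the convex hull of the blue ones, which is why no single linear coordinate can split them.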
Thanks, this is a really important point! Indeed, for freely-reparametrizable abstract points in an abstract vector space, this is just a bad choice of coordinates. The reason this objection doesn’t make the post completely useless is that for some applications (you know, if you’re one of those weird people who cares about “applications”), we do want to regard some bases as more “fundamental”, if the variables represent real-world measurements.
For example, you might be able to successfully classify two different species of flower using both “stem length” and “petal color” measurements, even if the distributions overlap for either stem length or petal color considered individually. Mathematically, we could view the distributions as not overlapping with respect to some variable that corresponds to some weighted function of stem length and petal color, but that variable seems “artificial”, less “interpretable.”
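As a small illustration of the flower example, here is a sketch with invented numbers. The two species, the measurement values, and the particular weighting `hue - stem` are all assumptions chosen so that the arithmetic is easy to follow. Each measured variable overlaps on its own, but one weighted combination of the two does not.

```python
# Hypothetical measurements: (stem_length_cm, petal_hue_score) per flower.
species_a = [(1.0, 3.0), (2.0, 4.0), (3.0, 5.0), (4.0, 6.0)]
species_b = [(3.0, 1.0), (4.0, 2.0), (5.0, 3.0), (6.0, 4.0)]

def value_range(values):
    """Return (min, max) of an iterable of numbers."""
    values = list(values)
    return min(values), max(values)

def ranges_overlap(r1, r2):
    """True if the closed intervals r1 and r2 intersect."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]

def feature_overlaps(index):
    """Check whether the two species overlap on one measured variable."""
    range_a = value_range(p[index] for p in species_a)
    range_b = value_range(p[index] for p in species_b)
    return ranges_overlap(range_a, range_b)

def combined(point):
    """An 'artificial' variable: a weighted combination of both measurements."""
    stem, hue = point
    return hue - stem

stem_overlaps = feature_overlaps(0)   # stem length considered alone
hue_overlaps = feature_overlaps(1)    # petal hue considered alone
combined_overlaps = ranges_overlap(
    value_range(map(combined, species_a)),
    value_range(map(combined, species_b)),
)

print(stem_overlaps, hue_overlaps, combined_overlaps)
```

The combined variable is just a projection onto a different axis, which is the "bad choice of coordinates" reading; the interpretability question is whether `hue - stem` means anything to a botanist.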
Another way to say this succinctly: two distributions may be cleanly separable along a single variable that is not directly measurable, yet overlap along every individual measurable variable, so that the separation can only be recovered by combining multiple measurable variables.
Thanks for the reply, Zack.
Sorry, I hope I didn’t suggest I thought that! You make a good point about some variables being more natural in given applications. I think it’s good to keep in mind that sometimes it’s just a matter of coordinate choice, and other times the points may be separated but not in a linear way.
I mean, it doesn’t matter whether you think it, right? It matters whether it’s true. Like, if I were to write a completely useless blog post on account of failing to understand the concept of a change of basis, then someone should tell me, because that would be helping me stop being deceived about the quality of my blogging.