Nice. To make your proposed explanation more precise:
Take a random vector on the n-dim unit sphere. Project to the nearest (+1,-1)/sqrt(n) vector; what is the expected l2-distance / angle? How does it scale with n?
If this value decreases in n, then your explanation is essentially correct, or did you want to propose something else?
Start by taking a random vector x where each coordinate is unit gaussian (normalize later). The projection px just splits into positive coordinates and negative coordinates.
We are interested in E[ / |x| sqrt(n)].
If the dimension is large enough, then we wont really need to normalize; it is enough to start with 1/sqrt(n) gaussians, as we will almost almost surely get almost unit length. Then all components are independent.
For the angle, we then (approximately) need to compute E(sum_i |x_i| / n), where each x_i is unit Gaussian. This is asymptotically independent of n; so it appears like this explanation of improper linear models fails.
Darn, after reading your comment I mistakenly believed that this would be yet another case of “obvious from high-dimensional geometry” / random projection.
PS. In what sense are improper linear models working? l_1, l2, l\infty sense?
Edit: I was being stupid, leaving the above for future ridicule. We want E(sum_i |x_i| / n)=1, not E(sum_i |x_i|/n)=0.
Folded Gaussian tells us that E[ sum_i |x_i|/n]= sqrt(2/pi), for large n. The explanation still does not work, since 2/pi <1, and this gives us the expected error margin of improper high-dimensional models.
@Stuart: What are the typical empirical errors? Do they happen to be near sqrt(2/pi), which is close enough to 1 to be summarized as “kinda works”?
Nice. To make your proposed explanation more precise:
Take a random vector on the n-dim unit sphere. Project to the nearest (+1,-1)/sqrt(n) vector; what is the expected l2-distance / angle? How does it scale with n?
If this value decreases in n, then your explanation is essentially correct, or did you want to propose something else?
Start by taking a random vector x where each coordinate is unit gaussian (normalize later). The projection px just splits into positive coordinates and negative coordinates.
We are interested in E[ / |x| sqrt(n)].
If the dimension is large enough, then we wont really need to normalize; it is enough to start with 1/sqrt(n) gaussians, as we will almost almost surely get almost unit length. Then all components are independent.
For the angle, we then (approximately) need to compute E(sum_i |x_i| / n), where each x_i is unit Gaussian. This is asymptotically independent of n; so it appears like this explanation of improper linear models fails.
Darn, after reading your comment I mistakenly believed that this would be yet another case of “obvious from high-dimensional geometry” / random projection.
PS. In what sense are improper linear models working? l_1, l2, l\infty sense?
Edit: I was being stupid, leaving the above for future ridicule. We want E(sum_i |x_i| / n)=1, not E(sum_i |x_i|/n)=0.
Folded Gaussian tells us that E[ sum_i |x_i|/n]= sqrt(2/pi), for large n. The explanation still does not work, since 2/pi <1, and this gives us the expected error margin of improper high-dimensional models.
@Stuart: What are the typical empirical errors? Do they happen to be near sqrt(2/pi), which is close enough to 1 to be summarized as “kinda works”?