gathaung comments on The robust beauty of improper linear models

gathaung 17 May 2017 9:43 UTC
0 points
Nice. To make your proposed explanation more precise:

Take a random vector on the n-dimensional unit sphere. Project it to the nearest vector of the form (±1, ..., ±1)/sqrt(n); what is the expected l2-distance / angle? How does it scale with n?

If this value decreases in n, then your explanation is essentially correct, or did you want to propose something else?

Start by taking a random vector x where each coordinate is a unit Gaussian (normalize later). The projection px just takes the sign of each coordinate: +1 where x_i is positive, -1 where it is negative.

We are interested in E[ <x, px> / (|x| sqrt(n)) ] = E[ sum_i |x_i| / (|x| sqrt(n)) ].

If the dimension is large enough, then we won't really need to normalize; it is enough to start with Gaussians of standard deviation 1/sqrt(n), as we will then get almost unit length with high probability. Then all components are independent.

For the angle, we then (approximately) need to compute E(sum_i |x_i| / n), where each x_i is unit Gaussian. This is asymptotically independent of n; so it appears like this explanation of improper linear models fails.
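Spelled out, the reduction I have in mind looks roughly like this. The symbols are my own shorthand, not anything from the post: sgn(x) is the coordinate-wise sign vector (the unnormalized projection) and theta is the angle between x and its projection.

```latex
\cos\theta
  = \frac{\langle x, \operatorname{sgn}(x)\rangle}{\lVert x\rVert_2 \,\sqrt{n}}
  = \frac{\sum_{i=1}^{n} |x_i|}{\lVert x\rVert_2 \,\sqrt{n}}
  \approx \frac{1}{n}\sum_{i=1}^{n} |x_i|
  \quad\text{since } \lVert x\rVert_2 \approx \sqrt{n} \text{ for large } n .
```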

Darn, after reading your comment I mistakenly believed that this would be yet another case of “obvious from high-dimensional geometry” / random projection.

PS. In what sense are improper linear models working? In the l_1, l_2, or l_\infty sense?

Edit: I was being stupid, leaving the above for future ridicule. We want E(sum_i |x_i| / n)=1, not E(sum_i |x_i|/n)=0.

The folded Gaussian tells us that E[sum_i |x_i| / n] = sqrt(2/pi) ≈ 0.80, for large n. The explanation still does not work, since sqrt(2/pi) < 1, and this gives us the expected error margin of improper high-dimensional models.
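For anyone who wants to sanity-check this numerically, here is a minimal Monte Carlo sketch using only the Python standard library. The function name `mean_cosine`, the trial counts, and the seed are arbitrary choices of mine; it estimates the exact (normalized) cosine between a Gaussian vector and its sign vector rather than the large-n approximation.

```python
import math
import random


def mean_cosine(n, trials=500, seed=0):
    """Estimate E[<x, sign(x)> / (|x| sqrt(n))] for x with i.i.d. unit Gaussian coordinates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        l1 = sum(abs(v) for v in x)             # <x, sign(x)> equals the l1 norm of x
        l2 = math.sqrt(sum(v * v for v in x))   # Euclidean norm |x|
        total += l1 / (l2 * math.sqrt(n))       # cosine of the angle to the nearest sign vector
    return total / trials


if __name__ == "__main__":
    print("sqrt(2/pi) =", round(math.sqrt(2 / math.pi), 4))
    for n in (10, 100, 1000):
        print(n, round(mean_cosine(n), 4))
```

The estimates should sit near sqrt(2/pi) for every n instead of drifting toward 1, which corresponds to an angle of about 37 degrees between the random direction and its nearest sign vector.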

@Stuart: What are the typical empirical errors? Do they happen to be near sqrt(2/pi), which is close enough to 1 to be summarized as “kinda works”?