But this does not hold for tiny cosine similarities (e.g. 0.01 for n=12288, which gives a lower bound of 2 using the formula above). I’m not aware of a lower bound better than n for tiny angles.
Unless I’m misunderstanding, a better lower bound for almost orthogonal vectors when cosine similarity is approximately 0 is just n, by taking an orthogonal basis for the space.
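As a quick sanity check of the "n from an orthogonal basis" point, here is a minimal sketch using only the Python standard library. The helper names (`standard_basis`, `max_abs_cosine`) are made up for illustration; the sketch just confirms that the n standard basis vectors of an n-dimensional space have pairwise cosine similarity exactly 0.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def standard_basis(n):
    """The n standard basis vectors e_1, ..., e_n of R^n (hypothetical helper)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def max_abs_cosine(vectors):
    """Largest |cosine similarity| over all distinct pairs of vectors."""
    return max(
        abs(cosine(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )
```

Calling `max_abs_cosine(standard_basis(n))` for any n ≥ 2 gives zero, so n is always achievable regardless of how small the allowed cosine similarity is.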
My guess is that the formula doesn’t give this because it is derived by packing non-intersecting spherical caps onto a sphere, which is sufficient for almost orthogonality but not necessary. This is also why the lower bound of 2 vectors makes sense when we require cosine similarity to be approximately 0, since then the only way you can fit two spherical caps onto the surface of a sphere is by dividing it into 2 hemispheres.
This doesn’t change the headline result (still exponentially much room for almost orthogonal vectors), but the actual numbers might be substantially larger thanks to almost orthogonal vectors being a weaker condition than spherical cap packing.
You made me curious, so I ran a small experiment. Using the sum of absolute cosine similarities as the loss, initializing randomly on the unit sphere, and optimizing until convergence with L-BFGS (with strong Wolfe line search), here are the maximum cosine similarities I get (means and standard deviations over 5 runs, since there was a bit of variation between runs):
It seems consistent with the exponential trend, but it also looks like you would need dim>>1000 to get any significant boost in the number of vectors you can fit with cosine similarity < 0.01, so I don’t think this happens in practice.
My optimization might have failed to converge to the global optimum though, since this is not a nicely convex optimization problem (but the fact that there is little variation between runs is reassuring).
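For anyone who wants to try something similar, here is a rough sketch of this kind of experiment, assuming PyTorch. It is not the exact code behind the numbers above: the function names, the float64 dtype, the iteration cap, and the choice to normalize inside the loss (to keep the vectors effectively on the unit sphere) are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def random_unit_vectors(num_vectors, dim, seed=0):
    """Random points on the unit sphere: normalized Gaussian samples."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.randn(num_vectors, dim, generator=gen, dtype=torch.float64)
    return F.normalize(x, dim=1)


def off_diagonal_cosines(x):
    """Cosine similarities between all ordered pairs of distinct rows."""
    unit = F.normalize(x, dim=1)  # assumption: re-normalize inside the loss
    gram = unit @ unit.T
    mask = ~torch.eye(x.shape[0], dtype=torch.bool)
    return gram[mask]


def abs_cos_loss(x):
    """Sum of |cosine similarity| (each unordered pair is counted twice)."""
    return off_diagonal_cosines(x).abs().sum()


def max_abs_cos_sim(x):
    """Largest |cosine similarity| over all pairs of distinct rows."""
    return off_diagonal_cosines(x).abs().max().item()


def minimize_abs_cos(x0, max_iter=500):
    """Minimize abs_cos_loss with L-BFGS and a strong Wolfe line search."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.LBFGS([x], max_iter=max_iter, line_search_fn="strong_wolfe")

    def closure():
        opt.zero_grad()
        loss = abs_cos_loss(x)
        loss.backward()
        return loss

    opt.step(closure)  # one call runs up to max_iter L-BFGS iterations
    return x.detach()
```

A run would look like `max_abs_cos_sim(minimize_abs_cos(random_unit_vectors(m, n, seed=s)))` for more vectors `m` than dimensions `n`, repeated over a few seeds `s` to see how much the result varies between runs. Since the loss is non-smooth and non-convex, this only finds a local optimum.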