So there are two potential sources of error in estimating I(A;B) from sample data:
The sample estimate of I(A;B) is a biased estimator of the true value, and will see slight patterns even when there are none. (See this blog post, for example, for more information.)
Plus, of course, the sample will deviate slightly even from its expected value, so some tests will get “luckier” values than others.
Experimentally (I did a simulation), both of these have an effect on the order of 1/N, where N is the number of trials. So if you are comparing a relatively small number of tests, you should run enough iterations that 1/N is insignificant relative to whatever values of mutual information you end up obtaining. (These will be between 0 and 1, but will vary depending on how good your tests are.)
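For concreteness, here is a rough sketch of the kind of simulation I mean (in Python; the binary variables and the plug-in estimator are just assumptions for illustration). With two genuinely independent binary variables the true I(A;B) is exactly 0, so any nonzero estimate is pure error, and both the average estimate (the bias) and its spread shrink roughly like 1/N:

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_mi(a, b):
    """Plug-in (maximum-likelihood) estimate of I(A;B) in bits for two binary arrays."""
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_xy = np.mean((a == x) & (b == y))
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (np.mean(a == x) * np.mean(b == y)))
    return mi

# A and B are independent, so the true I(A;B) is exactly 0; any nonzero
# estimate is pure error (bias plus sampling noise).
for n in (100, 1000, 10000):
    estimates = [plugin_mi(rng.integers(0, 2, n), rng.integers(0, 2, n))
                 for _ in range(2000)]
    print(f"N={n:6d}  mean={np.mean(estimates):.5f}  sd={np.std(estimates):.5f}")
# Both the mean and the standard deviation drop by roughly 10x for each
# 10x increase in N, i.e. they are on the order of 1/N.
```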
If you have a large number of tests to compare, you run into a third issue:
Although for the typical test, the error is on the order of 1/N, the error for the most misestimated test may be much larger; if that error exceeds the typical value of mutual information, the tests ranked most useful will merely be the ones most misestimated.
Not knowing how errors in mutual-information estimates tend to be distributed, I would reason from Chebyshev’s inequality, which makes no assumptions about the distribution. Roughly: the chance that any one estimate is off by more than k standard deviations is at most 1/k², so across T tests the chance that any of them is off that far is at most T/k², and keeping that small requires k on the order of sqrt(T). That suggests multiplying the typical error by sqrt(T), where T is the number of tests, giving an error on the order of sqrt(T)/N. So make N large enough that this is small.
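To illustrate the “most misestimated test” issue, here is a small sketch (same plug-in estimator as above; again the binary setup is only an assumption): T tests that are all truly useless, each estimated from N trials. The typical estimate stays on the 1/N scale, but the largest of the T estimates is considerably bigger and keeps growing with T. (The sqrt(T) factor from Chebyshev is a worst-case bound; the actual growth depends on the tail of the error distribution.)

```python
import numpy as np

rng = np.random.default_rng(1)

def plugin_mi(a, b):
    # Same plug-in estimator of I(A;B) in bits as in the sketch above.
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_xy = np.mean((a == x) & (b == y))
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (np.mean(a == x) * np.mean(b == y)))
    return mi

N = 1000                     # trials per test
for T in (10, 100, 1000):    # number of (truly useless) tests being compared
    worst = []
    for _ in range(50):      # repeat the whole comparison to average the maximum
        estimates = [plugin_mi(rng.integers(0, 2, N), rng.integers(0, 2, N))
                     for _ in range(T)]
        worst.append(max(estimates))
    print(f"T={T:5d}  average worst-of-T estimate = {np.mean(worst):.5f}")
# If the worst-of-T error is comparable to the mutual information of your real
# tests, the top-ranked tests may simply be the most overestimated ones.
```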
Independently of the above, I suggest making up a toy model of your problem, in which you know the true value of all the tests and can run a simulation with a number of iterations that would be prohibitive in the real world. This will give you an idea of what to expect.
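As an example of what such a toy model might look like (everything here, a 50/50 binary trait and 8 tests defined only by the accuracies I made up, is purely for illustration): since the true I(A;B) of each test is known exactly, you can check how often N trials suffice for the estimated ranking to pick out the genuinely best test.

```python
import numpy as np

rng = np.random.default_rng(2)

def binary_entropy(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def plugin_mi(a, b):
    # Same plug-in estimator of I(A;B) in bits as in the sketches above.
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_xy = np.mean((a == x) & (b == y))
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (np.mean(a == x) * np.mean(b == y)))
    return mi

# Eight hypothetical tests: each reports the true trait with its own accuracy.
accuracies = np.array([0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90])
true_mi = 1 - binary_entropy(accuracies)   # exact I(A;B) when the trait is 50/50

for N in (50, 200, 1000):
    hits = 0
    for _ in range(500):
        trait = rng.integers(0, 2, N)
        correct = rng.random((len(accuracies), N)) < accuracies[:, None]
        results = np.where(correct, trait, 1 - trait)   # each row is one test's output
        estimates = np.array([plugin_mi(trait, r) for r in results])
        hits += estimates.argmax() == true_mi.argmax()
    print(f"N={N:5d}  best test correctly top-ranked in {hits/500:.0%} of runs")
```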
Oh, thank you. This was immensely useful. I will now pick some other object of study and limit myself to a few tests (about 8). I kinda suspected I’d have to obtain data for as many populations as possible, to estimate between-population variation, and for as many trial specimens as possible, but I didn’t know exactly how to check that this would be efficient. Happy winter holidays to you!