Note: I realize that the math in the models above, specifically in the combinations of estimates, is incorrect. I’m currently investigating how to do it correctly.
I was curious about the combinations of estimates problem, so here is what I came up with.
Of course this will depend on some assumptions about how the estimates should be interpreted. For example, in the unlikely case that the model says “the true value is 100% in this range”, you simply take the overlap of the ranges: 100±10 and 110±5 together give you [105..110] or 107.5±2.5.
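As a minimal sketch of the hard-range case just described, assuming each estimate is given as a center and a half-width (the function name `intersect_ranges` is made up for illustration):

```python
def intersect_ranges(center_a, half_a, center_b, half_b):
    """Intersect two center±half-width ranges; return the overlap as (center, half-width)."""
    lo = max(center_a - half_a, center_b - half_b)
    hi = min(center_a + half_a, center_b + half_b)
    if lo > hi:
        # No overlap: the two "100% certain" ranges contradict each other.
        raise ValueError("ranges do not overlap, so the estimates are inconsistent")
    return (lo + hi) / 2, (hi - lo) / 2


center, half = intersect_ranges(100, 10, 110, 5)
print(center, half)  # 107.5 2.5
```

Raising an error when the ranges are disjoint is a design choice in this sketch; the case has no sensible answer under the "100% in this range" reading.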
Much more interesting is the assumption that the ± value is a multiple of the standard deviation. (This covers the Gaussian interval case, but is more general; they could be using some other distribution parametrized by mean and variance, or simply appealing to Chebyshev’s inequality.)
Let us say that 100±30 and 115±5 are such estimates, generated via sample means, treating the ± value as one standard deviation. So an average of n samples gave us a mean of 100, and that mean has a variance of 900. (Equivalently, the total of the n samples had variance 900n^2, and each sample had variance 900n.) To get a variance of 25 for the mean by sampling from the same distribution, we would need 36n samples, since 900n/(36n)=25.
Now we can combine the averages. All 37n samples together add up to 100(n)+115(36n)=4240n, which yields a mean of roughly 114.6. Each sample still has variance 900n, so, assuming the samples are independent, the total variance is (900n)×(37n)=33300n^2. Dividing by (37n)^2, we get a variance of about 24.3, or a standard deviation of 4.93; thus, the combined estimate is 114.6±4.93.
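The pooled-samples argument works out to weighting each estimate by the inverse of its variance, since the sample counts are proportional to 1/variance. Here is a small sketch of that, assuming the two estimates are independent and the ± values are standard deviations (the function name `combine_estimates` is hypothetical):

```python
import math


def combine_estimates(mean_a, sd_a, mean_b, sd_b):
    """Pool two independent estimates of the same quantity by inverse-variance weighting."""
    # Weights are proportional to the implied sample counts (n and 36n in the example).
    w_a = 1.0 / sd_a**2
    w_b = 1.0 / sd_b**2
    mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    sd = math.sqrt(1.0 / (w_a + w_b))
    return mean, sd


mean, sd = combine_estimates(100, 30, 115, 5)
print(f"{mean:.1f} ± {sd:.2f}")  # 114.6 ± 4.93
```

Note that n never appears: only the ratio of the variances matters, which is why the worked example can leave n unspecified.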
It doesn’t matter if ±30 and ±5 are, say, 2 or 3 standard deviations, as long as they’re the same multiple.
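To see why the multiple drops out, here is a short sketch under the same assumptions (independent estimates, sample counts proportional to 1/variance). The symbols μ_i, σ_i, and k are my own notation: each estimate is written as μ_i ± kσ_i.

```latex
% Combined estimate when the ± values are exactly one standard deviation:
\[
  \hat{\mu} = \frac{\mu_1/\sigma_1^2 + \mu_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2},
  \qquad
  \hat{\sigma}^2 = \frac{1}{1/\sigma_1^2 + 1/\sigma_2^2}.
\]
% Now feed in k\sigma_i in place of \sigma_i. The factor 1/k^2 cancels in the mean:
\[
  \frac{\mu_1/(k\sigma_1)^2 + \mu_2/(k\sigma_2)^2}{1/(k\sigma_1)^2 + 1/(k\sigma_2)^2}
  = \hat{\mu},
  \qquad
  \sqrt{\frac{1}{1/(k\sigma_1)^2 + 1/(k\sigma_2)^2}} = k\,\hat{\sigma}.
\]
```

So the combined mean is unchanged, and the combined ± value comes out as the same multiple k of the combined standard deviation.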