Suppose I sample the genomes of two random humans, G_1 and G_2. What information is redundant across these two random variables?...So, for instance, I throw away G_i, then I look at all the other genomes and see that in most places they're the same, so when I sample my new G_i, I know that it should match all the other genomes in all those places.
I can't really tell what distribution(s) you're talking about here. You describe G_1 and G_2 as two random humans; wouldn't these then just be two draws from the same distribution of all human genomes? If not, what distributions are you talking about? Certainly, for a single human, there's only one genome, not a distribution (presuming you're not talking about chimerism or whatever). Are you trying to describe a parameterized distribution of human genomes, where you have a prior over the parameter values and where you draw repeatedly from the genome distribution, updating your prior over the parameters?
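If the parameterized reading is the intended one, here's a minimal sketch of what it could look like. The assumptions are mine, not the post's: genomes are reduced to 0/1 alleles at independent biallelic sites, each site's allele frequency gets a Beta(alpha, beta) prior, and genomes are drawn independently given those frequencies. The function names (`sample_genomes`, `predictive_prob`, `leave_one_out_bits`) are hypothetical. Holding out one genome and doing the conjugate Beta update on the rest mirrors the "throw away G_i, look at all the other genomes" procedure quoted above; the drop in code length is how much the other genomes tell you about G_i.

```python
import math
import random

# Hypothetical model (an assumption, not taken from the post): each of n_sites
# biallelic sites has an allele frequency p_s ~ Beta(alpha, beta), and given the
# frequencies, every genome independently carries allele 1 at site s with prob p_s.
def sample_genomes(n_genomes, n_sites, alpha=1.0, beta=1.0, seed=0):
    rng = random.Random(seed)
    freqs = [rng.betavariate(alpha, beta) for _ in range(n_sites)]
    genomes = [[1 if rng.random() < p else 0 for p in freqs]
               for _ in range(n_genomes)]
    return freqs, genomes


def predictive_prob(others, site, alpha=1.0, beta=1.0):
    """P(a new genome has allele 1 at `site`) after the Beta update on `others`."""
    ones = sum(g[site] for g in others)
    return (alpha + ones) / (alpha + beta + len(others))


def leave_one_out_bits(genomes, i, alpha=1.0, beta=1.0):
    """Code length in bits of genome i under the prior predictive vs. the
    posterior predictive obtained by updating on all the other genomes."""
    held_out = genomes[i]
    others = genomes[:i] + genomes[i + 1:]
    prior_p1 = alpha / (alpha + beta)
    prior_bits = 0.0
    posterior_bits = 0.0
    for s, allele in enumerate(held_out):
        p1 = predictive_prob(others, s, alpha, beta)
        prior_bits -= math.log2(prior_p1 if allele else 1 - prior_p1)
        posterior_bits -= math.log2(p1 if allele else 1 - p1)
    return prior_bits, posterior_bits


if __name__ == "__main__":
    _, genomes = sample_genomes(n_genomes=50, n_sites=1000)
    prior_bits, posterior_bits = leave_one_out_bits(genomes, i=0)
    print(f"genome 0 under the prior alone: {prior_bits:.1f} bits")
    print(f"genome 0 after updating on the other 49: {posterior_bits:.1f} bits")
```

In this sketch, G_1 and G_2 are independent given the site frequencies, so whatever is "redundant" between them flows entirely through the shared parameters. Without that latent layer, two i.i.d. draws from a fixed, fully known distribution would have zero mutual information.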