No, “allele” is not the word we want, though we should be using it in preference to “gene”. “Allele” just means “a particular variant of a gene”. Technically speaking, “gene” means all the ways of coding for some particular set of structures (or rather the proteins that end up constructing them, or otherwise affect development). For example, humans have two primary genes for blood type. The first gene determines the Rh factor, with one allele of that gene coding for positive Rh, and the other for negative Rh. The second gene determines the ABO encoding, with one allele coding for O, a second for A, and a third for B. And of course, the alleles on each copy of the gene combine to produce different phenotypes, which can often be simplified to the “dominant recessive” model when there are only two common alleles in a population (e.g. Rh). ABO typing is more complicated—A and B refer to types of “antigens” (surface markers) that your blood cells may have. Each is produced if you have at least one allele of that type. O, in contrast produces no antigens. (There are actually a whole passel of other genes that code for existence of a whole lot of other antigens and typing factors, but the variants are a lot rarer, so most people don’t need to worry about them.)
The term wedifrid is asking for (and that I would really like to have) is about the frequencies of alleles. There is casual talk of someone’s son being 50% related to his father. Certainly exactly 50% of his alleles were copied from his father. On the other hand, we should say that he’s also 50% related to his father’s identical twin brother, where there is no direct copying—just the happenstance that this set of 50% of alleles is identical to that of his father’s identical twin. But, as it turns out, of the 50% of genes that weren’t copied, a very very high proportion will be the same as in his father (or indeed his uncle).
A natural distance to define on these sets of alleles is the l_1 distance “how many are different”. (we can choose this on the level of DNA letters, codons, codons with equivalents lumped together, or “expresses same protein”; and measure slightly different things). For most genes there is effectively only one allele, so this measure of similarity doesn’t go from 0% to 100%. Instead, it varies in a tiny fraction of 1% around 100%. If we do want it to be able to drop down to 0%, then we shouldn’t count those genes with only one allele. How about genes with two alleles, but one is extremely rare? Perhaps entropy of each gene would be a reasonable weighting. Just counting “is there a path of descent to a common ancestor” certainly won’t let us stretch the scale out to 0% either, because in that sense we’re all heavily related. We want some (hopefully mathematically formalizable) sense of “this guy’s deviance from the human average overlaps by x% with this other guy’s deviance from the human average”. Or at least a more precise word for that, rather than just talking about shared gene (allele) percentages, which will be extremely close to 100%.
Allele?
No, “allele” is not the word we want, though we should be using it in preference to “gene”. “Allele” just means “a particular variant of a gene”. Technically speaking, “gene” means all the ways of coding for some particular set of structures (or rather the proteins that end up constructing them, or otherwise affect development). For example, humans have two primary genes for blood type. The first gene determines the Rh factor, with one allele of that gene coding for positive Rh, and the other for negative Rh. The second gene determines the ABO encoding, with one allele coding for O, a second for A, and a third for B. And of course, the alleles on each copy of the gene combine to produce different phenotypes, which can often be simplified to the “dominant recessive” model when there are only two common alleles in a population (e.g. Rh). ABO typing is more complicated—A and B refer to types of “antigens” (surface markers) that your blood cells may have. Each is produced if you have at least one allele of that type. O, in contrast produces no antigens. (There are actually a whole passel of other genes that code for existence of a whole lot of other antigens and typing factors, but the variants are a lot rarer, so most people don’t need to worry about them.)
The term wedifrid is asking for (and that I would really like to have) is about the frequencies of alleles. There is casual talk of someone’s son being 50% related to his father. Certainly exactly 50% of his alleles were copied from his father. On the other hand, we should say that he’s also 50% related to his father’s identical twin brother, where there is no direct copying—just the happenstance that this set of 50% of alleles is identical to that of his father’s identical twin. But, as it turns out, of the 50% of genes that weren’t copied, a very very high proportion will be the same as in his father (or indeed his uncle).
A natural distance to define on these sets of alleles is the l_1 distance “how many are different”. (we can choose this on the level of DNA letters, codons, codons with equivalents lumped together, or “expresses same protein”; and measure slightly different things). For most genes there is effectively only one allele, so this measure of similarity doesn’t go from 0% to 100%. Instead, it varies in a tiny fraction of 1% around 100%. If we do want it to be able to drop down to 0%, then we shouldn’t count those genes with only one allele. How about genes with two alleles, but one is extremely rare? Perhaps entropy of each gene would be a reasonable weighting. Just counting “is there a path of descent to a common ancestor” certainly won’t let us stretch the scale out to 0% either, because in that sense we’re all heavily related. We want some (hopefully mathematically formalizable) sense of “this guy’s deviance from the human average overlaps by x% with this other guy’s deviance from the human average”. Or at least a more precise word for that, rather than just talking about shared gene (allele) percentages, which will be extremely close to 100%.