Richard_Kennaway comments on Open thread, Dec. 8 - Dec. 15, 2014

Richard_Kennaway 8 Dec 2014 12:13 UTC
16 points

These seem pretty easy to answer even for a non-expert.

It is variously said that we share 99% of our genes with a chimpanzee, 95% of our genes with a random human, and 50% of our genes with a sibling. Explain how these can all be true statements.
- ChristianKl 9 Dec 2014 14:29 UTC
  6 points
  Parent
  Without wanting to claim complete coverage of the subject, let me talk about a few relevant issues:
  Let’s look at what the word ‘gene’ is supposed to mean in the first place.
  
  A while back there was the belief that DNA mainly exists to be translated into proteins. A gene was supposed to be a sequence that’s translated into a protein.
  
  Today we know that a lot of DNA exists to be transcribed into RNA without producing proteins. Depending on the circumstances, you might or might not count RNA-producing DNA as genes.
  
  When you take a string of DNA that can produce a protein, it’s possible that different splicing of its introns produces a different protein. Humans seem to have something between 20000 and 25000 protein-coding genes but >100,000 proteins. That drastic difference in numbers was a surprise to everyone when we did the human genome project.
  
  There seem to be multiple copies of some genes. It’s not clear whether you count them multiple times and you can’t count repetitions in DNA well because we sequence DNA via shotgun sequencing.
  
  If you compare the gene for human insulin with the one for chimpanzee insulin you can count it as both having insulin. You could use the match score between human insulin and chimpanzee insulin. You could say that it’s a different gene because it’s not exactly the same.
  
  In the last case you have to think about what “the same” means. Is it enough that the same protein gets produced, or do you also want the exact same DNA? There are 64 different three-base combinations (codons) and only 20 (+1) different amino acids, so some amino acids are encoded by multiple codons. Those changes could, however, change the amount of protein that gets produced. When producing human insulin in the lab, for example, one switches those codons to maximize protein production.
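
  To make the "same protein, different DNA" point concrete, here is a minimal sketch. The two sequences are made up for illustration (not real insulin genes), and the codon table is only the slice of the standard genetic code that they need.

```python
# Partial codon table from the standard genetic code.
# Only the codons used by the illustrative sequences below are listed.
CODON_TABLE = {
    "TTT": "F", "TTC": "F",  # phenylalanine
    "GTT": "V", "GTG": "V",  # valine
    "AAT": "N", "AAC": "N",  # asparagine
    "CAA": "Q", "CAG": "Q",  # glutamine
    "CAT": "H", "CAC": "H",  # histidine
    "TAA": "*", "TAG": "*",  # stop codons
}

def translate(dna):
    """Translate a coding DNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def dna_identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical sequences that differ only in the third base of each codon.
seq_a = "TTTGTTAATCAACATTAA"
seq_b = "TTCGTGAACCAGCACTAG"

print("same protein:", translate(seq_a) == translate(seq_b))
print("DNA identity:", round(dna_identity(seq_a, seq_b), 3))
```

  Whether these two count as "the same gene" depends on which of the two comparisons you pick.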
  
  Lastly it’s not quite clear which DNA sequences actually get translated into proteins. One test is to try to let yeast or another organism produce the protein based on the gene and that’s expensive. It’s also possible that yeast simply lacks something to read that particular gene. In absence of that proof we have imperfect computer models that suggest to us which DNA sequences look like genes and which don’t.
  
  The official protein database UniProt therefore has TrEMBL (uncurated data, with errors) and Swiss-Prot (curated data that’s supposed to be more trustworthy).
  
  That uncertainty is high enough that the official number of human protein-coding genes still gets quoted as 20000-25000.
  
  In addition to looking at the sequenced DNA you can also look at single-nucleotide polymorphisms (SNPs). SNP chips test for a selection of specific mutations and could also be used as a basis for a number describing how two organisms differ in their genes. At the moment the latest 23andMe chip looks at 577,382 atDNA SNPs.
- tut 9 Dec 2014 10:38 UTC
  6 points
  Parent
  
  we share 99% of our genes with a chimpanzee
  
  99% of our genes have a chimp equivalent and vice versa
  
  95% of our genes with a random human
  
  In 95% of something or other of their genome two random humans have the exact same allele. In the other 5% they are no more similar than a human and a chimp are in the 99% that are shared between the species.
  
  and 50% of our genes with a sibling
  
  Of the loci where humanity has different alleles, two full siblings have identical alleles at 50%. This is a theoretical number for average siblings in a population with no inbreeding or any population structure; actual siblings tend to be much more similar.
- V_V 8 Dec 2014 16:44 UTC
  4 points
  Parent
  Non-expert here, but here are my two cents:
  
  we share 99% of our genes with a chimpanzee
  
  If you sequence your DNA and the DNA of a random chimp, and consider only the substrings that can be identified as genes, and measure string similarity between them, you will get a number between 98% and 99%, depending on the choice of string similarity measure (there are many reasonable choices).
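
  As a rough illustration of why the choice of similarity measure matters, here is a sketch comparing two measures on a pair of made-up strings. Real comparisons use proper sequence alignment tools; the position-by-position measure and Python's `difflib` are stand-ins chosen only because they are easy to run.

```python
from difflib import SequenceMatcher

def hamming_identity(a, b):
    """Fraction of positions that match when compared position by position."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def matcher_ratio(a, b):
    """difflib ratio: 2 * matched characters / combined length of both strings."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

# Hypothetical sequences: gene_b is gene_a with one base deleted near the
# start and one base appended at the end.
gene_a = "ATGGCCCTGTGGATGCGC"
gene_b = "ATGCCCTGTGGATGCGCA"

print("position-by-position:", round(hamming_identity(gene_a, gene_b), 3))
print("difflib ratio:       ", round(matcher_ratio(gene_a, gene_b), 3))
```

  A single deleted base shifts everything after it, so the position-by-position measure collapses while a measure that tolerates insertions and deletions stays high.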
  
  95% of our genes with a random human
  
  Never heard that before.
  
  50% of our genes with a sibling
  
  Suppose a unique id tag was attached to all the gene strings in the DNA of each of your parents. Even if the same gene appears in both of your parents, or even if it appears multiple times in the same parent, each instance gets a different id.
  Then your parents mate and produce you and your sibling. On average, you and your sibling will share 50% of these gene ids.
  Of course, many of these genes with different ids will be identical strings, hence the genetic similarity measured as in the human-chimp case will be > 99.9%.
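
  The id-tag thought experiment can be checked with a small simulation. This is a sketch under simplifying assumptions that are mine, not part of the argument above: each parent carries two tagged copies per locus, and every locus is inherited independently (no linkage). The function names and the number of loci are arbitrary.

```python
import random

def make_child(rng, n_loci):
    """For each locus, inherit one of the two tagged copies from each parent."""
    child = set()
    for locus in range(n_loci):
        child.add(("mother", locus, rng.randrange(2)))
        child.add(("father", locus, rng.randrange(2)))
    return child

def shared_fraction(n_loci=10_000, seed=0):
    """Fraction of one child's tagged copies that the other child also has."""
    rng = random.Random(seed)
    you = make_child(rng, n_loci)
    sibling = make_child(rng, n_loci)
    return len(you & sibling) / len(you)

print(f"fraction of tagged copies shared: {shared_fraction():.3f}")
```

  The result comes out close to 0.5, whatever the actual strings behind the tags look like.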
- Unknowns 8 Dec 2014 13:33 UTC
  2 points
  Parent
  Saying this as a non-expert, the percentages are obviously taken over different gene pools (e.g. there is no reason to count genes in common with a chimpanzee when you are comparing two humans or two siblings.)
- gmzamz 8 Dec 2014 13:07 UTC
  1 point
  Parent
  This confuses me. I find it highly unlikely the average human shares more genes with a chimpanzee than another human and even more unlikely that siblings only share 50% of their genes.
  
  probability estimates (statement is true):
  - 99% genetic similarity to a chimpanzee = 75%
  - 95% genetic similarity to a random human = a low nonzero number
  - 50% genetic similarity to a sibling = 0%
  - 95% genetic similarity to a random human given 99% genetic similarity to a chimp = 0%
  I am going to research this.
  
  EDIT: findings:
  1. Researching an actual number is exceedingly difficult. About 50% of the pages are non-secular websites (this may be my non-optimized google searching). The rest are a mix between technical articles and articles formatted for the average human (average being living in an English-speaking, developed nation).
  2. 99% genetic similarity to a chimpanzee
  Mostly correct. Estimates range between 95%^[1] and 98.8%^[2].
  3. 95% genetic similarity to a random human
  Incorrect. Estimates put the difference at 0.1%^[1]. I did not notice other numbers.
  4. 50% genetic similarity to a sibling
  Incorrect as you stated it (comparing total gene dissimilarity). You might want to reword it, since you were probably comparing what percentage of genes can be attributed to a parent.
  
  [1] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC129726/
  [2] http://humanorigins.si.edu/evidence/genetics
  - Richard_Kennaway 8 Dec 2014 13:37 UTC
    4 points
    Parent
    
    This confuses me. I find it highly unlikely the average human shares more genes with a chimpanzee than another human and even more unlikely that siblings only share 50% of their genes.
    
    It puzzles me as well. I believe the answer is that there are multiple concepts of “shared genes”, but I have never been clear what they are.
- Dahlen 8 Dec 2014 17:08 UTC
  0 points
  Parent
  That depends on the meaning of “our”. A smaller and smaller subset of genes is being considered, as you shift focus from chimp to human to sibling. In the chimp example, the statistic may as well have been made for your entire genome, including stuff like genes coding for cell membrane (which doesn’t vary wildly with species/taxonomy, more likely to vary with tissue type—don’t know, not a biologist). In the sibling example, you take for granted that the greatest part of your genome is going to be shared by virtue of both of you being human, exclude those genes, and only count the rest.
  
  If you establish similarity/difference by counting the same set of genes (for instance all of them, like with chimps), the similarity between you and your sibling might be only a very, very few percentage points down from 100%, and that’s not exactly telling us anything useful, is it?
  
  At least this is how I understand it, and why that type of sentence doesn’t confuse me. Again, not a biologist, sorry for possible stupid mistakes/inaccuracies.
- Ilverin the Stupid and Offensive 8 Dec 2014 15:47 UTC
  0 points
  Parent
  Disclaimer: Not remotely an expert at biology, but I will try to explain.
  
  One can think of the word “gene” as having multiple related uses.
  
  Use 1: “Genotype”. Even if we have different color hair, we likely both have the same “gene” for hair which could be considered shared with chimpanzees. If you could re-write DNA nucleobases, you could change your hair color without changing the gene itself, you would merely be changing the “gene encoding”. The word “genotype” refers to a “function” which takes in a “gene encoding” and outputs a “gene phenotype”
  
  Use 2: “Gene phenotype”. If we both have the same color hair, we would have the same “Gene phenotype”. Suppose the genotype for hair is a gene that uses simple dominance. In this case, we could have the same phenotype even with different gene encodings. Suppose you have the gene encoding “BB” whereas I have the gene encoding “Bb”. In this case, we could both have black hair, the same “Gene phenotype”, but have different “Gene encodings”.
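
  A tiny sketch of the simple-dominance case, using the terms from this comment. The function name is mine, and the label for the recessive outcome is a placeholder, since only the black-hair case is described above.

```python
def gene_phenotype(encoding):
    """Simple dominance: a single 'B' allele is enough for black hair."""
    return "black" if "B" in encoding else "not black"

yours = "BB"
mine = "Bb"

print("same gene encoding: ", yours == mine)
print("same gene phenotype:", gene_phenotype(yours) == gene_phenotype(mine))
```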
  
  Use 3: “Gene encoding”. If we have different color hair, then we have different gene encodings (but we have the same “genotype” as described in “Use 1”). This “gene encoding” is commonly not shared between siblings and less commonly shared between species.
  
  So “we share 99% of our genes with a chimpanzee” likely refers to “Genotype”.
  
  “95% of our genes with a random human” likely refers to “Gene phenotype”.
  
  “50% of our genes with a sibling” likely refers to “Gene encoding”.