cousin_it comments on True numbers and fake numbers

cousin_it 8 Feb 2014 0:04 UTC
0 points
I don’t know much about bioinformatics, so maybe this is a chance for me to learn something. What does it take to advance bioinformatics? Can you describe some examples?
- ChristianKl 8 Feb 2014 13:51 UTC
  0 points
  One example from bioinformatics is CpG islands. They are basically stretches of DNA with a lot of C and G, and those stretches don’t contain genes.
  
  At the beginning, people tried to identify them with fixed rules such as: when X% of a stretch Y base pairs long is C and G, that stretch is a CpG island. People argued about which values of X and Y would provide a more real way of identifying CpG islands.
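  To make the threshold rule concrete, here is a minimal sketch of it. The function names and the default values for `window` (Y) and `min_gc` (X) are placeholders I made up for illustration, not numbers anybody in the field agreed on.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are C or G."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CG" for base in seq) / len(seq)


def threshold_islands(seq, window=200, min_gc=0.5):
    """Return merged half-open (start, end) intervals covered by windows
    whose C+G fraction is at least min_gc.

    window plays the role of Y and min_gc the role of X; both defaults
    are illustrative assumptions.
    """
    islands = []
    for start in range(0, len(seq) - window + 1):
        if gc_fraction(seq[start:start + window]) >= min_gc:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands
```

  Running it on the same sequence with two different values of `min_gc` gives different island boundaries, which is exactly what the argument about the "right" X and Y was about.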
  
  Over time people decided against that approach. It is better to have an expert identify a bunch of CpG islands by hand, by whatever standards he likes, and then train a hidden Markov model to identify CpG islands based on that training data.
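  A rough sketch of what training on hand-labelled data can look like, assuming a deliberately simplified two-state model (island `I` versus background `B`) that emits single bases. Textbook CpG models usually use more states so they can capture which base follows which; the state names, the pseudocount and the function names here are my own assumptions.

```python
import math

STATES = "IB"   # I = island, B = background (hypothetical labels)
BASES = "ACGT"


def train_hmm(examples, pseudocount=1.0):
    """Estimate log-probabilities by counting over (sequence, labels) pairs.

    Each labels string has one state character per base, as marked by the
    expert. Sequences are assumed to be non-empty and upper-case.
    """
    start = {s: pseudocount for s in STATES}
    trans = {s: {t: pseudocount for t in STATES} for s in STATES}
    emit = {s: {b: pseudocount for b in BASES} for s in STATES}
    for seq, labels in examples:
        start[labels[0]] += 1
        for i, (base, state) in enumerate(zip(seq, labels)):
            emit[state][base] += 1
            if i > 0:
                trans[labels[i - 1]][state] += 1

    def log_normalise(counts):
        total = sum(counts.values())
        return {k: math.log(v / total) for k, v in counts.items()}

    return (log_normalise(start),
            {s: log_normalise(trans[s]) for s in STATES},
            {s: log_normalise(emit[s]) for s in STATES})


def viterbi(seq, model):
    """Most probable state path for a non-empty seq under the model."""
    start, trans, emit = model
    scores = {s: start[s] + emit[s][seq[0]] for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for t in STATES:
            # Best previous state for ending up in state t at this base.
            best = max(STATES, key=lambda s: scores[s] + trans[s][t])
            new_scores[t] = scores[best] + trans[best][t] + emit[t][base]
            pointers[t] = best
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

  The point is that nobody picks X or Y any more: the probabilities come out of the expert's labels, and `viterbi` then applies them to new sequence.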
  
  Part of the idea is that CpG islands are not supposed to contain genes. Should the hidden Markov model label regions containing genes as CpG islands, one then tries to change the training data for the model.
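  The check described here could be sketched like this, assuming both the predicted islands and the known genes are available as half-open `(start, end)` intervals on the same sequence. The function name and the interval format are assumptions for illustration.

```python
def islands_overlapping_genes(islands, genes):
    """Return the predicted islands that overlap any annotated gene.

    Intervals are half-open (start, end) pairs, so an island ending at 60
    does not overlap a gene starting at 60.
    """
    return [
        (i_start, i_end)
        for i_start, i_end in islands
        if any(i_start < g_end and g_start < i_end for g_start, g_end in genes)
    ]
```

  Any island this returns is a hint that the training data needs another look.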
  
  Over time that gives you a concept of CpG islands that’s useful, because you put in training data to make it useful. The hidden Markov model might still label some stretches of DNA as CpG islands that don’t have the characteristics we expected CpG islands to have, but no model is perfect.
  
  As long as we can learn something useful from the model, it doesn’t need to be perfect. There is some distrust in bioinformatics of people who pretend that their model describes reality as it is, because most models don’t work in every case.
  
  That is also something to keep in mind when looking at projects such as the Blue Brain Project. The goal isn’t to model a full human brain as it really is but to test a simplified model of the human brain. If everything goes well, that model is good enough to learn something interesting about the human brain.
  
  To use the words of Alfred Korzybski, who wasn’t a bioinformatician: the map isn’t the territory. Good maps describe reality well enough that they are useful for navigating reality and making further discoveries.
  
  It might be equivalent to physicists who don’t focus on whether or not the Many Worlds hypothesis is real but who focus on the math and on whether the equations provide good predictions, via “shut up and calculate”.
  
  For shut up and calculate you need data. If you find a new way to efficiently gather reliable biological data, then you can shut up and calculate instead of worrying whether your numbers are “real” or “hard” (whatever you mean by hard).