I don’t think that dividing reality into true numbers and fake numbers is very useful.
I think it’s more useful to ask yourself whether you can measure something with reasonable effort that’s a good predictor of some outcome you care about. You should also ask whether it continues to be a good predictor if you optimise towards it.
Discussing whether IQ is the real measure of intelligence is irrelevant. The important question is whether it predicts performance on the tasks you care about.
Apart from seeking quantities that are good predictors, it also makes sense to seek quantities that are easy to measure. If you have a way to gather a lot of data cheaply, you can afterwards analyse what you can predict with it. That’s one of the reasons I always try to talk someone into doing the work of turning Anki review data into daily cognitive measurement scores.
I don’t care whether the cognitive measurement from the Anki data is more or less real than IQ.
If I get it calculated for every day without having to spend additional time, that’s very valuable. IQ tests are relatively expensive because they take time, and if you take the same test multiple times you train it in a way that makes it useless. We can also analyse whether the Anki score provides a good predictor for other outcomes we care about.
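Nothing here depends on how Anki actually stores its data. As a rough sketch of what a daily score could look like, assume the review history has been exported to a CSV with hypothetical columns `timestamp`, `correct` and `latency_ms` (not Anki’s real schema), and collapse each day’s reviews into one number:

```python
# Minimal sketch: turn an (assumed) CSV export of Anki reviews into daily scores.
# Column names and the scoring formula are illustrative assumptions.
import csv
from collections import defaultdict
from datetime import datetime, timezone

def daily_scores(path):
    per_day = defaultdict(list)  # date -> list of (correct, latency_ms)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            ts = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
            per_day[ts.date()].append((int(row["correct"]), int(row["latency_ms"])))

    scores = {}
    for day, reviews in sorted(per_day.items()):
        accuracy = sum(c for c, _ in reviews) / len(reviews)
        mean_latency = sum(l for _, l in reviews) / len(reviews)
        # One of many possible daily scores: reward accuracy, penalise slow answers.
        scores[day] = accuracy / (1 + mean_latency / 1000)
    return scores

if __name__ == "__main__":
    for day, score in daily_scores("anki_reviews.csv").items():
        print(day, round(score, 3))
```

Whether this particular score predicts anything is exactly the kind of question the cheap data lets you investigate afterwards.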
I think you’re coming from a different perspective. You care whether a quantity is easy to measure and whether it’s a good predictor of something that’s useful in practice. I don’t care very much about that, because in pre-paradigmatic fields it’s too early to ask for practical applications anyway. Instead I care whether the quantity can serve as a good building block for future research, and for that it needs to be a hard number rather than a soft one, so to speak. (Maybe I should’ve called them hard vs soft, instead of true vs fake.)
Whether something is easy to measure matters a lot for whether it’s a good building block for future research.
If something is easy to measure and is a good predictor of other quantities, it provides a good building block for further research.
Easy to measure means that you can do research and study how the variable interacts with other variables. That’s the core of research.
That means you care whether the measurement has random and systematic noise, but you don’t have to ask for realness.
Whether a measurement lets you know more degrees of freedom of a system is a better question than whether the measurement is real.
If you focus on a variable that seems more real to you but for which it’s hard to gather data, it can’t serve as a good building block for future research, because acquiring the data is expensive and that makes the research expensive.
If you want to advance research, you want variables that are cheap to measure, have low noise, and add degrees of freedom that you don’t already get from other variables you can easily access.
In theory you might have 10 easy-to-measure data points, run principal component analysis, and find that you really have 5 “real variables”. It doesn’t make sense to focus at the start on the 5 real variables. It makes much more sense to focus on easy-to-measure variables that add information.
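As a toy illustration (all the numbers and data below are made up): simulate 10 cheap, noisy measurements that are mixtures of 5 underlying factors and let PCA report how much variance the leading components carry.

```python
# Toy sketch: 10 noisy measurements that are linear mixtures of 5 latent factors.
# Sizes and noise level are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples, n_latent, n_measured = 1000, 5, 10

latent = rng.normal(size=(n_samples, n_latent))       # the 5 "real" variables
mixing = rng.normal(size=(n_latent, n_measured))      # how they show up in measurements
measured = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_measured))

pca = PCA().fit(measured)
print(np.cumsum(pca.explained_variance_ratio_).round(3))
# The first 5 components explain nearly all the variance; the rest is noise.
```

You only get to run this kind of analysis at all if the 10 measurements were cheap to collect in the first place.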
You’re mostly talking about research in soft sciences, right?
Academically my background is bioinformatics. Depending on your view that might or might not be a soft science.
I also care a lot about QS (Quantified Self) and have thought a lot about measurement in that area.
I don’t have much knowledge of academic physics and don’t want to presume that I know what it takes to advance academic physics.
I don’t know much about bioinformatics, so maybe this is a chance for me to learn something. What does it take to advance bioinformatics? Can you describe some examples?
One example from bioinformatics is CpG islands. They are basically stretches of DNA with a lot of C and G, and those stretches don’t contain genes.
At the beginning people tried to identify them with standards such as: when X% of a Y-base-pair-long stretch is C and G, that stretch is a CpG island. People argued about what numbers for X and Y would provide a more real way of identifying CpG islands.
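That rule-based approach is easy to write down. A minimal sketch of a sliding-window detector, where the window length and threshold are placeholder values rather than any historically agreed-upon standard:

```python
# Sketch of the rule-based approach: flag every window of length Y whose
# G+C fraction exceeds X. The values below are placeholders, not real standards.
def gc_windows(seq, window=200, threshold=0.6):
    hits = []
    for start in range(0, len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        if gc >= threshold:
            hits.append((start, start + window, gc))
    return hits

print(gc_windows("ATGCGCGCGGCGCGCATATATATATGCGCGC" * 20, window=50, threshold=0.6)[:3])
```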
Over time people decided against that approach. It’s better to have an expert identify a bunch of CpG islands by hand, by whatever standards he likes, and then train a hidden Markov model to identify CpG islands based on that training data.
Part of the idea is that CpG islands are not supposed to contain genes. Should the hidden Markov model identify some genes in CpG islands, one then tries to change the training data for the hidden Markov model.
Over time that gives you a concept of CpG islands that’s useful, because you put in training data to make it useful. The hidden Markov model might still identify some stretches of DNA as CpG islands that don’t have the characteristics we expected CpG islands to have, but no model is perfect.
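A stripped-down sketch of that workflow, with toy hand-labelled sequences standing in for expert annotations: estimate a two-state (island/background) model by simple counting, then label new sequences with Viterbi decoding.

```python
# Minimal two-state HMM (I = island, B = background) sketch for CpG-island labelling.
# The training sequences and labels are toy stand-ins for expert-annotated data.
import math
from collections import Counter

STATES = "IB"
BASES = "ACGT"

# Toy "expert-labelled" training data: sequence plus per-base state labels.
TRAIN = [
    ("GCGCGCATATGCGC", "IIIIIIBBBBIIII"),
    ("ATATATGCGCGCAT", "BBBBBBIIIIIIBB"),
]

def estimate(train, pseudo=1.0):
    # Supervised parameter estimation: count transitions and emissions in the labels.
    trans = {s: Counter() for s in STATES}
    emit = {s: Counter() for s in STATES}
    for seq, labels in train:
        for i, (base, state) in enumerate(zip(seq, labels)):
            emit[state][base] += 1
            if i + 1 < len(labels):
                trans[state][labels[i + 1]] += 1
    log_trans = {s: {t: math.log((trans[s][t] + pseudo) /
                                 (sum(trans[s].values()) + pseudo * len(STATES)))
                     for t in STATES} for s in STATES}
    log_emit = {s: {b: math.log((emit[s][b] + pseudo) /
                                (sum(emit[s].values()) + pseudo * len(BASES)))
                    for b in BASES} for s in STATES}
    return log_trans, log_emit

def viterbi(seq, log_trans, log_emit):
    # Most likely state path for a new sequence.
    v = [{s: log_emit[s][seq[0]] for s in STATES}]
    back = []
    for base in seq[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + log_trans[p][s])
            col[s] = v[-1][prev] + log_trans[prev][s] + log_emit[s][base]
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

log_trans, log_emit = estimate(TRAIN)
print(viterbi("ATATGCGCGCGCATAT", log_trans, log_emit))
```

If the decoded labels disagree with what the experts expect (for instance, islands showing up where genes are), you adjust the training data and re-estimate, which is the iteration described above.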
As long as we can learn something useful from the model, it doesn’t need to be perfect. There’s some distrust in bioinformatics of people who pretend that their model describes reality as it is, because most models don’t work in every case.
That’s also something to keep in mind when looking at projects such as the Blue Brain Project. The goal isn’t to model a full human brain as it really is but to test a simplified model of the human brain. When everything goes well, that model is good enough to learn something interesting about the human brain.
To use the words of Alfred Korzybski, who wasn’t a bioinformatician: the map isn’t the territory. Good maps describe reality well enough that they are useful for navigating reality and making further discoveries.
It might be equivalent to physicists who don’t focus on whether or not the Many-Worlds hypothesis is real but who focus on the math and whether the equations provide good predictions, via “shut up and calculate”.
For shut up and calculate you need data. If you find a new way to efficiently gather reliable biological data, then you can shut up and calculate instead of worrying whether your numbers are “real” or “hard” (whatever you mean by hard).