Whether something is easy to measure matters a lot for whether it’s a good building block for future research.
If something is easy to measure and is a good predictor of other qualities, it provides a good building block for further research.
Easy to measure means that you can do research and study how the variable interacts with other variables. That’s the core of research.
That means you care whether the measurement has random and systematic noise, but you don’t have to ask whether it is real.
“Does having this measurement tell you about more degrees of freedom of the system?” is a better question than “Is the measurement real?”
If you focus on a variable that seems more real to you but for which it’s hard to gather data, it can’t serve as a good building block for future research, because acquiring the data is expensive, which makes the research expensive.
If you want to further research, you want variables that are cheap to measure, have low noise, and add degrees of freedom that you don’t already have from other variables that you can easily access.
In theory you might have 10 easy-to-measure variables, run principal component analysis, and find that you have 5 “real variables”. It doesn’t make sense to focus on the 5 real variables at the start. It makes much more sense to focus on easy-to-measure variables that add information.
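As a rough illustration of the 10-measurements-versus-5-variables point, here is a minimal sketch using NumPy on synthetic data. Everything in it is an assumption made for the example: the data is made up, the function name `effective_dimensions` is hypothetical, and the 99% variance cutoff is an arbitrary choice.

```python
import numpy as np

def effective_dimensions(data, variance_threshold=0.99):
    """Count the principal components needed to reach the given share of variance."""
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))               # 5 hidden variables (synthetic)
derived = latent + np.roll(latent, 1, axis=1)    # 5 more readings, each a sum of two hidden ones
noise = 0.01 * rng.normal(size=(500, 10))        # small random measurement noise
measured = np.hstack([latent, derived]) + noise  # 10 cheap measurements in total

n_components = effective_dimensions(measured)
print(n_components)
```

On this synthetic data the count should come out as 5, even though 10 columns were measured. The point of the sketch is only that you can find this out after collecting the cheap measurements; you don’t need to know the 5 underlying variables in advance.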
You’re mostly talking about research in soft sciences, right?
Academically my background is bioinformatics. Depending on your view that might or might not be a soft science.
I also care a lot about QS and have thought a lot about measurement in that area.
I don’t have much knowledge of academic physics and don’t want to presume that I know what it takes to advance academic physics.
I don’t know much about bioinformatics, so maybe this is a chance for me to learn something. What does it take to advance bioinformatics? Can you describe some examples?
One example from bioinformatics is CpG islands. They are basically parts of DNA with a lot of C and G, and those parts don’t contain genes.
At the beginning people tried to identify them with rules such as: when X% of a Y-base-pair-long stretch is C and G, that stretch is a CpG island. People argued about which numbers for X and Y would provide a more real way of identifying CpG islands.
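A threshold rule of that kind can be sketched in a few lines of Python. This is only an illustration: the function name `gc_rich_windows`, the default values for the window length (Y) and the minimum C+G fraction (X), and the toy sequence are all placeholders, not values anyone in the field argued for.

```python
def gc_rich_windows(sequence, window=200, min_gc_fraction=0.5):
    """Return start positions of windows whose C+G share is at least min_gc_fraction."""
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - window + 1):
        chunk = sequence[start:start + window]
        gc_count = chunk.count("C") + chunk.count("G")
        if gc_count / window >= min_gc_fraction:
            hits.append(start)
    return hits

# Toy sequence: 10 AT bases, 10 CG bases, 10 AT bases.
toy = "ATATATATAT" + "CGCGCGCGCG" + "ATATATATAT"

strict = gc_rich_windows(toy, window=10, min_gc_fraction=0.8)
loose = gc_rich_windows(toy, window=10, min_gc_fraction=0.5)
print(strict)  # [8, 9, 10, 11, 12]
print(loose)   # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

The same sequence yields a different set of “islands” depending on X and Y, which is exactly what made the argument about the “real” numbers hard to settle.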
Over time people decided against that approach. It’s better to have an expert identify a bunch of CpG islands by hand, by whatever standards he likes, and then train a hidden Markov model to identify CpG islands based on the training data.
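To make the idea concrete, here is a minimal sketch of training a hidden Markov model from hand-labeled data and then using it to label a new sequence. It assumes a deliberately simplified two-state model (“background” and “island”), estimates probabilities by counting with a pseudocount, and decodes with the Viterbi algorithm. The function names, the state names, and the tiny training sequence are all made up for illustration; a real model would likely be richer and trained on far more data.

```python
import math

STATES = ("background", "island")
BASES = "ACGT"

def train_hmm(labeled, pseudocount=1.0):
    """Estimate log-probabilities by counting over expert-labeled sequences.

    `labeled` is a list of (sequence, labels) pairs, where labels[i] is the
    state the expert assigned to sequence[i].
    """
    start = {s: pseudocount for s in STATES}
    trans = {s: {t: pseudocount for t in STATES} for s in STATES}
    emit = {s: {b: pseudocount for b in BASES} for s in STATES}
    for seq, labels in labeled:
        start[labels[0]] += 1
        for i, (base, state) in enumerate(zip(seq, labels)):
            emit[state][base] += 1
            if i > 0:
                trans[labels[i - 1]][state] += 1

    def log_normalize(counts):
        total = sum(counts.values())
        return {k: math.log(v / total) for k, v in counts.items()}

    return (log_normalize(start),
            {s: log_normalize(trans[s]) for s in STATES},
            {s: log_normalize(emit[s]) for s in STATES})

def viterbi(seq, model):
    """Return the most probable state path for seq under the model."""
    start, trans, emit = model
    scores = {s: start[s] + emit[s][seq[0]] for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + trans[p][s])
            new_scores[s] = scores[best_prev] + trans[best_prev][s] + emit[s][base]
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Made-up "expert" labeling: AT-rich flanks are background, the CG run is an island.
training_seq = "ATATTAATAT" + "CGCGGCCGCG" + "TATAATTATA"
training_labels = ["background"] * 10 + ["island"] * 10 + ["background"] * 10
model = train_hmm([(training_seq, training_labels)])

decoded = viterbi("ATTAGCGCGCTAAT", model)
print(decoded)
```

On the short query sequence, the decoded path should mark the six G and C bases in the middle as “island” and the flanks as “background”. Note that no X or Y threshold appears anywhere: the boundary comes from whatever the labeled examples imply.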
Part of the idea is that CpG islands are not supposed to contain genes. If the hidden Markov model places some genes inside CpG islands, one then tries to change the training data for the model.
Over time that gives you a concept of CpG islands that’s useful, because you put in training data to make it useful. The hidden Markov model might still identify some stretches of DNA as CpG islands that don’t have the characteristics we expected CpG islands to have, but no model is perfect.
As long as we can learn something useful from the model, it doesn’t need to be perfect. There is some distrust in bioinformatics of people who pretend that their model describes reality as it is, because most models don’t work in every case.
That’s also something to keep in mind when looking at projects such as the Blue Brain Project. The goal isn’t to model a full human brain as it really is but to test a simplified model of the human brain. If everything goes well, that model is good enough to learn something interesting about the human brain.
To use the words of Alfred Korzybski, who wasn’t a bioinformatician: the map isn’t the territory. Good maps describe reality well enough that they are useful for navigating reality and making further discoveries.
It might be equivalent to physicists who don’t focus on whether or not the Many-Worlds hypothesis is real but on the math and on whether the equations provide good predictions: “shut up and calculate”.
For “shut up and calculate” you need data. If you find a new way to efficiently gather reliable biological data, then you can shut up and calculate instead of worrying whether your numbers are “real” or “hard” (whatever you mean by hard).