Experimental predictability and generalizability are correlated
A criticism of having people attempt to predict the results of experiments is that this will be nearly impossible. The idea is that experiments are highly sensitive to their parameters, and that these would need to be deeply understood for predictors to have a chance of being more accurate than an uninformed prior. For example, in a psychological survey, it would be important that the predictors knew the specific questions being asked, details about the population being sampled, many details about the experimenters, et cetera.
One counter-argument is not that prediction will be easy in many cases, but rather that if these experiments cannot be usefully predicted without a very substantial investment of time, then they probably aren’t going to be very useful anyway.
Good scientific experiments produce results that are generalizable. For instance, a study on the effectiveness of a malaria intervention on one population should give us useful information (probably for use with forecasting) about its effectiveness on other populations. If it doesn’t, then its value would be limited. It would really be more of a historical statement than a scientific finding.
Possible statement from a non-generalizable experiment:
“We found that intervention X was beneficial within statistical significance for a population of 2,000 people. That’s interesting if you’re interested in understanding the histories of these 2,000 people. However, we wouldn’t recommend inferring anything about this to other groups of people, or to understanding anything about these 2,000 people going forward.”
Formalization
One possible way of starting to formalize this a bit is to imagine experiments (assuming internal validity) as mathematical functions. The inputs would be the parameters and details of how the experiment was performed, and the outputs would be the experiment’s main findings.
$$\text{experiment}_n(\text{inputs}) = \text{findings}$$
If the experiment has internal validity, then observers should predict that if an identical (but subsequent) experiment were performed, it would result in identical findings.
$$p\big(\text{experiment}_{n+1}(\text{inputs}_i) = \text{findings}_i \mid \text{experiment}_n(\text{inputs}_i) = \text{findings}_i\big) = 1$$
We could also say that if we took a probability distribution of the chances of every possible set of findings being true, the differential entropy of that distribution would be 0, as smart forecasters would recognize that $\text{findings}_i$ is correct with ~100% probability.
$$H\big(\text{experiment}_{n+1}(\text{inputs}_i) \mid \text{experiment}_n(\text{inputs}_i) = \text{findings}_i\big) \approx 0$$
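As a rough illustration of the entropy claim, here is a minimal Python sketch. It assumes the possible findings form a small discrete set, so it uses Shannon entropy in bits rather than differential entropy, and the probabilities are invented purely for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical forecaster beliefs over four possible sets of findings.
# Before any experiment is run, assume an uninformed (uniform) prior.
beliefs_before = [0.25, 0.25, 0.25, 0.25]

# After seeing an internally valid experiment, nearly all of the
# probability mass sits on the observed findings.
beliefs_after = [0.997, 0.001, 0.001, 0.001]

entropy_before = shannon_entropy(beliefs_before)
entropy_after = shannon_entropy(beliefs_after)

print(f"entropy before: {entropy_before:.3f} bits")
print(f"entropy after:  {entropy_after:.3f} bits")
```

The second number is close to zero, which is the sense in which a repeated, identical experiment is almost perfectly predictable.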
Generalizability
Now, for an experiment to be generalizable, we would hope to be able to perturb the inputs in a minor way and still have the entropy be low. Note that the important thing is not that the outputs stay unchanged, but rather that they remain predictable. For instance, a physical experiment that describes the basics of mechanical velocity may be performed on data with velocities of 50-100 miles/hour. The experiment’s usefulness should not be limited to future experiments that describe similar velocities; rather, it should make future experiments on velocity easier to predict, no matter the specific velocities used.
We can describe a perturbation of $\text{inputs}_i$ as $\text{inputs}_i + \delta$.
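Thus, hopefully, the following will be true for low values of $\delta$.
$$H\big(\text{experiment}_{n+1}(\text{inputs}_i + \delta) \mid \text{experiment}_n(\text{inputs}_i) = \text{findings}_i\big) \approx 0$$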
So, perhaps generalizability can be defined something like,
Generalizability is the ability for predictors to better predict the results of similar experiments upon seeing the results of a particular experiment, for increasingly wide definitions of “similar”.
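One way to make this definition concrete is to measure how much a forecaster's entropy drops after seeing a result, as a function of how far the new experiment's inputs are from the original. The sketch below is a toy model, not a proposed standard. The exponential decay of relevance with δ, the `transfer_scale` parameter, and all function names are assumptions introduced here for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def posterior_after_result(prior, observed_index, delta, transfer_scale):
    """Toy belief update for an experiment run at inputs + delta.

    Assumption: the observed result's relevance decays as
    exp(-delta / transfer_scale). At delta = 0 the forecaster is certain;
    at large delta they fall back to the prior.
    """
    weight = math.exp(-delta / transfer_scale)
    return [
        weight * (1.0 if i == observed_index else 0.0) + (1.0 - weight) * p
        for i, p in enumerate(prior)
    ]


def information_gain(prior, observed_index, delta, transfer_scale):
    """Entropy reduction (bits) from seeing the original result."""
    posterior = posterior_after_result(prior, observed_index, delta, transfer_scale)
    return shannon_entropy(prior) - shannon_entropy(posterior)


prior = [0.25, 0.25, 0.25, 0.25]
deltas = [0.0, 0.5, 1.0, 2.0]

# A result that barely transfers versus one that transfers widely.
narrow = [information_gain(prior, 0, d, transfer_scale=0.2) for d in deltas]
broad = [information_gain(prior, 0, d, transfer_scale=5.0) for d in deltas]

for d, n, b in zip(deltas, narrow, broad):
    print(f"delta={d:.1f}  narrow gain={n:.3f} bits  broad gain={b:.3f} bits")
```

Under these assumptions, the more generalizable experiment is the one whose information gain stays high as δ grows, which matches the "increasingly wide definitions of similar" wording above.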
Predictability and Generalizability
I could definitely imagine trying to formalize predictability better in this setting, or more specifically, formalize the concept of “do forecasters need to spend a lot of time understanding the parameters of an experiment.” In this case, that could look something like modeling how the amount of uncertainty forecasters have about the inputs correlates with their uncertainty about the outputs.
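A minimal sketch of that kind of model is a Monte Carlo simulation: sample inputs from the forecaster's uncertainty about them, push the samples through the experiment, and see how spread out the outputs are. The two experiment functions below are hypothetical stand-ins (one smooth, one highly sensitive), and standard deviation is used as a simple proxy for uncertainty. None of this comes from the original post.

```python
import math
import random
import statistics


def robust_experiment(x):
    """Hypothetical experiment whose finding changes slowly with the input."""
    return 0.1 * x


def sensitive_experiment(x):
    """Hypothetical experiment whose finding swings sharply with the input."""
    return math.sin(50.0 * x)


def output_spread(experiment, input_mean, input_sd, samples=10_000, seed=0):
    """Standard deviation of findings when the input is only known
    up to a Gaussian with the given mean and standard deviation."""
    rng = random.Random(seed)
    outputs = [experiment(rng.gauss(input_mean, input_sd)) for _ in range(samples)]
    return statistics.pstdev(outputs)


input_uncertainties = [0.01, 0.1, 1.0]
robust_spreads = [output_spread(robust_experiment, 1.0, sd) for sd in input_uncertainties]
sensitive_spreads = [output_spread(sensitive_experiment, 1.0, sd) for sd in input_uncertainties]

for sd, r, s in zip(input_uncertainties, robust_spreads, sensitive_spreads):
    print(f"input sd={sd:<5} robust output sd={r:.4f}  sensitive output sd={s:.4f}")
```

In this toy setup, forecasters facing the sensitive experiment need to pin down the inputs very precisely before their output uncertainty shrinks, while the robust experiment tolerates much vaguer knowledge of the inputs.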
The general combination of predictability and generality would look something like adding an additional assumption:
If forecasters require a very high degree of information on the inputs to an experiment in order to predict its outputs, then it’s less likely they can predict (with high confidence) the results of future experiments with significant changes, once they see the results of said experiment.
Admittedly, this isn’t using the definition of predictability that people are likely used to, but I imagine it correlates well enough.
Final Thoughts
I’ve been experimenting more with trying to formalize concepts like this. As such, I’d be quite curious to get any feedback on this work. I am a bit torn; on one hand I appreciate formality, but on the other this is decently messy and I’m sure it will turn off many readers.
“We could also say that if we took a probability distribution of the chances of every possible set of findings being true, the differential entropy of that distribution would be 0, as smart forecasters would recognize that inputs_i is correct with ~100% probability.”
In that paragraph, did you mean to say “findings_i is correct”?
***
Neat idea. I’m also not sure whether the idea is valuable because it could be implementable, or because “this is interesting because it gets us better models”.
In the first case, I’m not sure whether the correlation is strong enough to change any decisions. That is, I’m having trouble thinking of decisions for which I need to know the generalizability of something, and my best shot is measuring its predictability.
For example, in small foretold/metaculus communities, I’d imagine that miscellaneous factors like “is this question interesting enough to the top 10% of forecasters” will just make the path predictability → differential entropy → generalizability difficult to detect.
The main point I was getting at is that the following two statements are a bit contradictory:
“Experiments are important to perform.”
“Predictors cannot decently predict the results of experiments unless they have gigantic amounts of time.”
You can choose either, but probably not both.
Likewise, I’d expect that experiments that are easier to predict are ones that are more useful, which is more convenient than the other alternative.
I think generally we will want to estimate the importance/generality of experiments separately from their predictability.