Posthoc hypothesising is only a problem when you’re using that hypothesis to analyse the same data that inspired it. Machine learning experts avoid this mistake by backtesting and foreward testing out of sample data.
Analysing unstructured data is useful for generating hypotheses, rather than for testing them to develop a model. Take computational epidemiology:
In contrast with traditional epidemiology, computational epidemiology looks for patterns in unstructured sources of data, such as social media. It can be thought of as the hypothesis-generating antecedent to hypothesis-testing methods such as national surveys and randomized controlled trials.
Posthoc hypothesising is only a problem when you’re using that hypothesis to analyse the same data that inspired it. Machine learning experts avoid this mistake by backtesting and foreward testing out of sample data.
Analysing unstructured data is useful for generating hypotheses, rather than for testing them to develop a model. Take computational epidemiology: