Develop general algorithms for assessing similarity, apply them to a dataset where the actual historical relationships are known, and then see whether they generalize to the rest of the verified datasets. If they do, use those algorithms on the uncertain cases, adjusting for multiple hypothesis testing. The key thing is to evaluate the algorithms by how they perform when applied universally, not just in one cherry-picked case.
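The procedure above can be sketched roughly as follows. This is only an illustration under stated assumptions: the bigram-overlap measure, the function names, the threshold, and the accuracy bar are all hypothetical stand-ins, and the Holm step-down procedure is just one common way to adjust for multiple hypothesis testing; the comment above does not name a specific similarity measure or correction.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One labeled pair from a verified dataset: (item_a, item_b, truly_related).
LabeledPair = Tuple[str, str, bool]
Similarity = Callable[[str, str], float]


def jaccard_bigrams(a: str, b: str) -> float:
    """Toy similarity (assumption): Jaccard overlap of character bigrams."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def accuracy(sim: Similarity, pairs: Sequence[LabeledPair], threshold: float) -> float:
    """Fraction of pairs where 'similarity >= threshold' matches the known label."""
    correct = sum((sim(a, b) >= threshold) == related for a, b, related in pairs)
    return correct / len(pairs)


def generalizes(
    sim: Similarity,
    datasets: Dict[str, Sequence[LabeledPair]],
    threshold: float,
    min_accuracy: float,
) -> Tuple[Dict[str, float], bool]:
    """Score the algorithm on every verified dataset, not just a favorable one."""
    scores = {name: accuracy(sim, pairs, threshold) for name, pairs in datasets.items()}
    return scores, all(score >= min_accuracy for score in scores.values())


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down correction; returns reject flags in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest p-value against alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

Only if `generalizes` passes on every verified dataset would the algorithm be run on the uncertain cases, and the p-values from those runs would then go through `holm_bonferroni` together rather than being judged one at a time.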