WrongBot comments on Open Thread: July 2010

WrongBot 2 Jul 2010 15:45 UTC
3 points
The scientific method is already a vague sort of algorithm, and I can see how it might be possible to mechanize many of the steps. The part that seems AGI-hard to me is the process of generating good hypotheses. Humans are incredibly good at plucking out reasonable hypotheses from the infinite search space that is available; that we are so very often wrong says more about the difficulty of the problem than about our own abilities.
- NancyLebovitz 2 Jul 2010 16:27 UTC
  2 points
  Parent
  I’m pretty sure that judging whether one has adequately tested a hypothesis is also going to be very hard to mechanize.
  - SilasBarta 2 Jul 2010 16:39 UTC
    3 points
    Parent
    The problem that I hear most often in regard to mechanizing this process has the basic form, “Obviously, you need a human in the loop because of all the cases where you need to be able to recognize that a correlation is spurious, and thus to ignore it, and that comes from having good background knowledge.”
    
    But you have to wonder: the human didn’t learn how to recognize spurious correlations through magic. So however they acquired that capability, it should be some identifiable process.
    - cupholder 3 Jul 2010 4:41 UTC
      4 points
      Parent
      
      The problem that I hear most often in regard to mechanizing this process has the basic form, “Obviously, you need a human in the loop because of all the cases where you need to be able to recognize that a correlation is spurious, and thus to ignore it, and that comes from having good background knowledge.”
      
      Those people should be glad they’ve never heard of TETRAD—their heads might have exploded!
      - NancyLebovitz 3 Jul 2010 10:01 UTC
        2 points
        Parent
        That’s intriguing. Has it turned out to be useful?
        - cupholder 4 Jul 2010 5:31 UTC
          6 points
          Parent
          It’s apparently been put to use with some success. Clark Glymour, a philosophy professor who helped develop TETRAD, wrote a long review of The Bell Curve that lists applications of an earlier version of TETRAD (see section 6 of the review):

          Several other applications have been made of the techniques, for example:

          Spirtes et al. (1993) used published data on a small observational sample of Spartina grass from the Cape Fear estuary to correctly predict, contrary both to regression results and expert opinion, the outcome of an unpublished greenhouse experiment on the influence of salinity, pH and aeration on growth.

          Druzdzel and Glymour (1994) used data from the US News and World Report survey of American colleges and universities to predict the effect on dropout rates of manipulating average SAT scores of freshman classes. The prediction was confirmed at Carnegie Mellon University.

          Waldemark used the techniques to recalibrate a mass spectrometer aboard a Swedish satellite, reducing errors by half.

          Shipley (1995, 1997, in review) used the techniques to model a variety of biological problems, and developed adaptations of them for small sample problems.

          Akleman et al. (1997) have found that the graphical model search techniques do as well or better than standard time series regression techniques based on statistical loss functions at out of sample predictions for data on exchange rates and corn prices.

          Personally I find it a little odd that such a useful tool is still so obscure, but I guess a lot of scientists are loath to change tools and techniques.
    - NancyLebovitz 2 Jul 2010 17:12 UTC
      0 points
      Parent
      Maybe it’s just a matter of people kidding themselves about how hard it is to explain something.
      
      On the other hand, some things (like vision and natural language) are genuinely hard to figure out.
      
      I’m not saying the problem is insoluble. I’m saying it looks very difficult.
- cupholder 3 Jul 2010 5:08 UTC
  0 points
  Parent
  One possible way to get started is to do what the ‘Distilling Free-Form Natural Laws from Experimental Data’ project did: feed measurements of time and other variables of interest into a computer program which uses a genetic algorithm to build functions that best represent one variable as a function of itself and the other variables. The Science article is paywalled but available elsewhere. (See also this bunch of presentation slides.)
  
  They also have software for you to do this at home.
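
To make the idea in the last comment concrete, here is a rough sketch of evolving a formula to fit measurements. It is not the project's actual method or software. It is a deliberately simplified, mutation-only evolutionary search with no crossover, and every name in it (`OPS`, `DATA`, `random_expr`, `evaluate`, `error`, `mutate`, `evolve`) is made up for illustration. The target relationship y = x*x + x is an assumed toy example, not data from the thread.

```python
import math
import operator
import random

# Hypothetical building blocks; the thread does not say which operators the project uses.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Assumed toy measurements of y = x*x + x (illustration only).
DATA = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-6, 7)]

def random_expr(depth=3):
    """Build a random expression tree over the variable 'x' and small constants."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else float(random.randint(-2, 2))
    op = random.choice(list(OPS))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, x):
    """Evaluate a tree: 'x' is the variable, floats are constants, tuples are operations."""
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def error(expr, data):
    """Mean squared error of the tree on the data; non-finite results count as infinitely bad."""
    total = 0.0
    for x, y in data:
        diff = evaluate(expr, x) - y
        total += diff * diff
    mse = total / len(data)
    return mse if math.isfinite(mse) else float("inf")

def mutate(expr, depth=2):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(expr, tuple) or random.random() < 0.3:
        return random_expr(depth)
    op, left, right = expr
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

def evolve(data, pop_size=200, generations=40, seed=0):
    """Keep the best quarter each generation and refill with mutated copies of it."""
    random.seed(seed)
    population = [random_expr() for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda e: error(e, data))
        history.append(error(population[0], data))
        survivors = population[: pop_size // 4]  # kept unchanged, so the best never gets worse
        children = [mutate(random.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda e: error(e, data))
    history.append(error(best, data))
    return best, history

if __name__ == "__main__":
    best, history = evolve(DATA)
    print("best expression:", best)
    print("mean squared error:", history[-1])
```

Because the survivors are carried over unchanged, the best error can only stay the same or improve from one generation to the next. Whether the search actually recovers the exact formula depends on the seed and settings. The hard part the thread is discussing, choosing which variables and building blocks to offer the search in the first place, is still done by hand here.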