Back in the late 1980s, neural networks were hot; and evaluations usually indicated that they outperformed other methods of classification. In the early 1990s, genetic algorithms were hot; and evaluations usually indicated that they outperformed other methods of classification. Today, support vector machines (SVMs) are hot; and evaluations usually indicate that they outperform other methods of classification. Neural networks and genetic algorithms no longer outperform older methods. (I write this from memory, so you shouldn’t take it as gospel.)
If this is true, the implications are beyond my current comprehension. Anyone know where I can find papers suggesting that such a phenomenon exists in statistical classification techniques?
I know you said you were writing from memory, but this paragraph above is very vague. ‘Neural networks’ in their many variants have been used (and still are used) in a wide variety of domains and applications, but mainly classification/recognition systems such as character recognition, speech recognition, machine vision in some cases, etc. Genetic algorithms (aka simulated annealing, evolutionary search), from what I recall, are used in search and optimization applications, and I’m not aware of them being used in recognition at all. Of course, using GA approaches to explore the space of NN designs is a powerful combination even today—there are numerous examples, but in the video game space, one of the most advanced and radical animation systems is driven by GA + NN approaches.
That’s one example, but in general GA-like evolutionary search is a technique that is unlikely to ever be ‘trumped’ in the domains where it reigns. It’s simple, old, but effective, like quicksort or radix sort in some sense for its domains. Think of it this way: fully exploring a search space (for a circuit design or NN virtual circuit or whatever) with a GA/evolutionary search is expensive, but if you have the time to spare and enough population diversity, a big general evolutionary search will eventually converge to the optimal solution.
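To make that convergence claim concrete, here is a minimal sketch of the kind of population-based evolutionary search being described: tournament selection, one-point crossover, and bit-flip mutation on a toy ‘one-max’ problem. Everything here (function names, population size, rates) is an illustrative choice of mine, not taken from any system mentioned in this thread.

```python
import random

def evolutionary_search(fitness, n_bits=32, pop_size=100,
                        generations=200, mutation_rate=0.01):
    """Toy genetic algorithm: tournament selection + crossover + mutation.

    `fitness` maps a list of bits to a number to be maximized.  Given
    enough generations and enough population diversity, the best
    individual tends to drift toward the optimum.
    """
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def tournament(k=3):
        # The fittest of k randomly chosen individuals gets to breed.
        return max(random.sample(pop, k), key=fitness)

    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = tournament(), tournament()
            cut = random.randrange(1, n_bits)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mutation_rate)  # bit-flip mutation
                     for bit in child]
            new_pop.append(child)
        pop = new_pop

    return max(pop, key=fitness)

# "One-max": the optimum is the all-ones string, so fitness is just sum().
best = evolutionary_search(fitness=sum)
print(sum(best), "of 32 bits set")
```

The point of the sketch is only the shape of the loop: a population, selection pressure, and variation operators, with no gradients or problem structure assumed.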
There are many potential refinements to search, and many newer classification approaches, but many (such as SVM) are very similar to past approaches with a few minor but key improvements.
In a broader sense, all of technology uses evolutionary search all the time; it is our current global meta-algorithm.
Fads can create self-fulfilling correlations. If neural networks are hot, the smartest people tend to work on neural networks. When you compare their results to other results, it can be difficult to look at neural networks vs., say, logistic regression and factor out the smartest-people vs. pretty-smart-people effect.
I’m skeptical about your conclusion about publication bias, mainly because I find that even if it does exist, it’s swamped out by other effects.
Science and engineering involve something like a land grab in ideaspace. It’s really, really difficult in the modern era to find an idea that is actually new, but that is exactly what you need to make your career. Scientists aren’t like moths attracted to pretty flames; they are more like explorers and colonizers, who attempt to stake out novel claims. I believe this effect is quite strong, and this bias works in exactly the opposite direction to the one you propose.
Researchers developing new techniques have a bias to over-represent their own results and under-represent older results.
All that being said, the two biases may both exist at different scales: a gold-rush effect drawing scientists into a domain, combined with a strong local land-grab effect separating them into different regions of the space.
SIWOTI: Simulated annealing is not not not a genetic algorithm.
I admit these are fuzzy, confusing categories, and you can certainly create borderline examples, but most implementations of simulated annealing use a cooling schedule, which has no natural analog in the genetic algorithm, and many implementations of genetic algorithms use crossover, which has no natural analog in simulated annealing.
I give you Differential Evolution Monte Carlo! It slices! It dices! It’s a genetic algorithm and a simulated annealing algorithm! But wait -- there’s more! You also get a self-tuning Metropolis sampler! (The proof of the correctness of the sampler isn’t valid, but the theorem is true; the author gives a correct proof in a subsequent paper.)
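For readers who haven’t met it: the differential-evolution proposal builds each chain’s Metropolis step from the difference of two other randomly chosen chains, which is why the method straddles the GA/MCMC boundary. Below is a rough sketch on a toy Gaussian target; the target, chain count, step-scale constant, and noise level are illustrative defaults of my own choosing, and no burn-in or convergence checking is done.

```python
import math
import random

def log_target(x):
    # Toy target: an unnormalized standard normal in d dimensions.
    return -0.5 * sum(xi * xi for xi in x)

def de_mc(log_p, d=2, n_chains=10, n_iters=5000, eps=1e-4):
    """Differential Evolution Monte Carlo, roughly sketched.

    Each chain proposes a jump along the difference of two *other*
    randomly chosen chains, plus a little noise, and then accepts or
    rejects with the usual Metropolis rule.  The population plays the
    role a GA population would, but the stationary distribution is the
    target density rather than "the fittest individual".
    """
    gamma = 2.38 / math.sqrt(2 * d)        # a commonly used step-scale default
    chains = [[random.gauss(0.0, 1.0) for _ in range(d)]
              for _ in range(n_chains)]
    log_ps = [log_p(x) for x in chains]
    samples = []

    for _ in range(n_iters):
        for i in range(n_chains):
            r1, r2 = random.sample([j for j in range(n_chains) if j != i], 2)
            proposal = [chains[i][k]
                        + gamma * (chains[r1][k] - chains[r2][k])
                        + random.gauss(0.0, eps)
                        for k in range(d)]
            log_p_prop = log_p(proposal)
            if random.random() < math.exp(min(0.0, log_p_prop - log_ps[i])):
                chains[i], log_ps[i] = proposal, log_p_prop
            samples.append(list(chains[i]))
    return samples

samples = de_mc(log_target)
mean = [sum(s[k] for s in samples) / len(samples) for k in range(2)]
print("posterior mean estimate:", mean)   # should land near [0, 0]
```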
Fair points. In my mind those techniques are all taxonomically related, close in conceptual space, but some of the boundaries are fuzzy. I’m not even sure what the general category is called, but ‘evolutionary search’ is what I roughly mean by “the space of algorithms that includes GA variants and simulated annealing, plus some other stuff”. Perhaps they aren’t that close, but that’s where my map stands and what I meant.
But anyway, do no GA variants vary the mutation or crossover parameters over time? That would be closer to SA, then. I thought the similarity between SA and GA is that they both explore a set of points in solution space simultaneously and thus are more generally robust than single-point exploratory techniques.
Simulated annealing is a kind of single-point stochastic hill-climbing, with the temperature controlling how frequently a “downward” step is taken (that might bounce the single point out of a local optimum). It doesn’t explore a set of points simultaneously—I mean, there might be a way of describing it as such, but that would be an exotic, insightful analogy, rather than the usual bland, boring sort of description.
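In code, that bland, boring description amounts to something like the sketch below: one current point, a random proposal, acceptance of worse points with probability exp(delta / T), and a cooling schedule that shrinks T over time. The toy objective, the geometric schedule, and every parameter value are placeholders of my own.

```python
import math
import random

def simulated_annealing(objective, x0, step=0.5,
                        t_start=1.0, t_end=1e-3, n_iters=10000):
    """Single-point stochastic hill-climbing with a cooling schedule.

    A "downward" move (worse objective) is accepted with probability
    exp(delta / T), so at high temperature the point can bounce out of
    local optima; as T cools, the search settles into greedy climbing.
    """
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    for i in range(n_iters):
        # Geometric cooling from t_start down to (roughly) t_end.
        t = t_start * (t_end / t_start) ** (i / n_iters)
        candidate = x + random.gauss(0.0, step)
        fc = objective(candidate)
        delta = fc - fx                    # > 0 means an uphill (better) move
        if delta > 0 or random.random() < math.exp(delta / t):
            x, fx = candidate, fc
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# A bumpy one-dimensional toy objective with several local peaks.
f = lambda x: math.sin(3 * x) + math.sin(7 * x) - 0.1 * x * x
print(simulated_annealing(f, x0=5.0))
```

Note that the only thing the loop ever needs from the problem is the objective value itself, which is the derivative-free property mentioned further down the thread.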
Certainly some GAs vary the mutation or crossover parameters—as I said before, you can certainly create borderline examples, but the (standard, introductory, boring) centers of the two fuzzy categories are pretty distinguishable.
Ahhh, my bad. I thought that SA explored multiple solution points at once. Of course, if the stochastic jumps are similar, it could end up tracing the same exploration path eventually, just serial vs. parallel, although that seems to make vanilla SA incredibly non-useful in the modern era of parallel computing.
That is what parallel tempering is for.
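For anyone unfamiliar with the term: parallel tempering (replica exchange) runs several copies of the stochastic search at different fixed temperatures and occasionally swaps neighboring replicas, so hot replicas roam between basins while the cold one refines, recovering much of the ‘multiple points at once’ behavior discussed above. Here is a rough sketch in the same maximization framing as the annealing sketch; the temperature ladder and swap schedule are arbitrary illustrative choices.

```python
import math
import random

def parallel_tempering(objective, x0, temps=(2.0, 1.0, 0.5, 0.1),
                       step=0.5, n_iters=5000, swap_every=10):
    """Replica exchange: one Metropolis chain per temperature, plus swaps.

    Hot replicas accept bad moves readily and roam between optima; the
    coldest replica behaves almost greedily.  Swapping adjacent replicas
    lets good states found by hot replicas filter down to the cold one.
    """
    xs = [x0 for _ in temps]
    fs = [objective(x0) for _ in temps]

    for i in range(n_iters):
        # One Metropolis step per replica (these could run in parallel).
        for k, t in enumerate(temps):
            cand = xs[k] + random.gauss(0.0, step)
            fc = objective(cand)
            delta = fc - fs[k]
            if delta > 0 or random.random() < math.exp(delta / t):
                xs[k], fs[k] = cand, fc
        # Occasionally propose swapping adjacent replicas.
        if i % swap_every == 0:
            for k in range(len(temps) - 1):
                log_accept = (fs[k] - fs[k + 1]) * (1.0 / temps[k + 1] - 1.0 / temps[k])
                if random.random() < math.exp(min(0.0, log_accept)):
                    xs[k], xs[k + 1] = xs[k + 1], xs[k]
                    fs[k], fs[k + 1] = fs[k + 1], fs[k]

    best = max(range(len(temps)), key=lambda k: fs[k])
    return xs[best], fs[best]

# Same bumpy toy objective as the annealing sketch above.
f = lambda x: math.sin(3 * x) + math.sin(7 * x) - 0.1 * x * x
print(parallel_tempering(f, x0=5.0))
```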
Also, SA is specifically useful when the original objective function can be easily evaluated, but its derivatives are too expensive. With SA, you don’t need to compute derivatives or normalizing constants. You can try quasi-Newton methods and other approaches, but even these are computationally intractable in many cases. There are certain ways in which a problem can be non-convex that make SA an attractive alternative. In principle, this could be true even in low-dimensional problems, meaning that it’s not at all just a question of parallelism. Another thing worth mentioning is that SA lends itself very well to the GPU in some cases where traditional optimizers don’t.