The two aren’t mutually exclusive, of course. You can use specific knowledge about a particular problem to make your machine learning methods work better, sometimes.
I read a paper the other day about predicting the targets of a particular class of small nucleolar RNAs, which are an important part of the machinery that regulates gene expression. One of the methods they used was to run an SVM classifier on a set of features of the RNA in question. SVM classifiers are one of those nice general-purpose, easily automated methods, but the authors used their knowledge of the specific problem to pick out which features the classifier would use. Things like the length of particular parts of the RNA: stuff that would occur to molecular biologists, but that could be prohibitively expensive for a purely automatic machine learning algorithm to discover if you just gave it all the relevant data.
(More bio nerdery: they combined this with a fast approximation of the electrostatic forces at work, and ended up getting remarkably good accuracy and speed. The paper is here, if anyone’s interested.)
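To make that division of labor concrete, here's a minimal sketch of the general idea in Python with scikit-learn: hand-picked, biologically motivated features fed into a general-purpose SVM. To be clear, this is not the paper's actual pipeline; the feature names and data below are invented placeholders.

```python
# Sketch: domain-chosen features + off-the-shelf SVM classifier.
# All features and labels here are random placeholders, purely illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Each row is one candidate RNA-target pair; each column is a feature a
# molecular biologist might propose rather than something learned from raw sequence.
X = np.column_stack([
    rng.integers(10, 30, size=200),   # e.g. length of the guide region
    rng.integers(50, 200, size=200),  # e.g. length of the full RNA
    rng.normal(-15, 5, size=200),     # e.g. a predicted duplex stability score
])
y = rng.integers(0, 2, size=200)      # 1 = real target, 0 = decoy

# Standardize the hand-picked features, then let the generic SVM do its thing.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

The point of the sketch is only that the classifier itself stays generic; all the problem-specific intelligence lives in which columns of X you bother to compute.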
Belatedly, I remembered a relevant tidbit of wisdom I once got from a math professor.
When a theorist comes up with a new algorithm, it’s not going to outperform the existing algorithms used commercially in the “real world.” Not even if, in principle, the new algorithm is more elegant or faster or whatever. Why? Because in the real world, you don’t just take a general-purpose algorithm off the page; you optimize the hell out of it. Engineers who work with airplanes will jimmy their algorithms to accommodate all the practical “common knowledge” about airplanes. A mere mathematician who doesn’t know anything about airplanes can’t compete with that.
If you’re a theorist trying to come up with a better general method, your goal is to give evidence that your algorithm will do better than the existing one once you’ve optimized the hell out of both of them equally.