There is a small set of operations (dimension reduction, 2-class categorization, n-class categorization, prediction) and algorithms for them (PCA, SVM, k-means, regression) that work well on a wide variety of domains. Does that help?
Not that wide a variety of domains, compared to all human tasks.
Specifically, they can only handle data that comes in matrix form, and often only after it has been cleaned up and processed by a human being. Consider just the iris dataset: if instead of the measurements of the flowers you were working with photographs of the flowers, you would have made your problem substantially harder, because you would now have a vision task not amenable to the algorithms you list.
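To make that concrete, here is a minimal sketch (assuming scikit-learn and its bundled iris loader; the particular pipeline is illustrative, not anything from this discussion). Once the data is already a clean 150×4 numeric matrix, the generic toolbox applies almost without thought:

```python
# Once the data is a tidy numeric matrix, the listed methods apply directly.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)            # 150 flowers x 4 measurements

X2 = PCA(n_components=2).fit_transform(X)                 # dimension reduction
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)   # n-class clustering
acc = cross_val_score(SVC(), X, y, cv=5).mean()           # 3-class categorization

print(f"SVM cross-validated accuracy on the measurement matrix: {acc:.2f}")
# None of this carries over if "X" is a folder of photographs rather than
# hand-collected measurements.
```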
Can you give an example of data that doesn’t come in matrix form? If you have a set of neurons and a set of connections between them, that’s a matrix. If you have asynchronous signals travelling between those neurons, that’s a time series of matrices. If it ain’t in a matrix, it ain’t data.
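A quick sketch of what that claim looks like in code (pure numpy; the sizes and random values are placeholders, not real connectome data): the connections are one n×n matrix, and the asynchronous signals are a stack of such matrices indexed by time.

```python
import numpy as np

n_neurons, n_steps = 100, 1000
rng = np.random.default_rng(0)

# connections: entry (i, j) is the strength of the connection from neuron i to j
mask = rng.random((n_neurons, n_neurons)) < 0.05          # sparse connectivity
connections = rng.random((n_neurons, n_neurons)) * mask

# asynchronous signals between neurons: a time series of n x n matrices
signals = rng.random((n_steps, n_neurons, n_neurons)) < 0.001

print(connections.shape, signals.shape)   # (100, 100) and (1000, 100, 100)
```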
[ADDED: This was a silly thing for me to say, but most big data problems use matrices.]
The answer you just wrote could be characterized as a matrix of vocabulary words by index of occurrence. But that's a pretty poor representation for almost all natural language processing techniques.
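For the sake of argument, here is what that encoding would actually look like (plain Python/numpy, illustrative only). The point is that it is possible to build, and also how little of the sentence's structure survives:

```python
import numpy as np

text = "if it ain't in a matrix it ain't data"
tokens = text.split()
vocab = sorted(set(tokens))

# occurrence matrix: rows are vocabulary words, columns are token positions,
# entry (w, i) is 1 if word w occurs at position i
occ = np.zeros((len(vocab), len(tokens)), dtype=int)
for i, tok in enumerate(tokens):
    occ[vocab.index(tok), i] = 1

print(vocab)
print(occ)
# The matrix records where each word occurs, but syntax, scope, and meaning,
# which are the parts NLP actually needs, are exactly what it throws away.
```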
First of all, something like PCA and the other methods you listed won’t work on a ton of things that could be shoehorned into matrix format.
Taking an image or piece of audio and representing it using raw pixel or waveform data is horrible for most machine learning algorithms. Instead, you want to heavily transform it before you consider putting it into something like PCA.
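A hedged sketch of "transform first, then PCA" (numpy plus scikit-learn; the synthetic sine wave stands in for real audio): the raw samples are cut into frames and turned into spectral magnitudes, and only that feature matrix is handed to PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr * 2) / sr                                 # two seconds of "audio"
wave = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)

# short-time Fourier transform: each row becomes a frame of spectral magnitudes
frame, hop = 256, 128
frames = np.stack([wave[i:i + frame] for i in range(0, wave.size - frame, hop)])
spectra = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))

components = PCA(n_components=8).fit_transform(spectra)
print(spectra.shape, "->", components.shape)
# Feeding the raw waveform straight into PCA would give you components that
# track the samples themselves, not features a downstream classifier can use.
```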
A different problem arises with the matrix of neuronal connections in the brain: it’s too large-scale, too sparse, and too heterogeneous to be usefully analyzed by anything but specialized methods, with a lot of preprocessing and domain knowledge going into them. You might be able to cluster different functional units of the brain, but as you tried to get to more granular units, heterogeneity in the number of connections per neuron would cause dense clusters to “absorb” sparser but legitimate clusters under almost all clustering methods. Working with a time series of activations is an even bigger problem: you want to isolate the specific cascades of activations that correspond to a stimulus, then look at the architecture of the activated part of the brain, characterize it, and then be able to understand things like which neurons are functionally equivalent but belong to different parallel units of the brain (left eye vs. right eye).
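A small illustration of the heterogeneity point, on a synthetic sparse graph rather than real connectome data (numpy plus networkx; the preferential-attachment model is just an assumed stand-in for heavy-tailed connectivity):

```python
import numpy as np
import networkx as nx

# synthetic sparse graph with a heavy-tailed degree distribution
g = nx.barabasi_albert_graph(n=2000, m=3, seed=0)
degrees = np.array([d for _, d in g.degree()])

print("median degree:", int(np.median(degrees)),
      "| max degree:", int(degrees.max()),
      "| edges:", g.number_of_edges())
# A handful of hub nodes carry hundreds of connections while most carry a few;
# generic distance- or density-based clustering applied straight to the
# adjacency matrix tends to organize around the hubs, which is the
# "absorption" failure mode described above.
```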
If I give you a time series of neuronal activations and connections with no indication of the domain, you’d probably be able to come up with a somewhat predictive model using non-domain-specific methods, but you’d be handicapping yourself horribly.
Inferring causality is another problem—none of these predictive machine learning methods do a good job of establishing whether two factors have a causal relation, merely whether they have a predictive one (within the particular given dataset).
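A toy illustration of that gap (numpy only, entirely synthetic): a hidden confounder Z drives both X and Y, X has no causal effect on Y at all, and yet a regression of Y on X reports a strong relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.standard_normal(n)             # unobserved common cause
x = z + 0.3 * rng.standard_normal(n)   # X is caused by Z
y = z + 0.3 * rng.standard_normal(n)   # Y is caused by Z, not by X

slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]
print(f"fitted slope: {slope:.2f}, correlation: {r:.2f}")
# The fit is genuinely predictive within this dataset, but intervening on X
# would change nothing about Y.
```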
First, yes, I overgeneralized. Matrices don’t represent natural language and logic well.
But the kinds of problems you’re talking about—music analysis, picture analysis, and anything you eventually want to put into PCA—are perfect for matrix methods. It’s popular to start music and picture analysis with a discrete Fourier transform, which is a matrix operation. Or you use MPEG, which is all matrices. Or you construct feature detectors, say edge detectors or contrast detectors, using simple neural networks such as those found in primary visual cortex, and you implement them with matrices. Then you pass those into higher-order feature detectors, which also use matrices. You may break information out of the matrices and process it logically further downstream, but that will be downstream of PCA. As a general rule, PCA is used only on data that has so far existed only in matrices. Things that need to be broken out are not homogeneous enough, or are too structured, for PCA.
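A quick numpy check of the “the DFT is a matrix operation” claim, plus the edge-detector point (illustrative sketch, not anything specific from this exchange): build the N×N DFT matrix explicitly and verify it against the library FFT, then write a simple difference filter as multiplication by a banded matrix.

```python
import numpy as np

N = 64
k = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(k, k) / N)        # the N x N DFT matrix

signal = np.random.default_rng(0).standard_normal(N)
assert np.allclose(F @ signal, np.fft.fft(signal))  # the FFT is just this matmul, done fast

# A simple edge detector as a matrix: each row computes signal[i+2] - signal[i]
K = np.zeros((N - 2, N))
for i in range(N - 2):
    K[i, i], K[i, i + 2] = -1.0, 1.0
edges = K @ signal
assert np.allclose(edges, np.convolve(signal, [1, 0, -1], mode="valid"))
```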
There’s an excellent book called Neural Engineering by Chris Eliasmith in which he develops a matrix-based programming language that is supposed to perform calculations the way that the brain does. It has many examples of how to tackle “intelligent” problems with only matrices.