Since the complexity of many machine learning algorithms grows at least linearly (and in some cases quadratically or cubically) in the size of the data, and the quantity of data itself will probably grow superlinearly, we do expect a robust increase in demand for computing.
Algorithms to find the parameters for a classifier/regression, or algorithms to make use of it? And if I’ve got a large dataset that I’m training a classifier/regression on, what’s to stop me from training my model on a relatively small sample of the data? (The one time I used machine learning in a professional capacity, this is what I did. FYI, I should not be considered an expert on machine learning.)
(On the other hand, if you’re training a separate classifier/regression for every item, say every book on Amazon, and the number of books on Amazon is growing superlinearly, then yes, I think you would get a robust increase.)
Good question. I’m not an expert in machine learning either, but here is what I meant.
If you’re running an algorithm such as linear or logistic regression, there are two relevant dimensions: the number of data points and the number of features (i.e., the number of parameters). In the regression’s design matrix, the data points are the rows and the features/parameters are the columns.
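To make that concrete, here is a minimal NumPy sketch with made-up sizes: the design matrix has one row per data point and one column per feature, and the parameter vector has one entry per feature.

```python
import numpy as np

# Hypothetical sizes, purely for illustration.
n_points, n_features = 10_000, 100

rng = np.random.default_rng(0)
X = rng.normal(size=(n_points, n_features))  # design matrix: rows = data points, columns = features
y = rng.normal(size=n_points)                # one target value per data point
beta = np.zeros(n_features)                  # one parameter per feature/column

print(X.shape, y.shape, beta.shape)          # (10000, 100) (10000,) (100,)
```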
Holding the number of parameters constant, it’s true that once you increase the number of data points beyond a certain amount, you can get most of the value through subsampling. And even if not, more data points are not such a big issue computationally, since for a fixed number of features the cost typically grows only about linearly in the number of data points.
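Here is a rough sketch of that subsampling point on synthetic data (the sizes and the 1% subsample fraction are arbitrary): with the feature count held fixed, fitting on a small random sample of rows recovers nearly the same coefficients as fitting on everything.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 50                            # hypothetical sizes
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Fit on the full data set.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fit on a 1% random subsample of the rows, same features.
idx = rng.choice(n, size=n // 100, replace=False)
beta_sub, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

# With the feature count held fixed, the subsample gets you most of the way there.
print(np.max(np.abs(beta_full - beta_sub)))
```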
But the main advantage of having more data is lost if you still use the same (small) number of features. Generally, when you get more data, you’d try to use that additional data to fit a model with more features. The number of features would still be less than the number of data points; in many cases it’s on the order of 1% of the number of data points.
Of course, you could still use the model with the smaller number of features. In that case, you’re just not putting the new data to much good use. That’s fine, but it’s not an effective use of the enlarged data set. (There may be cases where, even with more data, adding features is of no use because the model has already reached the limits of its predictive power.)
For linear regression, solving exactly via the normal equations takes time that is cubic in the number of parameters if you use the naive inverse (plus time proportional to the number of data points times the square of the number of parameters just to form the equations). Although matrix inversion can in principle be done faster than cubic, it can’t be faster than quadratic, which is a general lower bound. Other, iterative algorithms aren’t quite cubic in the number of parameters, but they’re still more than linear.
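Here is a hedged sketch of that scaling on synthetic data, using a linear solve rather than an explicit inverse and keeping the feature count at roughly 1% of the data points, as above (all sizes are made up):

```python
import time
import numpy as np

def fit_normal_equations(X, y):
    """Exact least-squares fit via the normal equations.

    Forming X.T @ X takes on the order of n * d**2 operations, and solving the
    resulting d x d system takes on the order of d**3 with a standard
    (naively cubic) factorization.
    """
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)  # solve the d x d system directly

rng = np.random.default_rng(0)
for d in (50, 100, 200, 400):         # feature counts, made up for illustration
    n = 100 * d                       # keep features at roughly 1% of data points
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + rng.normal(size=n)
    start = time.perf_counter()
    fit_normal_equations(X, y)
    elapsed = time.perf_counter() - start
    # The time grows much faster than linearly in d (roughly cubically here,
    # since n is scaled along with d).
    print(f"d={d:4d}, n={n:6d}: {elapsed:.4f}s")
```

Calling np.linalg.solve on the d-by-d system instead of explicitly inverting X.T @ X is the usual choice; it avoids the extra cost and numerical trouble of forming the inverse.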
That makes sense. And based on what I’ve seen, having more data to feed into your model really is a pretty big asset when it comes to machine learning (I think I’ve seen this article referenced).