Statistical tools rely on such proofs.
Statistics is an applied science, similar to engineering. It has to deal with the messy world where you might need to draw conclusions from a small data set of uncertain provenance where some outliers might be data entry mistakes (or maybe not), you are uncertain of the shape of the distributions you are dealing with, have a sneaking suspicion that the underlying process is not stable in time, etc. etc. None of the nice assumptions underlying nice proofs of optimality apply. You still need to analyse this data set.
Except for all that pesky theoretical statistics.
Math people can have that :-) It is, basically, applied math, anyway.
Except it’s not math. Disciplines are socially constructed: statistics is what statisticians do, and applied math is what applied math people do. There are lots of very theoretical stats departments. I think you are having a confusion similar to the one people sometimes have about computer science and programming.
If you say stuff like “well, all those people who publish in Annals of Statistics are applied math people,” I am not sure what you are really saying. There is some intersection with applied math, ML, etc., but theoretical stats has its own set of big ideas that define the field and give it character.
I don’t think I do? I am well aware of Dijkstra’s famous quote.
As you mentioned, statistics is what statisticians do. Most statisticians don’t work in academia. I don’t doubt there are a lot of theory-heavy stats departments, just like there are a lot of physics-heavy engineering departments.
Going up one meta-level, I’m less interested in which discipline boundaries social reality has constructed, and more interested in feeling for the joints in the underlying territory.
Not sure why we are having this discussion. Statistics is a discipline with certain themes, like “intelligently using data for conclusions we want.” These themes are sufficient to give it its own character, and make it both an applied and theoretical discipline. I don’t think you are a statistician, right? Why are you talking about this?
Statistics is as much an applied discipline as physics.
Because I’m interested in the subject. Do you have objections?
You can post about whatever you want. I have objections if you start mischaracterizing what statistics is about for fun on the internet. Fun on the internet is great, being snarky on the internet is ok, misleading people is not.
edit: In fact, you can view this whole recent “data science” thing that statisticians are so worried about as a reaction to the statistics discipline becoming too theoretical and divorced from actual data analysis problems. [This is a controversial opinion, I don’t think I share it, quite.]
I don’t believe I’m mischaracterizing statistics. My original point was an observation that, in my experience, good mathematicians and good statisticians are different. Their brains work differently. To use an imperfect analogy, good C programmers and good Lisp programmers are also quite different. You just need to think in a very different manner in Lisp compared to C (and vice versa). That, of course, doesn’t mean that a C programmer can’t be passably good in Lisp.
I understand that in academia, statistics departments usually focus on theoretical statistics. That’s fine; I don’t particularly care about “official” discipline boundaries. For my purposes I would like to draw a divide between theoretical statistics and, let’s call it, practical statistics. I find it useful to classify theoretical statistics as applied math, and practical statistics as something different from that.
Data science is somewhat different from traditional statistics, but I’m not sure the distinction lies along the theoretical-practical divide. As a crude approximation, I’d say that traditional statistics is mostly concerned with extracting precise and “provable” information out of small data sets, while data science tends to drown in data and so loves non-parametric models and ML in particular.
Ok, I am not interested in wasting more time on this. All I am saying is:
This is misleading. Theoretical statistics is not applied math, either. I think you don’t know what you are talking about, re: this subject.
So we disagree :-)
Well, this is a matter of degree. There is a reason we use these tools in the first place. A good statistician must be quite aware of the underlying assumptions of each tool, if only so that they can switch to something else when warranted. (For instance, use “robust” methods which try to identify and appropriately discount outliers.)
Well, of course.
Heh. The word “appropriately” is a tricky one. There is a large variety of robust methods which use different ways of discounting outliers, naturally with different results. The statistician will need to figure out what’s “appropriate” in this particular case, and proofs don’t help here.
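To make the point about robust methods a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample is made up, the 10% cut-off is arbitrary, and `trimmed_mean` and `winsorized_mean` are hypothetical helper names rather than anything from the discussion above. It just shows four common location estimates disagreeing on the same small data set with one suspicious point.

```python
import statistics


def trimmed_mean(values, trim_fraction):
    """Drop the lowest and highest trim_fraction of points, then average the rest."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)


def winsorized_mean(values, trim_fraction):
    """Clamp the extreme points to the nearest retained value instead of dropping them."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    return statistics.mean([min(max(v, low), high) for v in ordered])


# Hypothetical measurements: 48.0 might be a data entry mistake for 4.8,
# or it might be a genuine observation. The data alone cannot tell us.
sample = [3.9, 4.4, 4.6, 4.7, 4.9, 5.0, 5.2, 5.3, 5.9, 48.0]

estimates = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "trimmed_mean_10pct": trimmed_mean(sample, 0.10),
    "winsorized_mean_10pct": winsorized_mean(sample, 0.10),
}

for name, value in estimates.items():
    print(f"{name:>22}: {value:.3f}")
```

The plain mean gets dragged far away by the single large value, while the three robust estimates stay close to the bulk of the data but still do not agree with each other. Which one is “appropriate” depends on whether you believe 48.0 is an error, and nothing in the code can settle that.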