Again, yes you are, because you’re asking about inferring some property (the hiatus, i.e. a relative slowdown in the increase of global surface temperatures) from the data, not directly about the data (which is only a function mapping points in time to instantaneous temperature recordings and by itself says nothing about trends). One way of calculating a trend is simply smoothing/windowing and taking the derivative, and then saying ‘a hiatus is happening if the derivative is this close to zero’. That is a kind of inference, although not the kind I would personally use for data like this.
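To make that concrete, here is a minimal sketch of the smoothing-then-derivative rule just described. The moving-average smoother, the window length, and the threshold `eps` are all invented for illustration, not tuned for real climate data:

```python
import numpy as np

def hiatus_by_derivative(temps, window=11, eps=0.005):
    """Flag a ‘hiatus’ where the derivative of the smoothed series is near zero.

    temps  : 1-D array of annual temperature anomalies
    window : length of the moving-average smoother
    eps    : |slope| below this (degrees/year) counts as ‘no trend’
    """
    kernel = np.ones(window) / window
    smoothed = np.convolve(temps, kernel, mode="valid")  # smoothing/windowing
    slope = np.gradient(smoothed)                        # numerical derivative
    return np.abs(slope) < eps                           # True where ‘hiatus’
```

Every choice here (the smoother, `window`, `eps`) changes where the mask says a hiatus is, which is exactly why this already counts as a kind of inference.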
What you are talking about is also probabilistic inference in the strictest sense, because the confidence in your estimate of the existence of the hiatus depends directly on how much data you have. In this case there are only a few years’ worth; if you had 100 years’ worth of data to go on, a much stronger estimate could be made. Conversely, if you had only 1–2 years of data, then no such hiatus would be ‘apparent’ even if it were occurring.
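A toy simulation makes the sample-size point visible. The trend, the noise level, and the use of an OLS slope here are all assumptions for the sake of illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_trend, noise_sd = 0.015, 0.1          # degrees/year and degrees; invented

for n in (5, 15, 100):
    t = np.arange(n)
    y = true_trend * t + rng.normal(0, noise_sd, n)
    coef, cov = np.polyfit(t, y, 1, cov=True)
    # The SE of an OLS slope is sigma / sqrt(sum((t - tbar)^2)), i.e. it scales
    # like n^(-3/2) for unit-spaced points, so it collapses as the record grows.
    print(f"{n:3d} years: trend = {coef[0]:+.4f} ± {np.sqrt(cov[0, 0]):.4f} deg/yr")
```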
To start with, there is some confusion—you say
which isn’t so. You are asking about inferring some property, and I’m asking about the meaning of the words you are using.
However, getting to the meat of the issue, I’d like to make two points.
Point one is distinguishing between sample statistics and estimates of the parameters of the underlying process. In our case we have an underlying process (warming, let’s say we define it as the net energy balance of the planet integrated over a suitable interval) which we cannot observe directly, and some data (land and ocean temperatures) which we can.
The data that we have is, in statistical terminology, a sample, and we commonly try to figure out properties of the underlying process by looking at the sample that we have. The thing is, sample statistics are not random. If I have some data (e.g. a time series of temperatures) and I calculate its mean, that mean is not a random variable. Its probability is 1: we observed it, it happened. There is no inference involved in calculating sample means, just straight math. Now, if you want an estimate of the mean of the underlying process, that’s a different issue. It’s going to be an uncertain estimate, and we will have to specify some sort of model to even produce such an estimate and talk about how likely it is.
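A small sketch of the distinction, with made-up numbers: the sample mean is plain arithmetic, while an interval for the mean of the underlying process only exists once we posit a model. The i.i.d.-normal assumption below is the simplest possible choice, not a claim about real temperature data (which are strongly autocorrelated):

```python
import numpy as np
from scipy import stats

temps = np.array([0.42, 0.54, 0.63, 0.61, 0.54, 0.68, 0.74])  # made-up anomalies

# Sample statistic: a fixed number with probability 1, nothing inferred.
sample_mean = temps.mean()

# Parameter estimate: meaningful only under a model (here i.i.d. normal),
# and it comes with uncertainty attached.
n = len(temps)
se = temps.std(ddof=1) / np.sqrt(n)
ci = stats.t.interval(0.95, df=n - 1, loc=sample_mean, scale=se)
print(sample_mean, ci)
```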
In this case, when I’m talking about the hiatus as a feature of the data, it’s not probabilistic; there is nothing to infer. But if you want to know whether there is a hiatus in the underlying process of global warming, that’s a different and much more complicated question.
Point two is more general and a bit more interesting. It’s common to think in terms of data and models: you have some data and you fit some models to it. You can describe your data without using any models, for example by calculating the sample mean. However, as your description of the data grows more complex, at some point you cross a (fuzzy) line and start to talk about the same data in terms of models, implied or explicit. Where that fuzzy line is located is subject to debate. For example, you put that line almost at the end of the spectrum when you say that the only thing we can say about a time series without involving models or inferences is that x = f(t) and that’s all. I find that not very useful, and my line sits further along the spectrum. I’m not claiming any kind of precision here, but a full-blown ARIMA representation of a time series I would call a model, and something like an AR(1) coefficient would be right on the boundary: is it just a straightforward math calculation, or are you fitting an autoregressive model to the time series?
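For what it’s worth, here is that boundary case computed both ways on a simulated series: once as the sample lag-1 autocorrelation (plain arithmetic on the data) and once as a fitted AR(1) model via statsmodels. The series and its coefficient are invented for the demonstration:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(1)
x = np.zeros(200)
for i in range(1, 200):                 # simulate an AR(1) with phi = 0.7
    x[i] = 0.7 * x[i - 1] + rng.normal()

# ‘Just math’: the sample lag-1 autocorrelation, a fixed function of the data.
xc = x - x.mean()
r1 = (xc[1:] @ xc[:-1]) / (xc @ xc)

# ‘Fitting a model’: conditional least squares for an AR(1) with an intercept.
phi_hat = AutoReg(x, lags=1).fit().params[1]

print(r1, phi_hat)
```

The two numbers come out nearly identical, which is exactly why the line is fuzzy: the same arithmetic can be read either as a descriptive statistic or as a parameter estimate under an autoregressive model.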