Just like gjm, I think you’re confusing existence and interpretation.
Outside of political posturing, I don’t know why someone would claim that the hiatus, as a feature of the historical data set, does not exist. It does and it’s pretty clear. That’s existence. What the hiatus means is a different and much more complicated question. You can claim it’s just an artifact of random variation. You can claim it reflects multi-year cycles in global climate patterns. You can claim it shows that our models are deficient and we don’t understand climate variation. You can claim many things, but a claim that the hiatus just does not exist doesn’t seem reasonable to me.
Not sure why you’re using this unusual terminology, but I’m arguing about what you call existence. It seems that you’re arguing that the ‘hiatus’ exists with either absolute certainty (in which case you’d have to provide a logical proof) or at least with very high likelihood. However, I see no reason we should assign a very high likelihood to its existence.
The ‘existence’ of a ‘trend’ or ‘hiatus’ in general time series data is part of the map, not the territory. If the climate temperature data were just a smooth line (like this—graph not relevant to the discussion) then I’d agree with you, but it’s not. It looks like this.
What’s unusual about my terminology?
The ‘existence’ of a ‘trend’ or ‘hiatus’ in general time series data is part of the map, not the territory.
I am not sure about that. In your “smooth line” example, is the trend part of the map or the territory? More generally, what can I say about a time series that you would consider to be territory and not map?
Oh, and if you want to be technical about it, the time series you’re looking at is not part of the territory to start with. It’s a complex model-dependent aggregate.
If the temperature graph looked like the first graph, then inference of a trend (which is, again, part of the map) with high probability might be made. But it does not look like that.
More generally, what can I say about a time series that you would consider to be territory and not map?
That f(t) = x.
Oh, and if you want to be technical about it, the time series you’re looking at is not part of the territory to start with. It’s a complex model-dependent aggregate.
For the sake of discussion of the existence of a hiatus I’m assuming the temperature graph is a given. But you’re right in that the big picture is that the temperature graph itself is not part of the territory.
In this case I am not sure what you mean by “exists”.
Can you give a definition, preferably a hard one, that is, an algorithm into which I can feed the time series and it will tell me whether a particular feature (e.g. a hiatus) exists or not?
You’re getting close to understanding the problem. What you’re really asking about is an inference method, and the optimal inference method is Bayesian inference, which requires specification of what you would expect to see in the temperature record if the current warming rate were zero and also the specification of a prior probability. For the latter, an uninformative prior assigning equal weight to warming and cooling would probably be most suitable here. The former is a bit tricky, and that is precisely the problem with saying “the existence of the hiatus is obvious.”
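To make the ingredients named above concrete, here is a minimal sketch of that kind of calculation. Everything specific in it is an assumption for illustration: the anomalies are synthetic, the noise is treated as independent Gaussian with a known standard deviation `sigma` (real temperature records are autocorrelated, which is exactly the tricky part), and the "equal weight to warming and cooling" prior is approximated by a uniform grid of candidate slopes symmetric about zero.

```python
import math

def slope_posterior(years, temps, slopes, sigma):
    """Posterior probability of each candidate slope under a uniform prior.

    Assumes independent Gaussian noise with known standard deviation sigma.
    Centering both series removes the intercept from the comparison.
    """
    n = len(years)
    t_mean = sum(years) / n
    y_mean = sum(temps) / n
    log_like = []
    for b in slopes:
        sse = sum((y - y_mean - b * (t - t_mean)) ** 2
                  for t, y in zip(years, temps))
        log_like.append(-sse / (2 * sigma ** 2))
    top = max(log_like)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in log_like]
    total = sum(weights)
    return [w / total for w in weights]

# Synthetic, made-up anomalies: a 0.02/yr trend plus a deterministic wiggle.
years = list(range(15))
temps = [0.02 * t + 0.1 * math.sin(1.7 * t) for t in years]

# Candidate slopes from -0.05 to +0.05 per year, symmetric about zero.
slopes = [(i - 50) / 1000 for i in range(101)]
posterior = slope_posterior(years, temps, slopes, sigma=0.1)

# Probability mass on strictly positive slopes, i.e. on "warming".
p_warming = sum(p for b, p in zip(slopes, posterior) if b > 0)
print(f"P(slope > 0 | data) = {p_warming:.3f}")
```

The answer depends heavily on `sigma` and on the noise model, so the number printed here says nothing about the real record; it only shows where the modelling choices enter.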
What you’re really asking about is an inference method
I am sorry, I do nothing of that sort. You asked a question about whether something exists and it turned out that you have a different meaning (or, maybe, context) for that word than I envisioned. So I am asking you what you mean by “exists”, not about the optimal methods of inference.
Given your comment, I think what you are asking is not whether the hiatus exists (as I use the word), but rather whether the warming has stopped—or maybe whether our confidence in the current climate models is not as high as it used to be.
Again, yes you are, because you’re asking about inferring some property (the hiatus, i.e. a relative slowdown in the increase of global surface temperatures) from the data, not directly about the data (which is only a function mapping points in time to instantaneous temperature recordings and by itself says nothing about trends). One way of calculating a trend is simply smoothing/windowing and taking the derivative, and then saying ‘a hiatus is happening if the derivative is this close to zero’. That is a kind of inference, although not the kind that I would personally use for data like this.
What you are talking about is also probabilistic inference in the strictest sense, because the confidence in your estimate of existence of the hiatus depends directly on how much data you have. In this case, only a few years’ worth—if you had 100 years’ worth of data to go on, a much stronger estimate could be made. Conversely, if you had only 1-2 years of data, then no such hiatus would be ‘apparent’ even if it was occurring.
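The windowing rule described above can be written down as an algorithm in a few lines. This is only a sketch: it uses a least-squares slope over each trailing window as a stand-in for "smooth, then differentiate", and the window length, the tolerance `tol`, and the series itself are all made-up choices rather than anything taken from the actual temperature record.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def trailing_slopes(temps, window):
    """Slope (per time step) of every run of `window` consecutive points."""
    xs = list(range(window))
    return [ols_slope(xs, temps[i:i + window])
            for i in range(len(temps) - window + 1)]

def is_hiatus(slope, tol):
    """Call it a hiatus when the windowed slope is within tol of zero."""
    return abs(slope) < tol

# Made-up series: 20 steps rising at 0.02 per step, then 10 flat steps.
series = [0.02 * t for t in range(20)] + [0.38] * 10
window_slopes = trailing_slopes(series, window=10)
flags = [is_hiatus(s, tol=0.005) for s in window_slopes]

print(flags[0], flags[-1])
```

Changing `window` or `tol` changes which stretches get flagged, which is one way to see that the rule carries assumptions of its own.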
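To start with, there is some confusion: you say that I’m asking about inferring some property from the data, which isn’t so. You are asking about inferring some property, and I’m asking about the meaning of the words you are using.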
However, getting to the meat of the issue, I’d like to make two points.
Point one is distinguishing between sample statistics and estimates of the parameters of the underlying process. In our case we have an underlying process (warming, let’s say we define it as the net energy balance of the planet integrated over a suitable interval) which we cannot observe directly, and some data (land and ocean temperatures) which we can.
The data that we have is, in statistical terminology, a sample and we commonly try to figure out properties of the underlying process by looking at the sample that we have. The thing is, sample statistics are not random. If I have some data (e.g. a time series of temperatures) and I calculate its mean, that mean is not a random variable. The probability of it is 1: we observed it, it happened. There is no inference involved in calculating sample means, just straight math. Now, if you want estimates of a mean of the underlying process, that’s a different issue. It’s going to be an uncertain estimate and we will have to specify some sort of a model to even produce such an estimate and talk about how likely it is.
In this case, when I’m talking about the hiatus as a feature of the data, it’s not probabilistic; there is nothing to infer. But if you want to know whether there is a hiatus in the underlying process of global warming, it’s a different question and much more complicated, too.
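The distinction in point one can be illustrated with a tiny example. The numbers are made up, and the second half leans on a stated assumption (independent draws from a process with a fixed mean, plus a normal approximation that is rough for so few points); the first half needs no assumption at all.

```python
import math
import statistics

# Made-up observations; nothing here comes from a real temperature record.
sample = [0.12, 0.18, 0.09, 0.21, 0.15, 0.17, 0.11, 0.19]

# Description of the data: plain arithmetic, no model, no uncertainty.
sample_mean = statistics.mean(sample)

# Estimate of the process mean: needs a model. Here we ASSUME independent
# draws with a common mean, which gives the usual standard error.
std_err = statistics.stdev(sample) / math.sqrt(len(sample))

# Rough 95% interval using the normal approximation (1.96 standard errors).
interval = (sample_mean - 1.96 * std_err, sample_mean + 1.96 * std_err)

print(f"sample mean: {sample_mean:.4f}")
print(f"process mean, rough 95% interval: {interval[0]:.4f} to {interval[1]:.4f}")
```

Swap the independence assumption for an autocorrelated model and the sample mean stays exactly the same while the interval changes.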
Point two is more general and a bit more interesting. It’s common to think in terms of data and models: you have some data and you fit some models to it. You can describe your data without using any models, for example by calculating the sample mean. However, as your description of data grows more complex, at some point you cross a (fuzzy) line and start to talk about the same data in terms of models, implied or explicit. Where that fuzzy line is located is subject to debate. For example, you put that line almost at the end of the spectrum when you say that the only thing we can say about a time series without involving models or inferences is that x=f(t) and that’s all. I find that not very useful and my line is further away. I’m not claiming any kind of precision here, but a full-blown ARIMA representation of a time series I would call a model, and something like an AR(1) coefficient would be right on the boundary: is it just a straightforward math calculation, or are you fitting an autoregressive model to the time series?
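The AR(1) boundary case can be seen side by side in code. The sketch below, on a made-up series, computes the lag-1 sample autocorrelation as a plain descriptive statistic and then the coefficient from a least-squares fit of x_t = c + phi * x_{t-1} + e_t. The function names are hypothetical, chosen for this example.

```python
def lag1_autocorrelation(xs):
    """Lag-1 sample autocorrelation: a descriptive statistic of the series."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t - 1] - m) for t in range(1, n))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def ar1_least_squares(xs):
    """Fit x_t = c + phi * x_{t-1} + e_t by least squares; return (c, phi)."""
    prev, curr = xs[:-1], xs[1:]
    n = len(prev)
    p_mean = sum(prev) / n
    c_mean = sum(curr) / n
    num = sum((p - p_mean) * (c - c_mean) for p, c in zip(prev, curr))
    den = sum((p - p_mean) ** 2 for p in prev)
    phi = num / den
    return c_mean - phi * p_mean, phi

# Made-up series for illustration only.
series = [0.0, 0.5, 0.4, 0.9, 0.7, 1.0, 0.6, 0.8, 0.3, 0.5, 0.1, 0.2]
r1 = lag1_autocorrelation(series)
c, phi = ar1_least_squares(series)

print(f"lag-1 autocorrelation: {r1:.3f}")
print(f"fitted AR(1): c = {c:.3f}, phi = {phi:.3f}")
```

The two numbers come from nearly the same arithmetic and differ only in how the means and end points are handled, which is why it is hard to say on which side of the line the coefficient falls.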