I actually quite like your four dot points, as summaries of some distinguishing features of these cases. (Although with Rutherford, I’d also highlight the point about whether or not the forecast is likely to reflect genuine beliefs, and perhaps more specifically whether or not a desire to mitigate attention hazards may be playing a role.)
I also think “Too many degrees of freedom to find some reason we shouldn’t count them as ‘serious’ predictions” gets at a good point, and it’s improved my thinking on this a bit.
Overall, I think that your comment would be a good critique of this post if this post were saying or implying that these case studies provide no evidence for the sorts of claims Ord and Yudkowsky want to make. But my thesis was genuinely just that “I think those cases provide less clear evidence [not no evidence] than those authors seem to suggest”. And I genuinely just aimed to “Highlight ways in which those cases may be murkier than Ord and Yudkowsky suggest” (and also separately note the sample size and representativeness points).
It wasn’t the case that I was using terms like “less clear” and “may be murkier” to be polite or harder-to-criticise (in a motte-and-bailey sort of way), while in reality I harboured or wished to imply some stronger thesis; instead, I genuinely just meant what I said. I just wanted to “prod at each suspicious plank on its own terms”, not utterly smash each suspicious plank, let alone bring the claims resting atop them crashing down.
That may also be why I didn’t touch on what you see as the true crux (though I’m not certain, as I’m not certain I know precisely what you mean by that crux). This post had a very specific, limited scope. As I noted, “this post is far from a comprehensive discussion on the efficacy, pros, cons, and best practices for long-range or technology-focused forecasting.”
To sort-of restate some things and sort-of address your points: I do think each of the cases provides some evidence in relation to the question (let’s call it Q1) “How overly ‘conservative’ (or poorly-calibrated) do experts’ quantitative forecasts of the likelihood or timelines of technology tend to be, under ‘normal’ conditions?” I think the cases provide clearer evidence in relation to questions like how overly ‘conservative’ (or poorly-calibrated) experts’ forecasts of the likelihood or timelines of technology tend to be when...
- it seems likelier than normal that the forecasts themselves could change likelihoods or timelines
  - I’m not actually sure what we’d base that on. Perhaps unusually substantial prominence or publicity of the forecaster? Perhaps a domain in which there’s a wide variety of goals that could be pursued, and in which the goal pursued has sometimes been chosen partly to prove forecasts wrong? AI might indeed be an example; I don’t really know.
- it seems likelier than normal that the forecaster isn’t actually giving their genuine forecast (and perhaps more specifically, that they’re partly aiming to mitigate attention hazards)
- cutting-edge development on the relevant tech is occurring in highly secretive or militarised ways

...as well as questions about poor communication of forecasts by experts.
I think each of those questions other than Q1 is also important. And I’d agree that, in reality, we often won’t know much about how far conditions differ from “normal conditions”, or what “normal conditions” are really like (e.g., maybe forecasts are usually not genuine beliefs). These are both reasons why the “murkiness” I highlight about these cases might not be that big a deal in practice, or might serve more to draw our attention to specific factors that should make us wary of expert predictions, rather than just making us wary in general.
In any case, I think the representativeness issue may actually be more important. As I note in footnote 4, I’d update more on these same cases (holding “murkiness” constant) if they were the first four cases drawn randomly, rather than through what I’d guess was a somewhat “biased” sampling process (which I don’t mean as a loaded/pejorative term).