There are a variety of reasons interpreters might think that a prediction didn’t come true, while Kurzweil boldly claims that it did:
1. Kurzweil didn’t express himself clearly, so interpreters misunderstood what the prediction really was. Miscommunication adds random noise, and most randomly generated predictions will turn out false, so this will skew the results against Kurzweil.
2. Kurzweil’s predictions were vague. So charitable interpreters will think they’re basically true, while less charitable interpreters will think they’re basically false. And we can expect random LessWrongers to be less charitable toward Kurzweil than Kurzweil is toward Kurzweil.
3. Interpreters tend to be factually mistaken about current events, in a specific direction: They are ignorant of the nature, existence, or prevalence of the latest innovations in technology and culture.
4. Kurzweil tends to be factually mistaken about current events, in a specific direction: He thinks a variety of technologies are more advanced, and more widespread, than they really are.
5. There are systematic differences in the evaluation scales used by Kurzweil and by others. For instance, Kurzweil and Armstrong individuate ‘predictions’ differently, lumping and splitting at different points in the source text. There may also be systematic disagreements about how (temporally and technologically) precise an interpretation must be to count as ‘correct’, and about whether a grammatical form like ‘X is Y’ most closely means ‘X is always Y’, ‘X is usually Y’, ‘X is commonly Y’, ‘X is sometimes (occasionally) Y’, or ‘X is Y at least once’. This ties into vagueness, but it may bias the results through linguistic variation rather than just through a generic degree of interpretive charity.
I’m particularly curious about testing 3, since the strongest criticism Kurzweil could make of our methodology for assessing his accuracy is that our reviewers simply got the facts wrong. We can calibrate our assumptions about the accuracy and up-to-dateness of LessWrongers regarding technology generally. Or, more specifically, we can expose them to Kurzweil’s arguments and see how much their assessment of his predictive success changes after hearing why he thinks he got a certain prediction ‘correct’.
With the advent of multi-core architectures, these devices are starting to have 2, 4, 8… computers each in them, so we’ll exceed a dozen computers “on and around their bodies” very soon. One could argue that it is “typical” already, but it will become very common within a couple of years.
There’s clearly a disconnect between his ‘computer’ and the general meaning of ‘computer’: a multi-core processor isn’t more than one computer, and it wasn’t in 1990.
Also, he seems to regard things as ‘typical’ that I would call ‘common’. I say ‘common’ when it isn’t surprising to see something, and ‘typical’ when it is surprising to note its absence, while he seems to use ‘typical’ for things which are not surprising, and ‘common’ for things which are commercially available (regardless of cost or prevalence).
I think (5.) can give a significant difference (together with 1 and 2; I would not expect so much trouble from 3 and 4). Imagine a series of 4 statements, where the last three basically require the first one. If all 4 are correct, it is easy to check every single statement, giving 4 correct predictions. But if the first one is wrong, and the others have to be wrong as a consequence, Kurzweil might count the whole block as just one wrong prediction, while an evaluator who splits the block counts four wrong ones. That asymmetry alone would inflate his apparent accuracy.
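To make that concrete, here is a toy sketch in Python. The block structure, the verdicts, and both counting schemes are invented for illustration; they are not taken from Kurzweil’s or Armstrong’s actual tallies.

```python
# Toy example: two blocks of four dependent statements each (hypothetical data).
# In the second block the lead statement fails, dragging the rest down with it.
blocks = [
    [True, True, True, True],
    [False, False, False, False],
]

# Consistent splitting: every statement counts as its own prediction.
split_correct = sum(sum(block) for block in blocks)               # 4
split_total = sum(len(block) for block in blocks)                 # 8

# Asymmetric counting: a fully correct block yields one hit per statement,
# but a failed block is collapsed into a single miss.
asym_correct = sum(len(block) for block in blocks if all(block))          # 4
asym_total = asym_correct + sum(1 for block in blocks if not all(block))  # 5

print(f"consistent splitting: {split_correct}/{split_total} "
      f"({split_correct / split_total:.0%} correct)")
print(f"asymmetric counting:  {asym_correct}/{asym_total} "
      f"({asym_correct / asym_total:.0%} correct)")
# consistent splitting: 4/8 (50% correct)
# asymmetric counting:  4/5 (80% correct)
```

The underlying facts are identical in both tallies; only the decision about when to lump dependent statements changes the score from 50% to 80%.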
For predictions judged by multiple volunteers, it might be interesting to check how much they deviate from each other. This gives some insight into how important (1.) to (3.) are. satt looked at that, but I don’t know what conclusions we can draw from it.
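One simple way to quantify that deviation is sketched below with invented verdicts; this is just an illustration of the idea, not a reconstruction of satt’s analysis. For each prediction, compute the fraction of rater pairs that gave the same verdict.

```python
from itertools import combinations

# Hypothetical verdicts from three volunteers on three predictions
# (True = came true, False = did not).
ratings = {
    "prediction A": [True, True, True],
    "prediction B": [True, False, True],
    "prediction C": [False, True, False],
}

def pairwise_agreement(verdicts):
    """Fraction of rater pairs that gave the same verdict."""
    pairs = list(combinations(verdicts, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

for name, verdicts in ratings.items():
    print(f"{name}: pairwise agreement = {pairwise_agreement(verdicts):.2f}")

# Predictions with low agreement are candidates for problems (1.) and (2.):
# the raters are probably not disagreeing about facts so much as about
# what the prediction even says.
```

With more raters and more than two verdict categories, a chance-corrected statistic such as Fleiss’ kappa would be the more standard choice, but raw pairwise agreement is enough to flag the predictions where interpretations diverge most.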