Kurzweil’s predictions: good accuracy, poor self-calibration

Predictions of the future rely, to a much greater extent than in most fields, on the personal judgement of the expert making them. Just one problem—personal expert judgement generally sucks, especially when the experts don’t receive immediate feedback on their hits and misses. Formal models perform better than experts, but when talking about unprecedented future events such as nanotechnology or AI, the choice of the model is also dependent on expert judgement.

Ray Kurzweil has a model of technological intelligence development where, broadly speaking, evolution, pre-computer technological development, post-computer technological development and future AIs all fit into the same exponential increase. When assessing the validity of that model, we could look at Kurzweil’s credentials, and maybe compare them with those of his critics—but Kurzweil has given us something even better than credentials, and that’s a track record. In various books, he’s made predictions about what would happen in 2009, and we’re now in a position to judge their accuracy. I haven’t been satisfied by the various accuracy ratings I’ve found online, so I decided to do my own.

Some have argued that we should penalise predictions that “lack originality” or were “anticipated by many sources”. But hindsight bias means that we certainly judge many profoundly revolutionary past ideas as “unoriginal”, simply because they are obvious today. And saying that other sources anticipated the ideas is worthless unless we can quantify how mainstream and believable those sources were. For these reasons, I’ll focus only on the accuracy of the predictions, and make no judgement as to their ease or difficulty (unless they say things that were already true when the prediction was made).

Conversely, I won’t be giving any credit for “near misses”: this has the hindsight problem in the other direction, where we fit potentially ambiguous predictions to what we know happened. I’ll be strict about the meaning of the prediction, as written. A prediction in a published book is a form of communication, so if Kurzweil actually meant something different to what was written, then the fault is entirely his for not spelling it out unambiguously.

One exception to that strictness: I’ll be tolerant on the timeline, as I feel that a lot of the predictions were forced into a “ten years from 1999” format. So I’ll count a prediction as accurate if it came true at any point up to the end of 2011, where data is available.

The number of predictions actually made seems to vary from source to source; I used my copy of “The Age of Spiritual Machines”, which seems to be the original 1999 edition. In the chapter “2009”, I counted 63 prediction paragraphs. I then chose ten numbers at random between 1 and 63, and analysed those ten predictions for correctness (those wanting to skip directly to the final score can scroll down). Given Kurzweil’s nationality and location, I will assume all predictions refer only to technologically advanced nations, and specifically to the United States if there is any doubt. Please feel free to comment on my judgements below; we may be able to build a Less Wrong consensus verdict. It would be best if you tried to reach your own conclusions before reading my verdict or anyone else’s. Hence I present the ten predictions, initially without commentary:

  • Prediction 5: Cables are disappearing. Communication between components, such as pointing devices, microphones, displays, printers and the occasional keyboard, uses short-distance wireless technology.

  • Prediction 7: The majority of text is created using continuous speech recognition (CSR) dictation software, but keyboards are still used. CSR is very accurate, far more so than the human transcriptionists who were used up until a few years ago.

  • Prediction 8: Also ubiquitous are language user interfaces (LUIs) which combine CSR and natural language recognition. For routine matters, such as simple business transactions and information inquiries, LUIs are quite responsive and precise. They tend to be narrowly focused, however, on specific types of tasks. LUIs are frequently combined with animated personalities. Interacting with an animated personality to conduct a purchase or make a reservation is like talking to a person using video conferencing, except the person is simulated.

  • Prediction 18: In the twentieth century, computers in schools were mostly on the trailing edge, with most effective learning from computers taking place in the home. Now in 2009, while schools are still not on the cutting edge, the profound importance of the computer as a knowledge tool is widely recognised. Computers play a central role in all facets of education, as they do in other spheres of life.

  • Prediction 20: Students of all ages typically have a computer of their own, which is a thin tabletlike device weighing under a pound with a very high resolution display suitable for reading. Students interact with their computers primarily by voice and by pointing with a device that looks like a pencil. Keyboards still exist, but most textual language is created by speaking. Learning materials are accessed through wireless communication.

  • Prediction 26: Print-to-speech reading devices for the blind are now very small, inexpensive, palm-sized devices that can read books (those that still exist in paper form) and other printed documents, and other real-world text such as signs and displays. These reading systems are equally adept at reading the trillions of electronic documents that are instantly available from the ubiquitous wireless worldwide network.

  • Prediction 29: Computer-controlled orthotic devices have been introduced. These “walking machines” enable paraplegics to walk and climb stairs. The prosthetic devices are not yet usable by all paraplegic persons, as many physically disabled persons have dysfunctional joints from years of disuse. However, the advent of orthotic walking systems is providing more motivations to have these joints replaced.

  • Prediction 44: Intelligent roads are in use, mainly for long-distance travel. Once your car’s guidance system locks into the control sensors on one of these highways, you can sit back and relax. Local roads, though, are still predominantly conventional.

  • Prediction 48: There is continuing concern with an underclass that the skill ladder has left far behind. The size of the underclass appears to be stable, however. Although not politically popular, the underclass is politically neutralised through public assistance and the generally high level of affluence.

  • Prediction 53: Beyond musical recordings, images, and movie videos, the most popular type of digital entertainment object is virtual experience software. These interactive virtual environments allow you to go whitewater rafting on virtual rivers, to hang-glide in a virtual Grand Canyon, or to engage in intimate encounters with your favourite movie star. Users also experience fantasy environments with no counterpart in the physical world. The visual and auditory experience of virtual reality is compelling, but tactile interaction is still limited.
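The sampling step described earlier (ten paragraph numbers chosen at random between 1 and 63) can be sketched in a few lines. This is only an illustration: the post does not say how the numbers were generated, the function name `pick_predictions` and the seed value are hypothetical, and drawing without replacement is an assumption.

```python
import random

def pick_predictions(total=63, sample_size=10, seed=None):
    """Draw `sample_size` distinct paragraph numbers from 1..total."""
    # A fixed seed makes the draw reproducible; seed=None gives a fresh draw.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no paragraph is picked twice.
    return sorted(rng.sample(range(1, total + 1), sample_size))

chosen = pick_predictions(seed=2009)  # the seed here is an arbitrary example value
print(chosen)
```

Fixing the sample before reading the predictions closely is what keeps the selection from being influenced by which predictions look like hits or misses.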

Verdict

    My scale for judging the predictions is: true, weakly true, weakly false, false.

    Prediction 5: My office and the computer I’m typing on seem pretty full of cables. Nevertheless, it is true there has been a rise in wireless technology, and wireless computer components, even if they’re not ubiquitous. I’ll grade this as a weakly true.

    Prediction 7: I have failed to find proper data for the first claim. Anecdotally, it certainly seems false: keyboards are still in ubiquitous use, and I’ve never personally seen anyone use voice recognition to write documents of any length or even to send texts (a few personal experiments with Siri notwithstanding). The second claim is false: according to an assessment by the National Institute of Standards and Technology, the accuracy of CSR is still nowhere near surpassing human transcription. This lends extra credence to the first claim being false as well: without the diminished error rate, it’s very hard to see CSR being used for the majority of text creation. False.

    Prediction 8: Apart from the belief that the animated personality would be visual, this is a near-perfect description of Siri and similar assistants. The term “ubiquitous” is tricky, but if we interpret it to mean “to be found everywhere” (rather than “everyone has one”), then the prediction is weakly true (knocked down from true because of the uncertainty about ubiquity).

    Prediction 18: Without needing to do the research, I think we can take this claim as evidently true.

    Prediction 20: All the stuff about voice recognition is false. The only device that fits that description today is the smartphone, which had not reached more than 50% penetration among teenagers in 2011 (teenagers are the median “students of all ages”; adding in university students as well as pre-teens should lower the proportion, not raise it). “Learning materials are accessed through wireless communication” is hard to interpret, as it doesn’t give any estimate of what proportion of learning material we are talking about. So though we can give Kurzweil kudos for imagining something like the smartphone, the prediction is weakly false.

    Prediction 26: One can quibble about inexpensive, as the products seem to be in the $600 range, but those products certainly exist for book and magazine reading (though not for most signs and displays, as far as I can tell—certainly not in a form the blind can use). The second sentence is true for some screen readers, making the prediction essentially true.

    Prediction 29: 2009 timeline wrong, but true in later years.

    Prediction 44: The relative quantifier in the last sentence (“though, are still predominantly conventional”) makes it clear that we should expect intelligent highways to be common among long-distance highways—this isn’t a few experimental roads we’re talking about. Though we have a few self-driving cars, we have nothing like the intelligent roads implied in this prediction, which specifically implies that most cars on those roads will be self-driven. False.

    Prediction 48: The first part of the prediction is true. The second sentence seems false, whether one measures the underclass through relative income (where inequality has been increasing) or through an absolute standard of educational attainment (where the various graduation rates have gone up, implying the underclass is decreasing). There are other ways one could measure the underclass, giving different results. Since one could read the underclass as increasing or decreasing, should we take Kurzweil’s claim that it is stable as the correct mean? No. All that means is that had he spelt out his claim in more detail at the time, it would likely have ended up false. Ambiguity does not make a false statement true. The last sentence is virtually impossible to confirm or refute, so the whole prediction is weakly true and weakly false.

    Prediction 53: This is a tricky one. The Wii and similar game consoles seem to fit the bill to some extent. However the tone suggests he is talking about a virtual reality experience, which is not what we currently have. So, does he mean virtual reality, or does he mean “games like what they had in 1999, except with much better graphics and features”? How would someone at the time have read the prediction? Again, ambiguity cannot be used to make a false statement true. I’m going to work on the assumption that had he merely meant “graphics and features of video games will improve a lot”, he would have said so (certainly his prediction seems to promise much more than that). So the prediction is false.

    But what if he was talking about modern games? For a start, his initial sentence gets the relative size of the industries wrong (though that can be read as a throw-away statement rather than a prediction). He also doesn’t consider things like Facebook games, which make up a large part of the games industry, and are certainly not interactive virtual environments. What about “these virtual environments allow...”? Well, the statement is possibly an utter triviality, claiming that games exist which feature rafting, hang-gliding or erotic situations (that was already true in 1999). Or it claims that features like these are a major component of the most popular games today, which is false (now, if he’d said “blowing things up with a marvellous amount of weapons...”). Fantasy environments are a much more common feature, so I’m taking that as correct. Under this interpretation, the prediction is weakly true and weakly false for games. In total, reading the statement either way, I’ll classify it as (contentiously) weakly false.

    Note: I did read Kurzweil’s assessment of his own predictions, after I had conducted my own analysis. In that assessment, nearly every ambiguous clause is interpreted in Kurzweil’s favour. This could be Kurzweil twisting the predictions in his direction; it could be a blatant example of hindsight bias; or it could be that what Kurzweil meant to say was different from what he wrote. Unfortunately, there is no way for us to tell, so we must make do with what was written and interpret it as best we can.

    Analysis

      So, out of the ten predictions, five are to some extent true, four are to some extent false, and one is unclassifiable (reading through the rest of the predictions, completely informally, these proportions seem roughly correct).

      Now imagine Kurzweil as a predictor who gives predictions, each with independent probability p of being true (alternatively, assume that a fixed proportion p of the 63 predictions are true, and pretend 63 is high enough that we can treat p as continuous without much loss). If we start with a uniform prior on p between 0 and 1, then we can update given this data. Model prediction 48 as true or false with equal probability. Then the posterior must be proportional to $(1-p)^5 p^5 + (1-p)^4 p^6$:

      This has a mean above 54%, which I’d say is excellent. A prediction record over 50% for a decade that included huge increases in computer power, September 11th and the great recession is intuitively a very good one. Alas, there is no central repository of prediction records from various futurists, but in the absence of that, his track record certainly feels impressive. Don’t let hindsight bias blind you to how hard this was, and don’t simply think of every prediction as binary: generally, there are far more ways for a prediction to be false than there are for it to be true.
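The posterior mean can be checked with exact arithmetic. The sketch below assumes the setup described above (uniform prior on p, prediction 48 counted as true or false with equal weight) and uses the standard identity that the integral of p^a (1-p)^b over [0, 1] equals a! b! / (a+b+1)! for non-negative integers a and b. The helper names `beta_integral` and `posterior_mean` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def beta_integral(a, b):
    """Exact integral of p**a * (1-p)**b over [0, 1] for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def posterior_mean(scenarios):
    """Mean of p under a uniform prior.

    `scenarios` is a list of (true_count, false_count) pairs, one per equally
    weighted reading of the ambiguous predictions.
    """
    # Normalising constant: integral of the summed likelihood terms.
    norm = sum(beta_integral(t, f) for t, f in scenarios)
    # First moment: same integral with one extra factor of p.
    first_moment = sum(beta_integral(t + 1, f) for t, f in scenarios)
    return first_moment / norm

# Five true, four false, prediction 48 split evenly between the two outcomes.
mean = posterior_mean([(5, 5), (6, 4)])
print(mean, float(mean))
```

Under these assumptions the mean comes out to exactly 6/11, about 54.5%, consistent with the figure quoted in the text.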

      On the other hand, if we look at Kurzweil’s own ranking of the predictions he gave in “The Age of Spiritual Machines”, he grades himself as having either 102 out of 108 or 127 out of 147 correct (with caveats that “even the predictions that were considered ‘wrong’ in this report were not all wrong”). I’ve plotted the lower 127/147 ≈ 0.86 accuracy on the above graph; that is very far from being a mean estimate (it’s in the 99th percentile of the probability distribution). But let’s give Kurzweil all we can: we’ll reclassify the arguable prediction 53 as being true (posterior proportional to $(1-p)^4 p^6 + (1-p)^3 p^7$):

      That is still not enough to make his accuracy estimate reasonable: his estimate is in the 96th percentile of the probability distribution. Let’s be even more generous: let’s reclassify the intermediate prediction 48 as also being true (posterior proportional to $(1-p)^3 p^7$):

      Those were very generous adjustments; changing two results is a lot from a sample of ten. But even with the most generous adjustments and taking Kurzweil’s lowest estimate of his own accuracy, he is still extraordinarily overconfident: his estimate is in the 94th percentile of the probability distribution. For fun, I flipped another prediction from false to true: even then, his estimate is in the 81st percentile of the probability distribution (and recall that if we were rigorous about the timeline that Kurzweil claimed, at least one of the true predictions would be false).
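The percentile figures can be reproduced by integrating each posterior from 0 up to Kurzweil’s self-assessed accuracy of 127/147. The sketch below is one way to do that exactly, by expanding (1-p)^f with the binomial theorem and integrating term by term. It assumes the same uniform prior and equal weighting of ambiguous readings as above; the helper names `kernel_integral` and `percentile` and the scenario labels are hypothetical.

```python
from fractions import Fraction
from math import comb

def kernel_integral(t, f, x):
    """Exact integral of p**t * (1-p)**f over [0, x], expanding (1-p)**f term by term."""
    return sum(
        Fraction(comb(f, j) * (-1) ** j, t + j + 1) * x ** (t + j + 1)
        for j in range(f + 1)
    )

def percentile(scenarios, x):
    """Posterior probability that p <= x, for equally weighted (true, false) scenarios."""
    below = sum(kernel_integral(t, f, x) for t, f in scenarios)
    total = sum(kernel_integral(t, f, Fraction(1)) for t, f in scenarios)
    return below / total

claimed = Fraction(127, 147)  # Kurzweil's lower self-assessment

cases = {
    "as graded, 48 split": [(5, 5), (6, 4)],
    "53 flipped to true, 48 split": [(6, 4), (7, 3)],
    "53 and 48 both true": [(7, 3)],
    "one more false flipped to true": [(8, 2)],
}

results = {label: percentile(s, claimed) for label, s in cases.items()}
for label, value in results.items():
    print(f"{label}: {float(value):.4f}")
```

Run under these assumptions, the four cases land in the 99th, 96th, 94th and 81st percentiles respectively, in line with the figures given in the text.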

      So what can this tell us about Kurzweil as a futurist, and about the predictions he makes? Essentially two points stand out:

      1. He’s most likely good at predicting.

      2. He’s most likely overconfident, reluctant to admit his misses, and hence unlikely to update on his failures.

      So I feel we should take Kurzweil’s predictions as a good baseline, with much wider error bars and caveats, paying relatively less attention to those areas where we feel that being a good Bayesian updater becomes important. We should thus probably pay more attention to his models than to his interpretation of his models.