Beware technological wonderland, or, why text will dominate the future of communication and the Internet

Disclaimer: The views expressed here are speculative. I don’t have a claim to expertise in this area. I welcome pushback and anticipate there’s a reasonable chance I’ll change my mind in light of new considerations.

One of the interesting ways that many 20th-century forecasts of the future went wrong is that they posited huge physical changes in the way life was organized. For instance, they posited huge changes along these dimensions:

  • The home living arrangements of people. Smart homes and robots were routinely foreseen over time horizons where progress towards those ends would later turn out to be negligible.

  • Overoptimistic as well as overpessimistic scenarios of energy sources merged in strange ways. People believed the world would run out of oil by now, but at the same time envisioned nuclear-powered flight and home electricity.

  • Overoptimistic visions of travel: People thought humans would be sending out regular manned missions to the solar system planets, and space colonization would be on the agenda by now.

  • The types of products that would be manufactured. New products ranging from synthetic meat to room temperature superconductors were routinely prophesied to happen in the near future. Some of them may still happen, but they’ll take a lot longer than people had optimistically expected.

At the same time, they underestimated to quite an extent the informational changes in the world:

  • With the exception of forecasters specifically studying computing trends, most missed the dramatic growth of computing and the advent of the Internet and World Wide Web.

  • Most people didn’t appreciate the extent of the information and communication revolution and how it would coexist with a world that looked physically indistinguishable from the world of 30 years ago. Note that I’m looking here at the most advanced First World places, and ignoring the point that many places (particularly in China) have experienced huge physical changes as a result of catch-up growth.

My LessWrong post on megamistakes discusses these themes somewhat in #1 (the technological wonderland and timing point) and #2 (the exceptional case of computing).

What about predictions within the informational realm? I detect a similar bias. It seems that prognosticators and forecasters tend to give undue weight to heavyweight technologies (such as 3D videoconferencing) and ignore the fact that the bulk of production and innovation has been focused on text and, to a somewhat lesser extent, on images (often used to augment and interweave with the text). In this article, I lay out the pro-text position. I don’t have high confidence in the views expressed here, and I look forward to critical pushback that changes my mind.

Text: easier to produce

One great thing about text is its lower production costs. When content production is small in volume and dominated by a few big players, high-quality video and audio play an important role. But as the Internet “democratizes” content production, it’s a lot easier for a lot of people to contribute text than to contribute audio or video content.

Some advantages of text from the creation perspective:

  • It’s far easier to edit and refine. This is a particularly big issue because with audio and video, you need to rehearse, do retakes, or do heavy editing in order to make something coherent come out. The barriers to text are lower.

  • It’s easier to upload and store. Text takes less space. Uploading it to a network or sending it to a friend takes less bandwidth.

  • People are (rightly or wrongly) less concerned about putting their best foot forward with text. People often spend a lot of time selecting their very best photos, even for low-stakes situations like social networks. With text, they are relatively less inhibited, because no individual piece of text represents them as persons as much as they consider their physical appearance or mannerisms to. This allows people to create a lot more text. Note that Snapchat may be an exception that proves the rule: people flocked to it because its impermanence made them less inhibited about sharing. But its impermanence also means it does not add to the stock of Internet content. And it’s still images, not videos.

  • It’s easy to copy and paste.

  • As an ergonomic matter, typing all day long, although fatiguing, consumes less energy than talking all day long.

  • Text can be created in fits and bursts. An audio or video needs to be recorded more or less in a continuous sitting.

  • You can play background music while writing text, which you can’t do while having a video conversation or recording audio or video content.
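
To put rough numbers on the storage and bandwidth point in the list above, here is a minimal back-of-the-envelope sketch. Every constant in it (speaking rate, bytes per word, audio and video bitrates) is an illustrative assumption of mine, not a measurement; only the orders of magnitude matter.

```python
# Back-of-the-envelope size of one minute of content in three media.
# All constants below are illustrative assumptions, not measurements.

WORDS_PER_MINUTE = 150              # assumed speaking rate
BYTES_PER_WORD = 6                  # assumed: ~5 letters plus a space, plain ASCII
AUDIO_BITS_PER_SECOND = 64_000      # assumed compressed speech-quality audio
VIDEO_BITS_PER_SECOND = 1_000_000   # assumed modest-quality compressed video


def bytes_per_minute_text() -> int:
    """Bytes needed to store one minute of speech as a plain-text transcript."""
    return WORDS_PER_MINUTE * BYTES_PER_WORD


def bytes_per_minute_stream(bits_per_second: int) -> int:
    """Bytes needed to store one minute of a constant-bitrate stream."""
    return bits_per_second * 60 // 8


text_bytes = bytes_per_minute_text()
audio_bytes = bytes_per_minute_stream(AUDIO_BITS_PER_SECOND)
video_bytes = bytes_per_minute_stream(VIDEO_BITS_PER_SECOND)

print(f"text:  {text_bytes:>10,} bytes")
print(f"audio: {audio_bytes:>10,} bytes ({audio_bytes // text_bytes}x text)")
print(f"video: {video_bytes:>10,} bytes ({video_bytes // text_bytes}x text)")
```

Under these assumptions, a minute of transcript is under a kilobyte, while the same minute as audio or video is hundreds to thousands of times larger. Different bitrates would change the exact ratios but probably not the overall picture.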

Text: easier to consume and share

Text is also easier to consume and share.

  • Standardization of format and display methods makes the consumption experience similar across devices.

  • Low storage and bandwidth costs make it easy to consume over poor Internet connections and on a range of devices.

  • Text can be read at the user’s own pace. People who are slow at grasping the content can take time. People who are fast can read very quickly.

  • Text can be copied, pasted, modified, and reshared with relative ease.

  • Text is easier to search (this refers both to searching within a given piece of text and to locating a text based on some part of it or some attributes of it).

  • You can’t play background music while consuming audio-based content, but you can do it while consuming text.

  • Text can more easily be translated to other languages.

On the flip side, reading text requires you to have your eyes glued to the screen, which reduces your flexibility of movement. But because you can take breaks at your will, it’s not a big issue. Audiobooks do offer the advantage that you can move around (e.g., cook in the kitchen) while listening, and some people who work from home are quite fond of audiobooks for that purpose. In general, the benefits of text seem to outweigh the costs.

Text generates more flow-through effects

Holding willingness to pay on the part of consumers the same, text-based content is likely to generate greater flow-through effects because of its ability to foster more discussion and criticism and to be modified and reused for other purposes. This is related to the point that video and audio consumption on the Internet generally tends to substitute for TV and cinema trips, which are largely pure consumption rather than intermediate steps to further production. Text, on the other hand, has a bigger role in work-related stuff.

Augmented text

When I say that text plays a major role, I don’t mean that long ASCII strings are the be-all-and-end-all of computing and the Internet. Rather, more creative and innovative ways of interweaving a richer set of expressive and semantically powerful symbols into text are very important to harnessing its full power. It really is a lot different to read The New York Times in HTML than it would be to read the plain text of the article on a monochrome screen. The presence of hyperlinks, share buttons, the occasional image, sidebars with more related content, etc. adds a lot of value.

Consider Facebook posts. These are text-based, but they allow text to be augmented in many ways:

  • Inline weblinks are automatically hyperlinked when you submit the post (though at present it’s not possible to edit the anchor text to show something different from the weblink).

  • Hashtags can be used, and link to auto-generated Facebook pages listing recent uses of the hashtag.

  • One can tag friends and Facebook groups and pages, subject to some restrictions. For friends tagged, the anchor text can be shortened to any one word in their name.

  • One can attach links, photos, and files of some types. By default, the first weblink that one uses in the post is automatically attached, though this setting can be overridden. The attached link includes a title, summary, and thumbnail.

  • One can set a location for the post.

  • One can set the timing of publication of a post.

  • Smileys are automatically rendered when the post is published.

  • It’s possible to edit the post later and make changes (except to attachments?). People can see the entire edit history.

  • One can promote one’s own post at a cost.

  • One can delete the post.

  • One can decide who is allowed to view the post (and also restrict who can comment on the post).

  • One can identify who one is with at the time of posting.

  • One can add a rich set of “verbs” to specify what one is doing.

Consider the actions that people reading the posts can perform:

  • Like the post.

  • Comment on the post. Comments automatically include link previews, and they can also be edited later (with edit histories available). Comments can also be used to share photos.

  • Share the post.

  • Select the option to get notifications on updates (such as further comments) on the post.

  • Like comments on the post.

  • Report posts or mark them as spam.

  • View the edit history of the post and comments.

  • For posts with restrictions on who can view them, see who can view the post.

  • View a list of others who re-shared the post.

If you think about it, this system, although it basically relies on text, has augmented text in a lot of ways with the intent of facilitating more meaningful communication. You may find some of the augmentations of little use to you, but each feature probably has at least a few hundred thousand people who greatly benefit from it. (If nobody uses a feature, Facebook axes it.)

I suspect that the world ten years from now will feature text that is richly augmented relative to the text of today, in much the same way that the text of today is richly augmented compared to what it was back in 2006. Unfortunately, I can’t predict any very specific innovations (if I could, I’d be busy programming them, not writing a post on LessWrong). And it might very well be the case that the low-hanging fruit with respect to augmenting text is already taken.

Why didn’t all the text augmentation happen at once? None of the augmentations are hard to program in principle. The probable reasons are:

  • Training users: The augmented text features need a loyal userbase that supports and implements them. So each augmentation needs to be introduced gradually in order to give users onboarding time. Even if Facebook in 2006 knew exactly what features they would eventually have in 2014, and even if they could code all the features in 2006, introducing them all at once might scare users because of the dramatic increase in complexity.

  • Deeper insight into what features are actually desirable: One can come up with a huge list of features and augmentations of text that might in principle be desirable, but only a small fraction of them pass a cost-benefit analysis (where the cost is the increased complexity of user interface). Discovering what features work is often a matter of trial and error.

  • Performance in terms of speed and reliability: Each augmentation adds an extra layer of code, reducing the performance in terms of speed and reliability. As computers and software have gotten faster and more powerful, and the Internet companies’ revenue has increased (giving them more leeway to spend more for server space), investments in these have become more worthwhile.

  • Focus on userbase growth: Companies were spending their resources in growing their userbase rather than adding features. Note that this is the main point that is likely to change soon: the userbase is within an order of magnitude of being the whole world population.

Images

Images play an important role along with text. Indeed, websites such as 9GAG rely on images, and others like Buzzfeed heavily mix texts and images.

I think images will continue to grow in importance on the Internet. But the vision of images as it is likely to unfold is probably quite different from the one futurists generally envisage. We’re not talking of a future dominated by professionally done (or even amateurly done) 16 megapixel photography. Rather, we’re talking of images that are used to convey basic information or make a memetic point. Consider that many of the most widely shared images are the standard images for memes. The number of underlying meme images is much smaller than the number of memes made from them: meme creators just use a standard image, and their own contribution is the text at the top and bottom of the meme. Thus, even while the Internet uses images, the production at the margin largely involves text. The picture is scaffolding. Webcomics (I’m personally most familiar with SMBC and XKCD, but there are other more popular ones) are at the more professional end, but they too illustrate a similar point: it’s often the value of the ideas being creatively expressed, rather than the realism of the imagery, that delivers value.

One trend that was big in the early days of the Internet, then died down, and now seems to be reviving is the animated GIF. Animated GIFs allow people to convey simple ideas that cannot be captured in still images, without having to create a video. They also use a lot less bandwidth for consumers and web hosts than videos. Again, we see that the future is about economically using simple representations to convey ideas or memes rather than technologically awesome photography.

Quantitative estimates

Here’s what Martin Hilbert wrote in How Much Information is There in the “Information Society” (p. 3):

It is interesting to observe that the kind of content has not changed significantly since the analog age: despite the general perception that the digital age is synonymous with the proliferation of media-rich audio and videos, we find that text and still images capture a larger share of the world’s technological memories than before the digital age. In the early 1990s, video represented more than 80 % of the world’s information stock (mainly stored in analog VHS cassettes) and audio almost 15 % (audio cassettes and vinyl records). By 2007, the share of video in the world’s storage devices decreased to 60 % and the share of audio to merely 5 %, while text increased from less than 1 % to a staggering 20 % (boosted by the vast amounts of alphanumerical content on internet servers, hard-disks and databases). The multi-media age actually turns out to be an alphanumeric text age, which is good news if you want to make life easy for search engines.

I had come across this quote as part of a preliminary investigation for MIRI into the world’s distribution of computation (though I had not highlighted the quote in the investigation since it was relatively less important to the investigation). As another data point, Facebook claims that it needed 700 TB (as of October 2013) to store all the text-based status updates and comments plus relevant semantic information on users that would be indexed by Facebook Graph Search once it was extended to posts and comments. Contrast this with a few petabytes of storage needed for all their photos (see also here), despite the fact that one photo takes up a lot more space than one text-based update.
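
As a rough check on the shares in the Hilbert quote, the arithmetic can be sketched as follows. The quote gives bounds ("more than 80 %", "almost 15 %", "less than 1 %") rather than exact figures, so the early-1990s numbers below are rounded assumptions, and reading the leftover share as "still images and other" is my own interpretation rather than something the quote states.

```python
# Approximate shares (percent) of the world's stored information, taken from
# the quote above. The early-1990s figures are rounded from stated bounds,
# so treat them as assumptions.
shares = {
    "early 1990s": {"video": 80, "audio": 15, "text": 1},
    "2007": {"video": 60, "audio": 5, "text": 20},
}


def residual_share(by_medium: dict) -> int:
    """Percent not accounted for by video, audio, or text.

    Reading this leftover as "still images and other" is an assumption.
    """
    return 100 - sum(by_medium.values())


for period, by_medium in shares.items():
    print(period, by_medium, "residual:", residual_share(by_medium))

# Text started at *less than* 1 %, so this ratio is a lower bound.
text_growth = shares["2007"]["text"] / shares["early 1990s"]["text"]
print(f"text share grew by a factor of at least {text_growth:.0f}")
```

On these rounded figures, text plus the residual goes from roughly 5 % to roughly 35 % of stored information, which is consistent with the quote's claim that text and still images gained share.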

Beautiful text

The Internet looks a lot more beautiful today than it did ten years ago. Why? Small, incremental changes in the way that text is displayed have played a role. New fonts, new WordPress themes, a new Wikipedia or Facebook layout, all conspire to provide a combination of greater usability and greater aesthetic appeal. Also, as processors and bandwidth have improved, some layouts that may have been impractical earlier have been made possible. The block tile layout for websites has caught on quite a bit, inspired by an attempt to create a unified smooth browsing experience across a range of different devices (from small iPhone screens to large monitors used by programmers and data analysts).

Notice that it’s the versatility of text that allowed it to be upgraded. Videos created an old way would have to be redone in order to avail of new display technologies. But since text is stored as text, it can be rendered in a new font easily.

The wonders of machine learning

I’ve noticed personally, and some friends have remarked to me, that Google Search, GMail, and Facebook have gotten a lot better in recent years in many small incremental ways despite no big leaps in the overall layout and functioning of the services. Facebook shows more relevant ads, makes better friend suggestions, and has a much more relevant news feed. Google Search is scarily good at autocompletion. GMail search is improving at autocompletion too, and the interface continues to improve. Many of these improvements are the results of continuous incremental improvement, but there’s some reason to believe that the more recent changes are driven in part by application of the wonders of machine learning (see here and here for instance).

Futurists tend to think of the benefits of machine learning in terms of qualitatively new technologies, such as image recognition, video recognition, object recognition, audio transcription, etc. And these are likely to happen, eventually. But my intuition is that futurists underestimate the proportion of the value from machine learning that is intermediated through improvement in the existing interfaces that people already use (and that high-productivity people use more than average), such as their Facebook news feed or GMail or Google Search.

A place for video

Video will continue to be good for many purposes. The watching of movies will continue to migrate from TV and the cinema hall to the Internet, and the quantity watched may also increase because people have to spend less in money and time costs. Educational and entertainment videos will continue to be watched in increasing numbers. Note that these effects largely amount to the substitution of one medium for another, plus a raw increase in quantity, rather than paradigm shifts in the nature of people’s activities.

Video chatting, through tools such as Skype or Google Talk/Hangouts, will probably continue to grow. These will serve as important complements to text-based communication. People do want to see their friends’ faces from time to time, even if they carry out the bulk of their conversation in text. As Internet speeds improve around the world, the trivial inconveniences in the way of video communication will reduce.

But these will not drive the bulk of people’s value-added from having computing devices or being connected to the Internet. And they will in particular be an even smaller fraction of the value-added for the most productive people or the activities with maximum flow-through effects. Simply put, video just doesn’t deliver higher information per unit bandwidth and human inconvenience.

Progress in video may be similar to progress in memes and animated GIFs: there may be more use of animation to quickly create videos expressing simple ideas. Animated video hasn’t taken off yet. Xtranormal shut down. The RSA Animate style made waves in some circles, but hasn’t caught on widely. It may be that the code for simple video creation hasn’t yet been cracked. Or it may be that if people are bothering to watch video, they might as well watch something that delivers video’s unique benefits, and animated video offers little advantage over text, memes, animated GIFs, and webcomics. This remains to be seen. I’ve also heard of Vine (a service owned by Twitter for sharing very short videos), and that might be another direction for video growth, but I don’t know enough about Vine to comment.

What about 3D video?

High definition video has made good progress in relative terms, as cameras, Internet bandwidth, and computer video playing abilities have improved. It’ll be increasingly common to watch high definition videos on one’s computer screen or (for those who can afford it) on a large flatscreen TV.

What about 3D video? If full-blown 3D video could magically appear all of a sudden with a low-cost implementation for both creators and consumers, I believe it would be a smashing success. In practice, however, the path to getting there would be more tortuous. And the relevant question is whether intermediate milestones in that direction would be rewarding enough to producers and consumers to make the investments worth it. I doubt that they would, which is why it seems to me that, despite the fact that a lot of 3D video stuff is technically feasible today, it will still probably take several decades (I’m guessing at least 20 years, probably more than 30 years) to become one of the standard methods of producing and consuming content. For it to even begin, it’s necessary that improvements in hardware continue apace to the point that initial big investments in 3D video start becoming worthwhile. And then, once started, we need an ever-growing market to incentivize successive investments in improving the price-performance tradeoff (see #4 in my earlier article on supply, demand, and technological progress). Note also that there may be a gap of a few years, perhaps even a decade or more, between 3D video becoming mainstream for big budget productions (such as movies) and 3D video being common for Skype or Google Hangouts or their equivalent in the later era.

Fractional value estimates

I recently asked my Facebook friends for their thoughts on the fraction of the value they derived from the Internet that was attributable to the ability to play and download videos. I received some interesting comments there that helped confirm initial aspects of my hypothesis. I would welcome thoughts from LessWrongers on the question.

Thanks to some of my Facebook friends who commented on the thread and offered their thoughts on parts of this draft via private messaging.