AI prediction case study 1: The original Dartmouth Conference

Kaj Sotala, Seán ÓhÉigeartaigh and I recently submitted a paper entitled “The errors, insights and lessons of famous AI predictions and what they mean for the future” to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we’ll present it here after the fact.
As this is the first case study, it will also introduce the paper’s prediction classification schemas.
Taxonomy of predictions
Prediction types
There will never be a bigger plane built.
Boeing engineer on the 247, a twin engine plane that held ten people.
A fortune teller talking about celebrity couples, a scientist predicting the outcome of an experiment, an economist pronouncing on next year’s GDP figures—these are canonical examples of predictions. There are other types of predictions, though. Conditional statements—if X happens, then so will Y—are also valid, narrower, predictions. Impossibility results are also a form of prediction. For instance, the law of conservation of energy gives a very broad prediction about every single perpetual motion machine ever made: to wit, that they will never work.
The common thread is that all these predictions constrain expectations of the future. If one takes the prediction to be true, one expects to see different outcomes than if one takes it to be false. This is closely related to Popper’s notion of falsifiability (Pop). This paper will take every falsifiable statement about future AI to be a prediction.
For the present analysis, predictions about AI will be divided into four types:
Timelines and outcome predictions. These are the traditional types of predictions, giving the dates of specific AI milestones. Examples: An AI will pass the Turing test by 2000 (Tur50); Within a decade, AIs will be replacing scientists and other thinking professions (Hal11).
Scenarios. These are a type of conditional prediction, claiming that if the conditions of the scenario are met, then certain types of outcomes will follow. Example: If someone builds a human-level AI that is easy to copy and cheap to run, this will cause mass unemployment among ordinary humans (Han94).
Plans. These are a specific type of conditional prediction, claiming that if someone decides to implement a specific plan, then they will be successful in achieving a particular goal. Example: AI can be built by scanning a human brain and simulating the scan on a computer (San08).
Issues and metastatements. This category covers relevant problems with (some or all) approaches to AI (including sheer impossibility results), and metastatements about the whole field. Examples: an AI cannot be built without a fundamental new understanding of epistemology (Deu12); Generic AIs will have certain (potentially dangerous) behaviours (Omo08).
There will inevitably be some overlap between the categories, but the division is natural enough for this paper.
Prediction methods
Just as there are many types of predictions, there are many ways of arriving at them—crystal balls, consulting experts, constructing elaborate models. An initial review of various AI predictions throughout the literature suggests the following loose schema for prediction methods (as with any such schema, the purpose is to bring clarity to the analysis, not to force every prediction into a particular box, so it should not be seen as the definitive decomposition of prediction methods):
Causal models
Non-causal models
The outside view
Philosophical arguments
Expert judgement
Non-expert judgement
Causal models are a staple of physics and the harder sciences: given certain facts about the situation under consideration (momentum, energy, charge, etc.), a conclusion is reached about what the ultimate state will be. If the facts were different, the end situation would be different.
Outside of the hard sciences, however, causal models are often a luxury, as the underlying causes are not well understood. Some success can be achieved with non-causal models: without understanding what influences what, one can extrapolate trends into the future. Moore’s law is a highly successful non-causal model (Moo65).
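To make the non-causal approach concrete, here is a minimal sketch in Python (the numbers are purely illustrative, not real transistor counts, and the sketch is not taken from the paper): fit a line to the logarithm of the quantity and project the trend forward, with no model of why the trend holds.

```python
import numpy as np

# Illustrative numbers only (roughly Moore's-law-like, not real chip data):
# year -> transistors per chip
years = np.array([1971, 1975, 1980, 1985, 1990, 1995, 2000])
transistors = np.array([2.3e3, 6.0e3, 3.0e4, 2.8e5, 1.2e6, 5.5e6, 4.2e7])

# A non-causal model: fit a straight line to log2(transistors) against year,
# i.e. assume exponential growth without modelling *why* the growth happens.
slope, intercept = np.polyfit(years, np.log2(transistors), 1)
print(f"Implied doubling time: {1 / slope:.1f} years")

# The prediction is simply the extrapolated trend.
target_year = 2010
predicted = 2 ** (slope * target_year + intercept)
print(f"Extrapolated transistor count in {target_year}: {predicted:.2e}")
```

The point is the structure of the argument rather than the fitted numbers: the trend itself is the entire model.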
In the outside view, specific examples are grouped together and claimed to be examples of the same underlying trend. This trend is used to give further predictions. For instance, one could notice the many analogues of Moore’s law across the spectrum of computing (e.g. in numbers of transistors, size of hard drives, network capacity, pixels per dollar), note that AI is in the same category, and hence argue that AI development must follow a similarly exponential curve (Kur99). Note that the use of the outside view is often implicit rather than explicit: rarely is it justified why these examples are grouped together, beyond general plausibility or similarity arguments. Hence detecting uses of the outside view will be part of the task of revealing hidden assumptions. There is evidence that the use of the outside view provides improved prediction accuracy, at least in some domains (KL93).
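One way to make the outside view concrete (a hedged sketch of reference-class forecasting in the Kahneman and Lovallo sense, not a procedure from the paper, and with hypothetical numbers) is to ignore the specifics of the case at hand and forecast from the distribution of outcomes in a class of similar past cases:

```python
import numpy as np

# Hypothetical reference class: how long (in years) eight broadly similar
# past projects actually took, versus the current project's own plan.
reference_class_durations = np.array([4, 7, 3, 10, 6, 12, 5, 8])  # illustrative
inside_view_estimate = 2  # the insiders' own timeline

# The outside view ignores the specifics of the current project and
# forecasts from the distribution of outcomes in the reference class.
outside_view_forecast = np.median(reference_class_durations)
q25, q75 = np.percentile(reference_class_durations, [25, 75])

print(f"Inside-view estimate: {inside_view_estimate} years")
print(f"Outside-view forecast: {outside_view_forecast} years "
      f"(interquartile range {q25}-{q75})")
```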
Philosophical arguments are common in the field of AI. Some are simple impossibility statements: AI is decreed to be impossible, using arguments of varying plausibility. More thoughtful philosophical arguments highlight problems that need to be resolved in order to achieve AI, interesting approaches for doing so, and potential issues that might emerge if AIs were to be built.
Many of the predictions made by AI experts aren’t logically complete: not every premise is unarguable, not every deduction is fully rigorous. In many cases, the argument relies on the expert’s judgement to bridge these gaps. This doesn’t mean that the prediction is unreliable: in a field as challenging as AI, judgement, honed by years of related work, may be the best tool available. Non-experts cannot easily develop a good feel for the field and its subtleties, so should not confidently reject expert judgement out of hand. Relying on expert judgement has its pitfalls, however.
Finally, some predictions rely on the judgement of non-experts, or of experts making claims outside their domain of expertise. Prominent journalists, authors, CEOs, historians, physicists and mathematicians will generally be no more accurate than anyone else when talking about AI, no matter how stellar they are in their own field (Kah11).
Predictions often use a combination of these methods, as will be seen in the various case studies—expert judgement, for instance, is a common feature in all of them.
In the beginning, Dartmouth created the AI and the hype...
Classification: plan, using expert judgement and the outside view.
Hindsight bias is very strong and misleading (Fis75). Humans are often convinced that past events couldn’t have unfolded differently than how they did, and that the people at the time should have realised this. Even worse, people unconsciously edit their own memories so that they misremember themselves as being right even when they got their past predictions wrong (one of the reasons that it is important to pay attention only to the actual prediction as written at the time, and not to the author’s subsequent justifications or clarifications). Hence when assessing past predictions, one must cast aside all knowledge of subsequent events, and try to assess the claims given the knowledge available at the time. This is an invaluable exercise to undertake before turning attention to predictions whose timelines have not come to pass.
The 1956 Dartmouth Summer Research Project on Artificial Intelligence was a major conference, credited with introducing the term “Artificial Intelligence” and starting research in many of its subfields. The conference proposal, written in 1955, sets out what the organisers thought could be achieved. Its first paragraph reads:
“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”
This can be classified as a plan. Its main backing would have been expert judgement. The conference organisers were John McCarthy (a mathematician with experience in the mathematical nature of the thought process), Marvin Minsky (Harvard Junior Fellow in Mathematics and Neurology, and prolific user of neural nets), Nathaniel Rochester (Manager of Information Research at IBM, designer of the IBM 701, the first general-purpose, mass-produced computer, and designer of the first symbolic assembler) and Claude Shannon (the “father of information theory”). These were individuals who had been involved in a lot of related theoretical and practical work, some of whom had built functioning computers or programming languages—so one can expect them all to have had direct feedback about what was and wasn’t doable in computing. If anyone could be considered experts in AI, in a field dedicated to an as yet non-existent machine, then they could. What implicit and explicit assumptions could they have used to predict that AI would be easy?
Reading the full proposal doesn’t give the impression of excessive optimism or overconfidence. The very first paragraph hints at the rigour of their ambitions—they realised that precisely describing the features of intelligence is a major step in simulating it. Their research plan is well decomposed, and different aspects of the problem of artificial intelligence are touched upon. The authors are well aware of the inefficiency of exhaustive search methods, of the differences between informal and formal languages, and of the need for encoding creativity. They talk about the need to design machines that can work with unreliable components, and that can cope with randomness and small errors in a robust way. They propose some simple models of some of these challenges (such as forming abstractions, or dealing with more complex environments), point to previous successful work, and outline how further improvements can be made.
Reading through, the implicit reasons for their confidence seem to become apparent (as with any exercise in trying to identify implicit assumptions, this process is somewhat subjective. It is not meant to suggest that the authors were thinking along these lines, merely to point out factors that could explain their confidence—factors, moreover, that could have led dispassionate analytical observers to agree with them). These were experts, some of whom had been working with computers from the early days, with a long track record of taking complex problems and creating simple (and then more complicated) models to deal with them, models which they then used to generate useful insights or functioning machines. So this was an implicit use of the outside view—they were used to solving certain problems, these looked like problems they could solve, hence they assumed they could solve them. To modern eyes, informal languages are hugely complicated, but this may not have been obvious at the time. Computers were doing tasks, such as complicated mathematical manipulations, that were considered high-skill, something only impressive humans had been capable of. Moravec’s paradox had not yet been realised (this is the principle that high-level reasoning requires very little computation, but low-level sensorimotor skills require enormous computational resources—sometimes informally expressed as “everything easy [for a human] is hard [for a computer], everything hard is easy”). Human intuition about the relative difficulty of tasks was taken as accurate: there was no reason to suspect that parsing English was much harder than the impressive feats computers could already perform. Moreover, great progress had been made in logic, in semantics, and in information theory, giving new understanding to old concepts: there was no reason to suspect that further progress wouldn’t be both forthcoming and dramatic.
Even at the time, though, one could criticise their overconfidence. Philosophers, for one, had a long track record of pointing out the complexities and subtleties of the human mind. It might have seemed plausible in 1955 that further progress in logic and information theory would end up solving all these problems—but it could have been equally plausible to suppose that the success of formal models had been on low-hanging fruit, and that further progress would become much harder. Furthermore, the computers at the time were much simpler than the human brain (e.g. the IBM 701, with 73,728 bits of memory), so any assumption that AIs could be built was also an assumption that most of the human brain’s processing was wasted. This implicit assumption was not obviously wrong, but neither was it obviously right.
Hence the whole conference project would have seemed ideal, had the proposal merely included more humility and qualifiers, expressing uncertainty as to whether particular aspects of the program might turn out to be hard or easy. After all, in 1955, there were no solid grounds for arguing that such tasks were unfeasible for a computer.
Nowadays, it is obvious that the proposal’s predictions were very wrong. All the tasks mentioned turned out to be much harder than the authors believed at the time, and haven’t been successfully completed even today. Rarely have such plausible predictions turned out to be so wrong; so what can be learned from this?
The most general lesson is perhaps about the complexity of language and the danger of using human-understandable informal concepts in the field of AI. The Dartmouth group seemed convinced that because they informally understood certain concepts and could begin to capture some of this understanding in a formal model, it must be possible to capture all of this understanding in a formal model. In this, they were wrong. Similarities of features do not make the models similar to reality, and using human terms such as ‘culture’ and ‘informal’ in these models concealed huge complexity and gave an illusion of understanding. Today’s AI developers have a much better understanding of how complex cognition can be, and have realised that programming simple-seeming concepts into computers can be very difficult. So the main lesson to draw is that reasoning about AI using human concepts (or anthropomorphising AIs by projecting human features onto them) is a very poor guide to the nature of the problem and the time and effort required to solve it.
References:
[Arm] Stuart Armstrong. General purpose intelligence: arguing the orthogonality thesis. In preparation.
[ASB12] Stuart Armstrong, Anders Sandberg, and Nick Bostrom. Thinking inside the box: Controlling and using an oracle AI. Minds and Machines, 22:299-324, 2012.
[BBJ+03] S. Bleich, B. Bandelow, K. Javaheripour, A. Muller, D. Degner, J. Wilhelm, U. Havemann-Reinecke, W. Sperling, E. Ruther, and J. Kornhuber. Hyperhomocysteinemia as a new risk factor for brain shrinkage in patients with alcoholism. Neuroscience Letters, 335:179-182, 2003.
[Bos13] Nick Bostrom. The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Forthcoming in Minds and Machines, 2013.
[Cre93] Daniel Crevier. AI: The Tumultuous History of the Search for Artificial Intelligence. BasicBooks, New York, 1993.
[Den91] Daniel Dennett. Consciousness Explained. Little, Brown and Co., 1991.
[Deu12] D. Deutsch. The very laws of physics imply that artificial intelligence must be possible. What’s holding us up? Aeon, 2012.
[Dre65] Hubert Dreyfus. Alchemy and AI. RAND Corporation, 1965.
[eli66] Joseph Weizenbaum. ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9:36-45, 1966.
[Fis75] Baruch Fischhoff. Hindsight is not equal to foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1:288-299, 1975.
[Gui11] Erico Guizzo. IBM’s Watson Jeopardy computer shuts down humans in final game. IEEE Spectrum, 17, 2011.
[Hal11] J. Hall. Further reflections on the timescale of AI. In Solomonoff 85th Memorial Conference, 2011.
[Han94] R. Hanson. What if uploads come first: The crack of a future dawn. Extropy, 6(2), 1994.
[Har01] S. Harnad. What’s wrong and right about Searle’s Chinese room argument? In M. Bishop and J. Preston, editors, Essays on Searle’s Chinese Room Argument. Oxford University Press, 2001.
[Hau85] John Haugeland. Artificial Intelligence: The Very Idea. MIT Press, Cambridge, Mass., 1985.
[Hof62] Richard Hofstadter. Anti-intellectualism in American Life. 1962.
[Kah11] D. Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.
[KL93] Daniel Kahneman and Dan Lovallo. Timid choices and bold forecasts: A cognitive perspective on risk taking. Management Science, 39:17-31, 1993.
[Kur99] R. Kurzweil. The Age of Spiritual Machines: When Computers Exceed Human Intelligence. Viking Adult, 1999.
[McC79] J. McCarthy. Ascribing mental qualities to machines. In M. Ringle, editor, Philosophical Perspectives in Artificial Intelligence. Harvester Press, 1979.
[McC04] Pamela McCorduck. Machines Who Think. A. K. Peters, Ltd., Natick, MA, 2004.
[Min84] Marvin Minsky. Afterword to Vernor Vinge’s novel, “True Names.” Unpublished manuscript, 1984.
[Moo65] G. Moore. Cramming more components onto integrated circuits. Electronics, 38(8), 1965.
[Omo08] Stephen M. Omohundro. The basic AI drives. Frontiers in Artificial Intelligence and Applications, 171:483-492, 2008.
[Pop] Karl Popper. The Logic of Scientific Discovery. Mohr Siebeck.
[Rey86] G. Rey. What’s really going on in Searle’s Chinese room. Philosophical Studies, 50:169-185, 1986.
[Riv12] William Halse Rivers. The disappearance of useful arts. Helsingfors, 1912.
[San08] A. Sandberg. Whole brain emulations: a roadmap. Future of Humanity Institute Technical Report, 2008-3, 2008.
[Sea80] J. Searle. Minds, brains and programs. Behavioral and Brain Sciences, 3(3):417-457, 1980.
[Sea90] John Searle. Is the brain’s mind a computer program? Scientific American, 262:26-31, 1990.
[Sim55] H.A. Simon. A behavioral model of rational choice. The Quarterly Journal of Economics, 69:99-118, 1955.
[Tur50] A. Turing. Computing machinery and intelligence. Mind, 59:433-460, 1950.
[vNM44] John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton, NJ, Princeton University Press, 1944.
[Wal05] Chip Walter. Kryder’s law. Scientific American, 293:32-33, 2005.
[Win71] Terry Winograd. Procedures as a representation for data in a computer program for understanding natural language. MIT AI Technical Report, 235, 1971.
[Yam12] Roman V. Yampolskiy. Leakproofing the singularity: artificial intelligence confinement problem. Journal of Consciousness Studies, 19:194-214, 2012.
[Yud08] Eliezer Yudkowsky. Artificial intelligence as a positive and negative factor in global risk. In Nick Bostrom and Milan M. Ćirković, editors, Global catastrophic risks, pages 308-345, New York, 2008. Oxford University Press.