SHRDLU, understanding, anthropomorphisation and hindsight bias
EDIT: Since I didn’t make it sufficiently clear, the point of this post was to illustrate how the GOFAI people could have got so much wrong and yet still be confident in their beliefs, by looking at what the results of one experiment—SHRDLU—must have felt like to those developers at the time. The post is partially to help avoid hindsight bias: it was not obvious that they were going wrong at the time.
SHRDLU was an early natural language understanding computer program, developed by Terry Winograd at MIT in 1968–1970. It was a program that moved objects in a simulated world and could respond to instructions on how to do so. It caused great optimism in AI research, giving the impression that a solution to natural language parsing and understanding was just around the corner. Symbolic manipulation seemed poised to finally deliver a proper AI.
Before dismissing this confidence as hopelessly naive (which it wasn’t) and completely incorrect (which it was), take a look at some of the output that SHRDLU produced, when instructed by someone to act within its simulated world:
Person: Pick up a big red block.
Computer: OK.
Person: Grasp the pyramid.
Computer: I don’t understand which pyramid you mean.
Person (changing their mind): Find a block which is taller than the one you are holding and put it into the box.
Computer: By “it”, I assume you mean the block which is taller than the one I am holding.
Computer: OK.
Person: What does the box contain?
Computer: The blue pyramid and the blue block.
Person: What is the pyramid supported by?
Computer: The box.
Person: How many blocks are not in the box?
Computer: Four of them.
Person: Is at least one of them narrower than the one which I told you to pick up?
Computer: Yes, the red cube.
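To get a feel for how far fairly mechanical symbolic lookup can go towards producing responses like these, here is a minimal sketch in Python (SHRDLU itself was written in Lisp and Micro-Planner, and was vastly more sophisticated); the world, the object properties and the function names below are invented purely for illustration:

# A toy blocks-world responder: purely an illustrative sketch, not Winograd's
# program. The objects, properties and function names are invented.

WORLD = [
    {"name": "b1", "shape": "block", "color": "red", "size": "big"},
    {"name": "b2", "shape": "block", "color": "green", "size": "small"},
    {"name": "p1", "shape": "pyramid", "color": "blue", "size": "small"},
    {"name": "p2", "shape": "pyramid", "color": "red", "size": "big"},
]

last_referent = None  # the object that a later "it" will point back to


def find(shape=None, color=None, size=None):
    """Return every object whose stated properties all match."""
    return [
        obj for obj in WORLD
        if (shape is None or obj["shape"] == shape)
        and (color is None or obj["color"] == color)
        and (size is None or obj["size"] == size)
    ]


def describe(obj):
    return f'the {obj["size"]} {obj["color"]} {obj["shape"]}'


def grasp(shape, color=None, size=None):
    """Handle 'pick up / grasp <description>' by symbolic lookup."""
    global last_referent
    matches = find(shape, color, size)
    if not matches:
        return f"I don't see any {shape} like that."
    if len(matches) > 1:
        # More than one candidate fits the description: report the ambiguity.
        return f"I don't understand which {shape} you mean."
    last_referent = matches[0]
    return "OK."


def grasp_it():
    """Resolve the pronoun 'it' against the most recently grasped object."""
    if last_referent is None:
        return "I don't know what 'it' refers to."
    return f'By "it", I assume you mean {describe(last_referent)}. OK.'


print(grasp("block", color="red", size="big"))  # OK.
print(grasp("pyramid"))                         # I don't understand which pyramid you mean.
print(grasp_it())                               # By "it", I assume you mean the big red block. OK.

Even a toy this small reproduces the ambiguity check and the pronoun resolution that make the transcript above so easy to read as understanding.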
It’s hard not to project an intelligence into those responses. Humans are the main producers of language, so we’re probably primed to anthropomorphise anything that speaks to us this way; add to that SHRDLU’s seeming ability to cope with some level of ambiguity (“Is at least one of them...”—one of what?), its very human turns of phrase (“By ‘it’, I assume you mean...”, “I don’t understand...”) and you can see why naive outsiders could believe that SHRDLU might show genuine understanding. It would be natural to give SHRDLU the benefit of the doubt, and assume it was far more complex than it really was.
We can also see why smart insiders might believe that. Despite Dreyfus’s critique of AI, many AI researchers hadn’t yet grasped how badly symbolic manipulation systems would fail in real-world ambiguous situations. To them, SHRDLU’s performance would seem like confirmation, not that SHRDLU was very complicated (since they knew well how complex it was), but that understanding wasn’t that complicated (since SHRDLU seemed to demonstrate that).
I would posit that this wasn’t an unreasonable belief at the time. They had a product—AI—that had demonstrated high-seeming performance in controlled tests (a proof of concept, if you will), and that they were hoping to develop for more general usage. Indeed, how do we know that SHRDLU failed to show some form of true understanding? Mainly because the approach failed in more general situations. Had symbolic AI gone on to succeed at passing Turing tests, for example, then we probably would have concluded that “SHRDLU was an early example of AIs with understanding.”
But of course, at the time, researchers didn’t have the crucial information “your field will soon become a dead-end where genuine AI is concerned.” So their belief in SHRDLU and in the whole symbolic approach was not unreasonable—though, like everyone, they were overconfident.
I liked this:
-Terry Winograd
I have trouble upvoting this because I do not see what it is driving at. What is its point? You don’t say anything wrong and it is clearly applicable to LW, but nonetheless I do not get anything out of it. Maybe because I have heard/read this too often?
The point of this post was to illustrate how the GOFAI people could have got so much wrong and yet still be confident in their beliefs, by looking at what the results of one experiment—SHRDLU—must have felt like to those developers at the time. The post is partially to help avoid hindsight bias: it was not obvious that they were going wrong at the time.
As I was reading I was wondering if there was a modern application that you were hinting at—some specific case where you think we might be overconfident today. Do you see specific applications today, or is this just something you think we should keep in mind in general?
In general. When making predictions about AI, no matter how convincing they seem to us, we should remember all the wrong predictions that felt very convincing to past people for reasons that were very reasonable at the time.
I wouldn’t say that SHRDLU didn’t have true understanding. It had an accurate world-model. If that’s not what true understanding is, then I don’t know what is. The problem is that it only had a simple model and it couldn’t be scaled to more complex ones.
This is just definitions, though.
How do you know? Might you be getting this from Hofstadter, who might not be representative? I believe that Winograd understood the limits of the approach and considered it a dead end, perhaps even a failure. Certainly, his failure to follow up suggests pessimism.
Hey, Wikipedia says that, so it must be true!
But also from Wikipedia:
In 1973, Winograd moved to Stanford University and developed an AI-based framework for understanding natural language which was to give rise to a series of books. But only the first volume (Syntax) was ever published. “What I came to realize is that the success of the communication depends on the real intelligence on the part of the listener, and that there are many other ways of communicating with a computer that can be more effective, given that it doesn’t have the intelligence.”[4]
His approach shifted away from classical Artificial Intelligence after encountering the critique of cognitivism by Hubert Dreyfus and meeting with the Chilean philosopher Fernando Flores. They published a critical appraisal from a perspective based in phenomenology as Understanding Computers and Cognition: a new foundation for design in 1987. In the latter part of the 1980s, Winograd worked with Flores on an early form of groupware. Their approach was based on conversation-for-action analysis.
If that’s correct (and I see no real reason to doubt it), Winograd shifted his views a few years after SHRDLU, and after encountering Dreyfus’s arguments.
SHRDLU was very impressive by any standards. It was released in the very early 1970s, when computers had only a few kilobytes of memory. Fortran was only about 15 years old. People had only just started to program. And then using paper tape.
SHRDLU took a number of preexisting ideas about language processing and planning and combined them beautifully. And SHRDLU really did understand its tiny world of logical blocks.
Given how much had been achieved in the decade prior to SHRDLU, it was entirely reasonable to assume that real intelligence would be achieved in the relatively near future. Which is, of course, the point of the article.
(Winograd did cheat a bit by using Lisp. Today such a program would need to be written in C++ or possibly Java, which takes much longer. Progress is not unidirectional.)
Looking at the SHRDLU output, just trying to recreate it looks pretty challenging for a modern coder, let alone decades ago. A little Lisp goes a long way.
This is an interesting account of how an experiment combined with biases to convince researchers that its results were much more broadly applicable than they turned out to be. I think that very often we do get overexcited by a small breakthrough that confirms our biases, and we try to talk it up as a much larger breakthrough.
SHRDLU was not “completely incorrect”.
Modern AI techniques, particularly in the field of automated planning, derive from the fundamental research behind it and its implementation language, Micro-Planner. The problem was that AI turned out to be much harder than it seemed back in the day.
Er, yes, that’s exactly what I’m saying. It’s “Symbolic manipulation seemed poised to finally deliver a proper AI.” that is completely incorrect.
Changed “Before dismissing this as” to “Before dismissing this confidence as” for clarity.
“But of course, at the time, researchers didn’t have the crucial information “your field will soon become a dead-end where genuine AI is concerned.” ”
But you find this project to be so informative that you wrote an article to make us aware of it. So it wasn’t exactly a “dead-end”: it didn’t achieve the lofty goal of producing AI, but it did produce results that you believe are useful in AI research. Negative results are still results.
Still “a dead-end where genuine AI is concerned”. And “you will become a useful example of cognitive bias for a future generation” is not exactly what the AI makers were aiming for, methinks...
It might not be what they were aiming for, but maybe scientists should be more willing to embrace this sort of result. It’s more glamorous to find a line of research that does work, but research projects that don’t work are still useful, and should still be valued. I don’t think it’s good for science for scientists to be denigrated for choosing lines of research that end up not working.
No, they shouldn’t. But they shouldn’t be blasé and relaxed about whether their projects work or not, either. They should never regard “I will be used as an example of how not to do science” as a positive thing to aim for...
These are completely different things. Most theories/projects/attempts in science fail. That is how science works and is NOT “an example of how not to do science”.
There doesn’t seem to be any point to this post.
The point of this post was to illustrate how the GOFAI people could have got so much wrong and yet still be confident in their beliefs, by looking at what the results of one experiment—SHRDLU—must have felt like to those developers at the time. The post is partially to help avoid hindsight bias: it was not obvious that they were going wrong at the time.