From Simon’s ant to machine learning, a parable
Simon’s ant is a well-known thought experiment from Chapter 3, “The Psychology of Thinking: Embedding Artifice in Nature,” in Herbert A. Simon (1981), The Sciences of the Artificial. It’s a parable about computation, about how computational requirements depend on the problem to be solved. Stated that way, it is an obvious truism. But Simon’s thought experiment invites you to consider this truism where the “problem to be solved” is an environment external to the computer – it is thus reminiscent of Braitenberg’s primitive vehicles (which I discussed in this post).
Think of it like this: the nervous system requires environmental support if it is to maintain its physical stability and operational coherence. Note that Simon was not at all interested in the physical requirements of the nervous system. Rather, he was interested in suggesting that we can get complex behavior from relatively simple devices, and simplicity translates into design requirements for a nervous system.
Simon asks us to imagine an ant moving about on a beach:
We watch an ant make his laborious way across a wind- and wave-molded beach. He moves ahead, angles to the right to ease his climb up a steep dunelet, detours around a pebble, stops for a moment to exchange information with a compatriot. Thus he makes his weaving, halting way back to his home. So as not to anthropomorphize about his purposes, I sketch the path on a piece of paper. It is a sequence of irregular, angular segments—not quite a random walk, for it has an underlying sense of direction, of aiming toward a goal.
After introducing a friend, to whom he shows the sketch and to whom he addresses a series of unanswered questions about the sketched path, Simon goes on to observe:
Viewed as a geometric figure, the ant’s path is irregular, complex, hard to describe. But its complexity is really a complexity in the surface of the beach, not a complexity in the ant. On that same beach another small creature with a home at the same place as the ant might well follow a very similar path.
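Simon's point can be sketched as a toy simulation. What follows is only an illustration under my own assumptions, not anything Simon specifies: the beach is a square grid of heights, home is the far corner, and the ant follows one fixed rule (of the steps that lead toward home, take the one onto lower ground). The names `walk_home` and `count_turns` are hypothetical, and counting turns is a crude stand-in for the "complexity" of the path.

```python
import random

def walk_home(height):
    """Walk from (0, 0) to the far corner of a square grid of heights.

    The rule never changes: of the steps that lead toward home
    (east or north), take the one that lands on lower ground.
    On a tie, go east.
    """
    n = len(height)
    x = y = 0
    moves = []
    while (x, y) != (n - 1, n - 1):
        options = []
        if x < n - 1:
            options.append((height[y][x + 1], "E"))
        if y < n - 1:
            options.append((height[y + 1][x], "N"))
        # min() compares height first; equal heights fall back to "E" < "N".
        _, move = min(options)
        if move == "E":
            x += 1
        else:
            y += 1
        moves.append(move)
    return moves

def count_turns(moves):
    """Number of changes of direction along the path."""
    return sum(1 for a, b in zip(moves, moves[1:]) if a != b)

n = 12
flat_beach = [[0.0] * n for _ in range(n)]
rng = random.Random(0)  # fixed seed so the bumpy beach is reproducible
bumpy_beach = [[rng.random() for _ in range(n)] for _ in range(n)]

for name, beach in [("flat", flat_beach), ("bumpy", bumpy_beach)]:
    moves = walk_home(beach)
    print(name, "".join(moves), "turns:", count_turns(moves))
```

On the flat beach the path has a single turn: straight east, then straight north. On the bumpy beach the same rule typically produces a weaving path with many turns. The ant's code is identical in both runs; only the beach differs.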
What does this tell us about machine learning?
Think of the ant as the machine, that is, the computer system, and the environment as the HUMONGOUS pile of data it is walking through. What the machine does is quite simple. In the case of large language models, which is what I’ve thought the most about, the machine makes a guess about the next word and adjusts parameter weights accordingly. That’s all it does, guess-adjust, guess-adjust, guess-adjust...for all the texts in the corpus.
Taken collectively those texts constitute an enormously complex environment. Correlatively, the model our machine builds with its simple procedure is enormously complex as well, though not quite so complex as that environment.
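The guess-adjust loop described above can be sketched in a few lines. This is a deliberately tiny stand-in, not how a real large language model is built: it assumes a bigram model (one table of scores for "which word follows which") in place of a neural network, a one-sentence corpus of my own invention, and plain stochastic gradient descent. The names `train_bigram` and `predict_next` are hypothetical.

```python
import math

def train_bigram(tokens, epochs=50, lr=0.5):
    """Guess the next word, then adjust the weights; repeat over the corpus."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    # weights[i][j] is the score for word j following word i.
    weights = [[0.0] * size for _ in range(size)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for prev, nxt in zip(tokens, tokens[1:]):
            row = weights[index[prev]]
            target = index[nxt]
            # Guess: turn scores into probabilities with a softmax.
            top = max(row)
            exps = [math.exp(s - top) for s in row]
            norm = sum(exps)
            probs = [e / norm for e in exps]
            total += -math.log(probs[target])
            # Adjust: the cross-entropy gradient with respect to the
            # scores is (probability - 1) for the true word and
            # (probability) for every other word.
            for j in range(size):
                grad = probs[j] - (1.0 if j == target else 0.0)
                row[j] -= lr * grad
        losses.append(total / (len(tokens) - 1))
    return vocab, weights, losses

def predict_next(vocab, weights, word):
    """Return the word with the highest score after `word`."""
    row = weights[vocab.index(word)]
    return vocab[row.index(max(row))]

corpus = "the ant walks on the beach and the ant walks home".split()
vocab, weights, losses = train_bigram(corpus)
print("first epoch loss:", round(losses[0], 3))
print("last epoch loss:", round(losses[-1], 3))
print("after 'ant':", predict_next(vocab, weights, "ant"))
```

The procedure itself is a dozen lines and never changes. Whatever structure ends up in `weights` comes from the text it was walked through.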
Now, and here’s where things get tricky, think of ourselves as the ant, and the world as the complex environment we’re walking about in. I know, we’re not so simple as the ant, but in comparison to the vast complexity of the world, yes, we’re simple. In the process of walking through this vast world we write things. Those things, in turn, constitute the environment of some machine learning system. So, we “consume” the world and emit writings. The computer “consumes” our writings and emits a large language model.
Think about that very carefully. The model we build of the world is, in fact, complex. But not so complex as the world itself. The model the machine builds of its textual world is complex as well. But is it as complex as those texts? Is it as complex as human understanding of those texts? Does that suggest why some thinkers insist that AIs have direct access to the world, sensing it and moving about in it, if they are to be truly “intelligent”?
* * * * *
This post from 2020 covers similar territory, albeit in different and more abstract terms: World, mind, and learnability: A note on the metaphysical structure of the cosmos, August 15, 2020, https://new-savanna.blogspot.com/2020/08/world-mind-and-learnability-note-on.html.
I have included that in my working paper, GPT-3: Waterloo or Rubicon? Here be Dragons, Version 4.1, May 7, 2022, pp. 23-26, https://www.academia.edu/43787279/GPT_3_Waterloo_or_Rubicon_Here_be_Dragons_Version_4_1.
This post is based on material from an older one, Computational Thinking and the Digital Critic: Part 2, An Ant Walks on the Beach and a Pilot is Alone, July 25, 2017, https://new-savanna.blogspot.com/2014/04/computational-thinking-and-digital_30.html.
Fulltext of chapter 3.
I think humanity’s text corpus is sufficiently rich and comprehensive for language models to generalise far into the superhuman domain.
Yes, I like the point of view shared in this post, but I agree with you. As datasets grow sufficiently large, diverse and complex, they do an increasingly better job of revealing the underlying reality. It is like taking a rubbing of a very complex object using a very fine pencil. Your first ten strokes miss so much that they are nearly noise. The first hundred also miss a lot and are hard to interpret. But a million? The information in each individual stroke may be low, but the underlying features remain a consistent force. I think anyone working with toy datasets, or even moderate-sized datasets of a few million examples, ought to keep this issue very much in mind. An argument could be made that the information coming through language is systematically missing certain key aspects of the underlying reality. My intuition strongly suggests, though, that the set of language corpus + video corpus contains all the necessary information to accurately model reality. I’d be happy to bet on this point if anyone cares to make a prediction market about it.
“I’d be happy to bet on this point if anyone cares to make a prediction market about it.”
Hmmmm. Gary Marcus, where are you?
That is a very interesting point of view. I agree with the idea that complexity is a reflection of the environment in which the machine is acting. The atomic operations of the machine may look simple compared to the external world. However, isn’t the machine itself an outcome of the external world? The machine and the ant look simple only because we chose to focus on them! Otherwise, they are not so different from the background world in which they operate. Don’t you think so? In that way, there is hardly any difference between an ‘intelligent being’ and a machine!