Let me refer you to Computation and Human Experience, by Philip E. Agre, and to Understanding Computers and Cognition, by Terry Winograd and Fernando Flores.
telms
Guesstimates based on quick reading without serious analysis:
(1) Probability that Amanda Knox is guilty: 5%
(2) Probability that Raffaele Sollecito is guilty: 10%
(3) Probability that Rudy Guede is guilty: 60%
(4) Probability that my estimates are congruent with OP’s: 50% (i.e., random; I can’t tell what his opinion is)
Hi, Antiochus. What areas of history are you interested in? I’m similarly interested in history—particularly paleontology and archaeology, the history of urban civilizations (rise and collapse and reemergence), and the history of technology. I kind of lose interest after World War II, though. You?
I was able to follow this explanation (as well as the rest of your post) without seeing your physical body in any way. … The fact that we can do this looks to me like evidence against your main thesis.
Ah, but you’re assuming that this particular interaction stands on its own. I’ll bet you were able to visualize the described gestures just fine by invoking memories of past interactions with bodies in the world.
Two points. First, I don’t contest the existence of verbal labels that merely refer—or even just register as being invoked without referring at all. As long as some labels are directly grounded to body/world, or refer to other labels that do get grounded in the body/world historically, we generally get by in routine situations. And all cultures have error detection and repair norms for conversation so that we can usually recover without social disaster.
However, the fact that verbal labels can be used without grounding them in the body/world is a problem. It is frequently the case that speakers and hearers alike don’t bother to connect words to reality, and this is a major source of misunderstanding, error, and nonsense. In our own case here and now, we are actually failing to understand each other fully because I can’t show you actual videotapes of what I’m talking about. You are rightly skeptical because words alone aren’t good enough evidence. And that is itself evidence.
Second, humans have a developmental trajectory and history, and memories of that history. We’re a time-binding animal in Korzybski’s terminology. I would suggest that an enculturated adult native speaker of a language will have what amount to “muscle memory” tics that can be invoked as needed to create referents. Mere memory of a motion or a perception is probably sufficient.
“Oh, look, it’s an invisible gesture!” is not at all convincing, I realize, so let me summarize several lines of evidence for it.
Developmentally, there’s quite a lot of research on language acquisition in infants and young children that suggests shared attention management—through indexical pointing, and shared gaze, and physical coercion of the body, and noises that trigger attention shift—is a critical building block for constructing “aboutness” in human language. We also start out with some shared, built-in cries and facial expressions linked to emotional states. At this level of development, communication largely fails unless there is a lot of embodied scaffolding for the interaction, much of it provided by the caregiver but a large part of it provided by the physical context of the interaction. There is also some evidence from the gestural communication of apes that attests to the importance of embodied attention management in communication.
Also, co-speech gesture turns out to be a human universal. Congenitally blind children do it, having never seen gesture by anyone else. Congenitally deaf children who spend time in groups together will invent entire gestural languages complete with formal syntax, as recently happened in Nicaragua. And adults speaking on the telephone will gesture even knowing they cannot be seen. Granted, people gesture in private at a significantly lower rate than they do face-to-face, but the fact that they do it at all is a bit of a puzzle, since the gestures can’t be serving a communicative function in these contexts. Does the gesturing help the speakers actually think, or at least make meaning more clear to themselves? Susan Goldin-Meadow and her colleagues think so.
We also know from video conversation data that adults spontaneously invent new gestures all the time in conversation, then reuse them. Interestingly, though, each reuse becomes more attenuated, simplified, and stylized with repetition. Similar effects are seen in the development of sign languages and in written scripts.
But just how embodied can a label be when gesture (and other embodied experience) is just a memory, and is so internalized that it is externally invisible? This has actually been tested experimentally. The Stroop effect has been known for decades, for example: when the word “red” is presented in blue text, it is read or acted on more slowly than when the word “red” is presented in red text—or in socially neutral black text. That’s on the embodied perception side of things. But more recent psychophysical experiments have demonstrated a similar psychomotor Stroop-like effect when spatial and motion stimulus sentences are semantically congruent with the direction of the required response action. This effect holds even for metaphorical words like “give”, which tests as motor-congruent with motion away from oneself, and “take”, which tests as motor-congruent with motion toward oneself.
I understand how counterintuitive this stuff can be when you first encounter it—especially to intelligent folks who work with codes or words or models a great deal. I expect the two of us will never reach a consensus on this without looking at a lot of original data—and who has the time to analyze all the data that exists on all the interesting problems in the world? I’d be pleased if you could just note for future reference that a body of empirical evidence exists for the claim. That’s all.
Are you really claiming that ability to understand the very concept of indexicality, and concepts like “soon”, “late”, “far”, etc., relies on humanlike fingers? That seems like an extraordinary claim, to put it lightly.
Yeah, I am advancing the hypothesis that, in humans, the comprehension of indexicality relies on embodied pointing at its core—though not just with fingers, which are not universally used for pointing in all human cultures. Sotaro Kita has the most data on this subject for language, but the embodied basis of mathematics is discussed in Where Mathematics Comes From, by George Lakoff and Rafael Núñez. Whether all possible minds must rely on such a mechanism, I couldn’t possibly guess. But I am persuaded humans do (a lot of) it with their bodies.
What does “there, relative to the river” mean?
In most European cultures, we use speaker-relative deictics. If I point to the southeast while facing south and say “there”, I mean “generally to my front and left”. But if I turn around and face north, I will point to the northwest and say “there” to mean the same thing, ie, “generally to my front and left.” The fact that the physical direction of my pointing gesture is different is irrelevant in English; it’s my body position that’s used as a landmark for finding the target of “there”. (Unless I’m pointing at something in particular here and now, of course; in which case the target of the pointing action becomes its own landmark.)
In a number of Native American languages, the pointing is always to a cardinal direction. If the orientation of my body changes when I say “there”, I might point over my shoulder rather than to my front and left. The landmark for finding the target of “there” is a direction relative to the trajectory of the sun.
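To make the contrast between these two conventions concrete, here is a minimal sketch in Python. It assumes compass bearings in degrees clockwise from north; the function names and the -45 degree value standing in for “front and left” are my own illustrative choices, not anything taken from the research cited in this thread.

```python
# Compass bearings in degrees, measured clockwise from north.
COMPASS = ["north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest"]


def bearing_name(degrees: float) -> str:
    """Name the nearest of the eight compass points."""
    return COMPASS[round((degrees % 360) / 45) % 8]


def point_speaker_relative(heading: float, offset: float) -> str:
    """Speaker-relative frame: the gesture is fixed relative to the body.

    `offset` is degrees clockwise from straight ahead (hypothetical
    convention: -45 means "front and left"). The compass direction of
    the gesture changes whenever the speaker's heading changes.
    """
    return bearing_name(heading + offset)


def egocentric_offset(heading: float, target_bearing: float) -> float:
    """Absolute frame: the target is a fixed compass bearing.

    Returns where the arm has to go relative to the body, in degrees
    clockwise from straight ahead, in the range [-180, 180).
    """
    return ((target_bearing - heading + 180) % 360) - 180


FRONT_LEFT = -45
SOUTHEAST = 135

# Speaker-relative "there": same body-relative gesture, different compass target.
facing_south = point_speaker_relative(180, FRONT_LEFT)
facing_north = point_speaker_relative(0, FRONT_LEFT)

# Absolute "there": same compass target, different body-relative gesture.
arm_when_facing_south = egocentric_offset(180, SOUTHEAST)
arm_when_facing_north = egocentric_offset(0, SOUTHEAST)
```

In the speaker-relative case the gesture stays at front-left while its compass target moves from southeast to northwest. In the absolute case the target stays at southeast while the arm moves from front-left (-45) to behind and to the right (135), which is the over-the-shoulder point.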
But many cultures use a dominant feature of the landscape, like the Amazon or the Mississippi or the Nile rivers, or a major mountain range like the Rockies, or a sacred city like Mecca, as the orientation landmark, and in some cultures this gets encoded in the deictics of the language and the conventions for pointing. “Up” might not mean up vertically, but rather “upriver”, while “down” would be “downriver”. In a steep river valley in New Guinea, “down” could mean “toward the river” and “up” could mean “away from the river”. And “here” could mean “at the river” while “there” could mean “not at the river”.
The cultural variability and place-specificity of language was not widely known to Western linguists until about ten years ago. For a long time, it was assumed that person-relative orientation was a biological constraint on meaning. This turns out to be not quite accurate. So I guess I should be more nuanced in the way I present the notion of embodied cognition. How’s this: “Embodied action in the world with a cultural twist on top” is the grounding point at the bottom of the symbol expansion for human meanings, linguistic and otherwise.
You make a very important point that I would like to emphasize: incommensurate bodies very likely will lead to misunderstanding. It’s not just a matter of shared or disjunct body isomorphism. It’s also a matter of embodied interaction in a real world.
Let’s take the very fundamental function of pointing. Every human language is rife with words called deictics that anchor the flow of utterance to specific pieces of the immediate environment. English examples are words like “this”, “that”, “near”, “far”, “soon”, “late”, the positional prepositions, pronominals like “me” and “you”—the meaning of these terms is grounded dynamically by the speakers and hearers in the time and place of utterance, the placement and salience of surrounding objects and structures, and the particular speaker and hearers and overhearers of the utterance. Human pointing—with the fingers, hands, eyes, chin, head tilt, elbow, whatever—has been shown to perform much the same functions as deictic speech in utterance. (See the work of Sotaro Kita if you’re interested in the data). A robot with no mechanism for pointing and no sensory apparatus for detecting the pointing gestures of human agents in its environment will misunderstand a great deal and will not be able to communicate fluently.
Then there are the cultural conventions that regulate pointing words and gestures alike. For example, spatial meanings tend to be either speaker-relative or landmark-relative or absolute (that is, embedded in a spatial frame of cardinal directions) in a given culture, and whichever of these options the culture chooses is used in both physical pointing and linguistic pointing through deictics. A robot with no cultural reference won’t be able to disambiguate “there” (relative to me here now) versus “there” (relative to the river/mountain/rising sun), even if physical pointing is integrated into the attempt to figure out what “there” is. And the problem may not be detected due to the illusion of double transparency.
This gets even more complicated when the world of discourse shifts from the immediate environment to other places, other times, or abstract ideas. People don’t stop inhabiting the real world when they talk about abstract ideas. And what you see in conversation videos is people mapping the world of discourse metaphorically to physical locations or objects in their immediate environment. The space behind me becomes yesterday’s events and the space beyond my reach in front of me becomes tomorrow’s plan. Or I always point to the left when I’m talking about George and to the right when I’m talking about Fred.
This is all very much an empirical question, as you say. I guess my point is that the data has been accumulating for several decades now that embodiment matters a great deal. Where and how it matters is just beginning to be sorted out.
You make some good points. Please forgive me if I am more pessimistic than you are about the likelihood of AGI in our lifetimes, though. These are hard problems, which decompose into hard problems, which decompose into hard problems—it’s hard problems all the way down, I think. The good news is, there’s plenty of work to be done.
Thanks for that explanation. The complexity factor hadn’t occurred to me.
Some representative papers of Stevan Harnad are:
A list of references can be found in an earlier post in this thread.
Is a computer executing a software emulation of a humanoid body interacting with an emulated physical environment a disembodied algorithmic system, or an AI ROBOT (or neither, or both, or it depends on something)?
An emulated body in an emulated environment is a disembodied algorithmic system in my terminology. The classic example is Terry Winograd’s SHRDLU, which made significant advances in machine language understanding by adding an emulated body (arm) and an emulated world (a cartoon blocks world, but nevertheless a world that could be manipulated) to text-oriented language processing algorithms. However, Winograd himself concluded that language understanding algorithms plus emulated bodies plus emulated worlds aren’t sufficient to achieve natural language understanding.
Every emulation necessarily makes simplifying assumptions about both the world and the body that are subject to errors, bugs, and munchkin effects. A physical robot body, on the other hand, is constrained by real-world physics to that which can be built. And the interaction of a physical body with a physical environment necessarily complies with that which can actually happen in the real world. You don’t have to know everything about the world in advance, as you would for a realistic world emulation. With a robot body in a physical environment, the world acts as its own model and constrains the universe of computation to a tractable size.
The other thing you get from a physical robot body is the implicit analog computation tools that come with it. A robot arm can be used as a ruler, for example. The torque on a motor can be used as an analog for effort. On these analog systems, world-grounded metaphors can be created using symbolic labels that point to (among other things) the arm-ruler or torque-effort systems. These metaphors can serve as the terminal point of a recursive meaning builder—and the physics of the world ensures that the results are good enough models of reality for communication to succeed or for thinking to be assessed for truth-with-a-small-t.
Jurgen Streeck’s book Gesturecraft: The manu-facture of meaning is a good summary of Streeck’s cross-linguistic research on the interaction of gesture and speech in meaning creation. The book is pre-theoretical, for the most part, but Streeck does make an important claim that the biological covariation in a speaker or hearer across the somatosensory modes of gesture, vision, audition, and speech does the work of abstraction—which is an unsolved problem in my book.
Streeck’s claim happens to converge with Eric Kandel’s hypothesis that abstraction happens when neurological activity covaries across different somatosensory modes. After all, the only things that CAN covary across, say, musical tone changes in the ear and dance moves in the arms, legs, trunk, and head, are abstract relations. Temporal synchronicity and sequence, say.
Another interesting book is Cognition in the Wild by Edwin Hutchins. Hutchins goes rather too far in the direction of externalizing cognition from the participants in the act of knowing, but he does make it clear that cultures build tools into the environment that offload thinking function and effort, to the general benefit of all concerned. Those tools get included by their users in the manufacture of online meaning, to the point that the online meaning can’t be reconstructed from the words alone.
The whole field of conversation analysis goes into the micro-organization of interactive utterances from a linguistic point of view rather than a cognitive perspective. The focus is on the social and communicative functions of empirically attested language structures as demonstrated by the speakers themselves to one another. Anything written by John Heritage in that vein is worth reading, IMO.
EDIT: Revised, consolidated, and expanded bibliography on interactive construction of meaning:
LINGUISTICS
Philosophy in the Flesh, by George Lakoff and Mark Johnson
Women, Fire and Dangerous Things, by George Lakoff
The Singing Neanderthals, by Steven Mithen
CONVERSATION ANALYSIS & GESTURE RESEARCH
Handbook of Conversation Analysis, by Jack Sidnell & Tanya Stivers
Gesturecraft: The Manu-facture of Meaning, by Jurgen Streeck
Pointing: Where Language, Culture, and Cognition Meet, by Sotaro Kita
Gesture: Visible Action as Utterance, by Adam Kendon
Hearing Gesture: How Our Hands Help Us Think, by Susan Goldin-Meadow
Hand and Mind: What Gestures Reveal about Thought, by David McNeill
COGNITIVE PSYCHOLOGY
Symbols and Embodiment, edited by Manuel de Vega, Arthur M Glenberg, & Arthur C Graesser
Cognition in the Wild, by Edwin Hutchins
Hi, everyone. My name is Teresa, and I came to Less Wrong by way of HPMOR.
I read the first dozen chapters of HPMOR without having read or seen the Harry Potter canon, but once I was hooked on the former, it became necessary to see all the movies and then read all the books in order to get the HPMOR jokes. JK Rowling actually earned royalties she would never have received otherwise thanks to HPMOR.
I don’t actually identify as a pure rationalist, although I started out that way many, many years ago. What I am committed to today is SANITY. I learned the hard way that, in my case at least, it is the body that keeps the mind sane. Without embodiment to ground meaning, you get into problems of unsearchable infinite regress, and you can easily hypothesize internally consistent worlds that are nevertheless not the real world the body lives in. This can lead to religions and other serious delusions.
That said, however, I find a lot of utility in thinking through the material on this site. I discovered Bayesian decision theory in high school, but the texts I read at the time either didn’t explain the whole theory or else I didn’t catch it all at age 14. Either way, it was just a cute trick for calculating compound utility scores based on guesses of likelihood for various contingencies. The greatest service the Less Wrong site has done for me is to connect the utility calculation method to EMPIRICAL prior probabilities! Like, duh! A hugely useful tool, that is.
As a professional writer in my day job and student of applied linguistics research otherwise, I have some reservations about those of the Sequences that reference the philosophy of language. I completely agree that Searle believes in magic (aka “intentionality”), which is not useful. But this does not mean the Chinese Room problem isn’t real.
When you study human language use empirically in natural contexts (through frame-by-frame analysis of video recordings), it turns out that what we think we do with language and what we actually do are rather divergent. The body and places in the world and other agents in the interaction all play a much bigger role in the real-time construction of meaning than you would expect from introspection. Egocentric bias has a HUGE impact on what we imagine about our own utterances. I’ve come to the conclusion that Stevan Harnad is absolutely correct, and that machine language understanding will require an AI ROBOT, not a disembodied algorithmic system.
As for HPMOR, I hereby predict that Harrymort is going to go back in time to the primal event in Godric’s Hollow and change the entire universe to canon in his quest to, er, spoilers, can’t say.
Cheers.
I’d suggest adding separate columns for actual WORK TIME versus total ELAPSED TIME after email turnaround, task switching, sleep, etc.
Prepare kettle of chili from scratch: 40 min work time, 3 hr elapsed time
Read a 350-page novel: 6 hr (work & elapsed)
Read 690 pages of economic history excluding references: 52 hrs (work time), 3 months (elapsed)
Let’s see if I can take your college example and fit it to what Freakonomics is investigating.
Before you roll the dice, you are asked how confident you are that if the dice roll 6, you will in fact enroll and pay the first semester’s tuition at school X and still be attending classes there two months from now. You can choose from:
(a) Very likely
(b) Somewhat likely
(c) Somewhat unlikely
(d) Very unlikely
Then you’re asked to give a probability estimate that you will not show up, pay up, and stick it out for two months.
Let’s say you’re highly motivated to do school and all three school choices are equally wonderful to you. But you don’t have the tuition money and all three schools have turned you down for a scholarship. You are determined to work your way through school, but you know that the odds are against you being able to work full time and go to school full time at the same time.
So you estimate the odds against paying the first chunk of tuition and carrying a full load of classes and performing well enough to keep your job at 75%. You know it’s going to be pretty damned hard.
All the same, you are confident that you are more likely than not to succeed anyway. You pick “somewhat likely” as your confidence of success.
These two estimates are logically incongruent. What interested me about the Freakonomics study is that the software challenged me on the mismatch. It popped up a dialog that said, in effect, you said you’re more likely to succeed than not, but you estimated a 75% chance of failure. It sounds like you’ve already decided to quit. Are you sure you want to roll the dice?
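For what it’s worth, the kind of check I’m describing is easy to sketch. The Python below is only my guess at the logic, not the study’s actual code: the probability ranges attached to each confidence label and the function name `is_consistent` are assumptions I made up for illustration.

```python
# Hypothetical mapping from each confidence label to the range of success
# probabilities it is taken to imply. These cut-offs are an assumption,
# not something published by the study.
LABEL_RANGES = {
    "very likely": (0.75, 1.0),
    "somewhat likely": (0.5, 0.75),
    "somewhat unlikely": (0.25, 0.5),
    "very unlikely": (0.0, 0.25),
}


def is_consistent(label: str, p_failure: float) -> bool:
    """Return True if the stated failure probability fits the chosen label."""
    low, high = LABEL_RANGES[label]
    p_success = 1.0 - p_failure
    return low <= p_success <= high


# The example above: "somewhat likely" to succeed, but a 75% chance of failure.
mismatch = not is_consistent("somewhat likely", 0.75)
if mismatch:
    print("You said you are more likely to succeed than not, "
          "but you estimated a 75% chance of failure.")
```

With these assumed ranges, a 75% failure estimate implies a 25% chance of success, which falls outside the 50% to 75% band for “somewhat likely”, so the challenge fires.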
You can end it there or go on with the roll.
Now doesn’t that challenge change the whole feel of the decision for you? It sure does for me.
It’s my understanding that, in a repeated series of PD games, the best strategy in the long run is “tit-for-tat”: cooperate by default, but retaliate with defection whenever someone defects against you, and keep defecting until the original defector returns to cooperation mode. Perhaps the prisoners in this case were generalizing a cooperative default from multiple game-like encounters and treating this particular experiment as just one more of these more general interactions?
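Here is a minimal Python sketch of an iterated game, in case it helps pin down what I mean. It uses the textbook form of tit-for-tat, which cooperates first and then copies the other player’s previous move. The payoff values (5, 3, 1, 0) are the conventional ones from the game theory literature, and the function names are my own; none of this comes from the experiment under discussion.

```python
C, D = "C", "D"  # cooperate, defect

# Conventional payoffs, keyed by (my move, their move); an assumption here.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(opponent_history):
    """Cooperate on the first round, then mirror the opponent's last move."""
    return opponent_history[-1] if opponent_history else C


def always_defect(opponent_history):
    """Defect every round, regardless of what the opponent does."""
    return D


def play(strategy_a, strategy_b, rounds):
    """Play an iterated prisoner's dilemma and return both total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


mutual = play(tit_for_tat, tit_for_tat, 10)       # (30, 30)
exploited = play(tit_for_tat, always_defect, 10)  # (9, 14)
```

Two tit-for-tat players settle into steady cooperation, while against a constant defector tit-for-tat loses only the first round and then matches defection with defection. A single one-shot game, like the experiment, gives no later round in which to retaliate, which is why generalizing from repeated encounters matters.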
Mmm, that’s not really where I’m coming from. There is an aggressively empirical research tradition in applied linguistics called “conversation analysis”, which analyzes how language is actually used in real-world interaction. The raw data is actual recordings, usually with video so that the physical embodiment of the interaction and the gestures and facial expressions can be captured. The data is transcribed frame-by-frame at 1/30th of a second intervals, and includes gesture as well as vocal non-words (uh-huh, um, laugh, quavery voice, etc) to get a more complete picture of the actual construction of meaning in real time. So my question was actually an empirical one. It’s one thing to guess at an analytical level that “God” might be a stop-signal in religious debates or in question chains involving children. But is the term really used that way? Has anyone got any unedited video recordings of such conversations that we could analyze? After making very many errors of my own based on expectation rather than actual data, I tend to be skeptical of any statement that says “language IS used in manner X”, when that manner is not demonstrated in data. Language CAN be used in manner X, yes, but is that the normative use in actual practice? We don’t know until we do the hard empirical work needed to find out.
Speaking for a moment as a discourse analyst rather than a philosopher, I would like to point out that much talk is social action rather than reasoning or argument, and what is said is rarely all, or even most, of what is meant. Does anyone here know of any empirical discourse research into the actual linguistic uses of semantic “stopsigns” in conversational practice?
Sad to say, my only experience with wargaming was playing Risk in high school. I’m not sure that counts.