Cog sci question about how words are organized in our minds.
So, I’m a native English speaker, and for the last ~1.5 years, I’ve been studying Finnish as a second language. I was making very slow progress on vocabulary, though, so a couple days ago I downloaded Anki and moved all my vocab lists over to there. These vocab lists basically just contained random words I had encountered on the internet and felt like writing down; a lot of them were for abstract concepts and random things that probably won’t come up in conversation, like “archipelago” (the Finnish word is “saaristo”, if anyone cares). Anyway, the point is that I am not trying to learn the vocabulary in any sensible order, I’m just shoving random words into my brain.
While studying today, I noticed that I was having a lot more trouble with certain words than with others, and I started to wonder why, and what implications this has for how words are organized in our minds, and whether anyone has done studies on this.
For instance, there seemed to be a lot of “hash collisions”: vocabulary words that I kept confusing with one another. Some of these were clearly phonetic: hai (shark) and kai (probably). Another phonetic pair: toivottaa (to wish) and taivuttaa (to inflect a word). Some were a combination of phonetic and semantic: virhe (error), vihje (hint), vaihe (phase, stage), and vika (fault). Some of them I have no idea why I kept confusing: kertautua (to recur) and kuvastaa (to mirror, to reflect).
There were also a few words that I just had inordinate amounts of trouble remembering, and I don’t know why: eksyä (to get lost), ehtiä (to arrive in time), löytää (to find), kyllästys (saturation), sisältää (to include), arvata (to guess). Aside from the last one, all of these have the letter ä in them, so maybe that has something to do with it. Also, the first two words don’t have a single English verb as an equivalent.
There were also some words that were easier than I expected: vankkuri (wagon), saaristo (archipelago), and some more that I don’t remember now because they quickly vanished from my deck. Both of these words are unusual but concrete concepts.
Do different people struggle with the same words when learning a language? Are some Finnish words just inherently “easy” or “hard” for English speakers to learn? If it’s different for each person, how does the ease of learning certain words relate to a person’s life experiences, interests, common thoughts, etc.?
What do hash collisions tell us about how words are organized in our minds? Can they tell us anything about the features we might be using to recognize words? For instance, English speakers often seem to have trouble remembering and distinguishing Chinese names; they all seem to “sound the same”. Why does this happen? Here’s a hypothesis: when we hear a word, based on its features, it is mapped to a specific part of a learned phonetic space before being used to access semantic content. Presumably we would learn this phonetic space to maximize the distance between words in a language, since the farther apart words are, the less chance they have of accessing the wrong semantic content. Maybe certain Finnish words sound the same to me because they map to nearby regions of my phonetic space, but a speaker of some other language wouldn’t confuse these particular words because they’d have a different phonetic space? I’m just speculating wildly here.
I’d be interested to hear everyone else’s vocab-learning experiences and crazy hypotheses for what’s going on. Also, does anyone know any actual research that’s been done on this stuff?
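One rough way to put numbers on the "nearby regions of phonetic space" idea is to measure how many single-letter edits separate two words. The sketch below uses plain Levenshtein distance on spelling as a stand-in for phonetic distance. That is an assumption, though a fairly mild one for Finnish, where spelling tracks pronunciation closely. The function name `edit_distance` is just illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Confusable pairs taken from the post.
pairs = [("hai", "kai"), ("toivottaa", "taivuttaa"),
         ("virhe", "vihje"), ("kertautua", "kuvastaa")]
for a, b in pairs:
    print(a, b, edit_distance(a, b))
```

The first three pairs come out at distances 1, 2, and 2, which fits the "clearly phonetic" label. The kertautua / kuvastaa pair comes out noticeably farther apart, so plain spelling distance does not explain that confusion.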
These are interesting questions. I think the keyword you want for “hash collisions” is interference. Here’s a more helpful overview from an education perspective: Learning Vocabulary in Lexical Sets: Dangers and Guidelines (2000). It mostly talks about semantic interference, but it mentions some other work on similar-sounding and similar-looking words.
Thanks!
That problem is called memory interference. I think reading Wozniak’s 20 rules gives you a good elementary understanding of concepts like that.
In general there doesn’t seem to be a good way to predict memory interference in advance.
When faced with apparent interference I usually make a card specifically for the interference:
Front: (kai / hai) → shark
Back: hai
Front: (kai / hai) → probably
Back: kai
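If you have many confusable groups, cards in this shape can be generated rather than typed by hand. This is a minimal sketch, assuming each group is a small dict from word to meaning. The helper name `interference_cards` is made up for illustration and is not part of Anki.

```python
def interference_cards(group):
    """Return (front, back) pairs for a group of confusable words.

    `group` maps each word to its meaning,
    e.g. {"kai": "probably", "hai": "shark"}.
    """
    # The prompt lists every candidate, so the card tests the distinction itself.
    prompt = "(" + " / ".join(group) + ")"
    return [(f"{prompt} → {meaning}", word) for word, meaning in group.items()]

cards = interference_cards({"kai": "probably", "hai": "shark"})
for front, back in cards:
    print(f"Front: {front}")
    print(f"Back: {back}")
```

The resulting pairs could then be written to a tab-separated file and imported into a deck.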
You may find “linguistic cohort” a useful search phrase.
When I studied linguistics back in the 80s it was a popular way of thinking about lexical retrieval. E.g., a cohort model might explain collisions between “kertautua” and “kuvastaa” by observing that they share an initial-sound, final-sound, and (I think?) number of syllables, all of which are lexical search keys. (Put another way: it’s easy to list words that start with “k”, words that end with “a”, and three-syllable words.)
That said, I remember thinking at the time that it was kind of vacuous. (After all, it’s also easy to list words with “v” in the middle somewhere.)
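To make the "search keys" idea concrete, here is a toy sketch that buckets words by first letter, last letter, and number of vowel runs. The vowel-run count is only a crude stand-in for syllable count (it does not handle Finnish syllabification properly), and the key choice is an assumption for illustration, not the cohort model as published.

```python
import re
from collections import defaultdict

VOWELS = "aeiouyäö"

def search_keys(word):
    """Crude lexical keys: first letter, last letter, number of vowel runs."""
    w = word.lower()
    vowel_runs = re.findall(f"[{VOWELS}]+", w)
    return (w[0], w[-1], len(vowel_runs))

def collision_groups(words):
    """Group words sharing all three keys; keep only groups of two or more."""
    buckets = defaultdict(list)
    for w in words:
        buckets[search_keys(w)].append(w)
    return [group for group in buckets.values() if len(group) > 1]

words = ["kertautua", "kuvastaa", "hai", "kai",
         "toivottaa", "taivuttaa", "vankkuri", "saaristo"]
print(collision_groups(words))
```

With these keys, kertautua / kuvastaa and toivottaa / taivuttaa land in shared buckets, but hai / kai do not, because they differ in the first letter. Which pairs "collide" depends heavily on which keys you pick, which is roughly the vacuity worry.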
I find that making up mnemonics works well to combat interference. They don’t have to be good mnemonics for this to work.
Example: I noticed I kept mixing up the Spanish words aquí (here) and allí (there). I then made up the mnemonic that aquí has a “k” sound so it’s close, and allí contains l’s so it’s long away. A few days later, I encounter the word “allí”. My thinking then goes “That’s either here or there, I keep confusing those” → “oh yeah, I made up a mnemonic” → “allí means there”.
I wonder how well this method would work for others.
This is generally how I memorize the bits of scripts that are from my perspective arbitrary. It doesn’t even need much of a connection to the text itself.
E.g., one line I had trouble with was “Come, sirs”, which I kept paraphrasing as any of a dozen phrases that basically mean the same thing, until I associated it with brothels for knights. Now my cue comes along, I know I’m leading a group of people elsewhere, a bunch of competing ways to say that get activated, the brothels for knights concept gets activated along with them, it reinforces “come sirs” and that’s what I say.
I tend to think of this in terms of compression: you can use various compression schemes to store English words in fewer bits, but that will make you store foreign words in more bits. For example, you could order letters by frequency and represent frequent letters with fewer bits. You can do the same with groups of letters (e.g. “thing” = “th” + “ing”, both very frequent combinations in English), or take advantage of conditional probabilities (‘t’ is much more likely to be followed by ‘h’ than by ‘n’) to squeeze out a few more bits of compression. Similarly, if a westerner wanted to describe the Chinese character 語 without any prior knowledge of Chinese, the description would be very long, but a Chinese speaker would describe it as “the key for speech, and a five above a mouth”.
This is just another way of describing what you call phonetic space.
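Here is a small sketch of the letter-frequency version of that idea. It estimates letter probabilities from a tiny English sample (the `SAMPLE` text is made up for illustration, and far too small to be a real model), then charges each letter -log2(p) bits, which is the ideal code length under that model. Add-one smoothing is an assumption so that unseen letters like “ä” get a finite cost.

```python
import math
from collections import Counter

# Tiny stand-in for an English corpus (illustrative only).
SAMPLE = ("the quick brown fox jumps over the lazy dog and then the thing "
          "that matters is that this is nothing other than english text")
ALPHABET = "abcdefghijklmnopqrstuvwxyzäö"

def letter_costs(text):
    """Bits per letter under a unigram model with add-one smoothing."""
    counts = Counter(ch for ch in text if ch in ALPHABET)
    total = sum(counts.values()) + len(ALPHABET)
    return {ch: -math.log2((counts[ch] + 1) / total) for ch in ALPHABET}

def bits(word, costs):
    """Total cost in bits of spelling `word` letter by letter."""
    return sum(costs[ch] for ch in word.lower())

costs = letter_costs(SAMPLE)
for word in ["thing", "eksyä", "löytää"]:
    print(word, round(bits(word, costs), 1))
```

Under this English-trained model, “thing” is cheaper to encode than the equally long “eksyä”, because its letters are common in the sample while “k”, “y”, and “ä” are rare or absent.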
Simple issues of frequency make learners see words as “closer” than native speakers do. Another problem arises when the “phonetic space” of one language has more (or different) dimensions than that of another; e.g. many people find it hard to learn words when the distinction between voiced and unvoiced “th” is important, or when the tone of a syllable also carries meaning (as in Chinese). The Chinese words for “mother”, “insult” and “horse” all sound like exactly the same word, “ma”, to non-Chinese speakers.