Nice post; I think I agree with most of it. Two points I want to make:
Or is this “qualitative difference” illusory, with the vast majority of human cognitive feats explainable as nothing more than a scaled-up version of the cognitive feats of lower animals?
This seems like a false dichotomy. We shouldn’t think of scaling up as “free” from a complexity perspective—usually when scaling up, you need to make quite a few changes just to keep individual components working. This happens in software all the time: in general it’s nontrivial to roll out the same service to 1000x users.
One possibility is that the first species that masters language, by virtue of being able to access intellectual superpowers inaccessible to other animals, has a high probability of becoming the dominant species extremely quickly.
I think this explanation makes sense, but it raises the further question of why we don’t see other animal species with partial language competency. There may be an anthropic explanation here—i.e. that once one species gets a small amount of language ability, they always quickly master language and become the dominant species. But this seems unlikely: e.g. most birds have such severe brain size limitations that, while they could probably have 1% of human language, I doubt they could become dominant in anywhere near the same way we did.
There’s some discussion of this point in Laland’s book Darwin’s Unfinished Symphony, which I recommend. He argues that the behaviour of deliberate teaching is uncommon amongst animals, and doesn’t seem particularly correlated with intelligence—e.g. ants sometimes do it, whereas many apes don’t. His explanation is that students from more intelligent species are easier to teach, but would also be more capable of picking up the behaviour by themselves without being taught. So there’s not a monotonically increasing payoff to teaching as student intelligence increases—but humans are the exception (via a mechanism I can’t remember; maybe due to prolonged infancy?), which is how language evolved. This solves the problem of trustworthiness in language evolution, since you could start off by only using language to teach kin.
A second argument he makes is that the returns from increasing fidelity of cultural transmission start off low, because the amount of degradation is exponential in the number of times a piece of information is transmitted. Combined with the previous paragraph, this may explain why we don’t see partial language in any other species, but I’m still fairly uncertain about this.
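To make the exponential-degradation point concrete, here’s a minimal sketch of the arithmetic (my own toy model with invented numbers, not anything from Laland’s book): if each transmission step preserves a fraction f of a piece of information, then after n steps only f^n survives, so when baseline fidelity is low, modest improvements barely change the long-run outcome.

```python
# Toy numbers only: survival of information after n transmission steps,
# assuming each step preserves a fixed fraction f of what it received.
for f in (0.5, 0.6, 0.9, 0.99):
    survival = ", ".join(f"after {n} steps: {f ** n:.3f}" for n in (1, 5, 20))
    print(f"fidelity {f}: {survival}")
```

Going from f = 0.5 to 0.6 leaves the 20-step survival at essentially zero, while going from 0.9 to 0.99 raises it from about 0.12 to about 0.82, which is the sense in which the returns to fidelity start off low.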
I think this explanation makes sense, but it raises the further question of why we don’t see other animal species with partial language competency. There may be an anthropic explanation here—i.e. that once one species gets a small amount of language ability, they always quickly master language and become the dominant species. But this seems unlikely: e.g. most birds have such severe brain size limitations that, while they could probably have 1% of human language, I doubt they could become dominant in anywhere near the same way we did.
Can you elaborate more on what partial language competency would look like to you? (FWIW, my current best guess is on “once one species gets a small amount of language ability, they always quickly master language and become the dominant species”, but I have a lot of uncertainty. I suppose this also depends a lot on what exactly is meant by “language ability”.)
A couple of intuitions:
Koko the gorilla had partial language competency.
The ability to create and understand combinatorially many sentences—not necessarily with fully recursive structure, though. For example, if there’s a finite number of sentence templates, and then the animal can substitute arbitrary nouns and verbs into them (including novel ones).
The sort of things I imagine animals with partial language saying are:
There’s a lion behind that tree.
Eat the green berries, not the red berries.
I’ll mate with you if you bring me a rabbit.
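As a toy illustration of the “finite templates with open slots” idea above, here’s a minimal sketch; the templates and word lists are invented for the example, and the point is just that a fixed, non-recursive template set already yields combinatorially many distinct sentences.

```python
from itertools import product

# Hypothetical templates with noun/verb slots (invented for illustration).
templates = [
    "there is a {noun1} behind that {noun2}",
    "{verb} the green {noun1}, not the red {noun1}",
    "i will {verb} you if you bring me a {noun1}",
]
nouns = ["lion", "tree", "berry", "rabbit", "snake", "river"]
verbs = ["eat", "avoid", "follow", "groom"]

# str.format ignores unused keyword arguments, so every template can be
# filled from the same pool of candidate words.
sentences = {
    t.format(noun1=n1, noun2=n2, verb=v)
    for t in templates
    for n1, n2, v in product(nouns, nouns, verbs)
}
print(len(sentences), "distinct sentences from", len(templates), "templates")
```

Each new template or word multiplies the number of expressible sentences, without needing anything like a recursive grammar.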
“Once one species gets a small amount of language ability, they always quickly master language and become the dominant species”—this seems clearly false to me, because most species just don’t have the potential to quickly become dominant. E.g. birds, small mammals, reptiles, short-lived species…
AFAICT this is highly disputed. Many people think that her handlers had an agenda, and that the purported examples of her combining words were her randomly spamming sign language to get treats. The raw data was never released, and no one apart from her handlers was allowed to interact with Koko, or to watch the handlers interact with her.
It seems plausible that the purported examples are a case of selective reporting, wishful thinking, and the Clever Hans effect.
There are other apes, including Washoe and Kanzi, who have been observed to use language.
Admittedly, they weren’t very good at it by human standards.
This seems like a false dichotomy. We shouldn’t think of scaling up as “free” from a complexity perspective—usually when scaling up, you need to make quite a few changes just to keep individual components working. This happens in software all the time: in general it’s nontrivial to roll out the same service to 1000x users.
I agree. But I also think there’s an important sense in which this additional complexity is mundane—if the only sorts of differences between a mouse brain and a human brain were the sorts of differences involved in scaling up a software service to 1000x users, I think it would be fair (although somewhat glib) to call a human brain a scaled-up mouse brain. I don’t think this comparison would be fair if the sorts of differences were more like the sorts of differences involved in creating 1000 new software services.
I think whether the additional complexity is mundane or not depends on how you’re producing the agent. Humans can scale up human-designed engineering products fairly easily, because we have a high-level understanding of how the components all fit together. But if you have a big neural net whose internal composition is mostly determined by the optimiser, then it’s much less clear to me. There are some scaling operations which are conceptually very easy for humans, but hard to do via gradient descent. As a simple example, in a big neural network where the left half is doing subcomputation X and the right half is doing subcomputation Y, it’d be very laborious for the optimiser to swap them so that the left half is doing Y and the right half is doing X—since the optimiser can only change the network gradually, and after each gradient update the whole thing needs to still work. This may be true even if swapping X and Y is a crucial step towards scaling up the whole system, which will later allow much better performance.
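To illustrate the kind of rearrangement this is pointing at, here’s a minimal numpy sketch on a toy one-hidden-layer network (my own construction, purely illustrative): swapping the two halves of the hidden layer, together with the matching outgoing weights, gives a network that computes exactly the same function, so the rearranged endpoint is perfectly good, but it sits at a distant point in weight space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy net: y = W2 @ relu(W1 @ x). Think of the first half of the hidden
# units as "subcomputation X" and the second half as "subcomputation Y".
d_in, d_hidden, d_out = 4, 8, 2
W1 = rng.normal(size=(d_hidden, d_in))
W2 = rng.normal(size=(d_out, d_hidden))
x = rng.normal(size=d_in)

def forward(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# A human can do the swap in one step by permuting indices, provided the
# outgoing weights are permuted to match.
perm = np.concatenate([np.arange(d_hidden // 2, d_hidden), np.arange(d_hidden // 2)])
W1_swapped = W1[perm]        # reorder the hidden units
W2_swapped = W2[:, perm]     # reorder the weights that read from them

print(np.allclose(forward(W1, W2, x), forward(W1_swapped, W2_swapped, x)))  # True
print(np.linalg.norm(W1 - W1_swapped) > 1.0)  # True: a large jump in weight space
```

The one-line index permutation is trivial for a human, whereas gradient descent would have to cover that distance through many small updates, each of which has to keep the whole network working.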
In other words, we’re biased towards thinking that scaling is “mundane” because human-designed systems scale easily (and to some extent, because evolution-designed systems also scale easily). It’s not clear that AIs also have this property; there’s a whole lot of retraining involved in going from a small network to a bigger network (and in fact usually the bigger network is trained from scratch rather than starting from a scaled-up version of the small one).