On the Shoulders of Giants
A couple hundred thousand years ago, our species reached a point where we could pass knowledge directly (through language and demonstrated action), rather than relying on adaptations to be passed on through genes. With this shift, culture was born – and it allowed us to very rapidly acquire a great deal of knowledge about our world (with the rate of progress escalating over the last several hundred years). However, due to the cultural nature of this knowledge, an individual human separated from these shared learnings still cannot get very far. Our best examples of this phenomenon come in the form of feral children (of which there are thankfully few); in these cases, growing up separated from society has resulted in a level of intelligence comparable to our Great Ape cousins (with limited ability to learn further). Though our knowledge sits outside our genes, our genetic makeup does have an influence, as we’ve evolved to be good at learning from others. Babies can recognize (and pay attention to) faces essentially from birth, and throughout childhood (and beyond) we rely heavily on observing and listening to others to learn about the world.
Our accumulated cultural knowledge has allowed us to climb to the top of Earth’s biological hierarchy, and to learn far deeper truths about our world and ourselves than any other species has uncovered. We’ve discovered atoms and quarks, gravity and electromagnetism, our place in the universe, and even some basic principles of how our brains work to do these things. Each discovery we make extends the shared knowledge base and helps propel it further forward, as we’re able to share and pass on the insights. Newton famously wrote, “If I have seen further, it is by standing upon the shoulders of giants.” For him, these giants were the scientists who came before him, but in reality we have our whole lineage of giants to thank – from the first users of tools, to the first speakers of language, to the first formers of settlements. These early discoveries laid the groundwork from which all future cultural knowledge has grown.
Our advanced ability to engage with the world is dependent on absorbing this cultural knowledge; as noted before, without it, we're no different from other apes (for additional detail on this topic, check out The Secret of Our Success, by Joseph Henrich). However, to say we simply "absorb" the knowledge misses a key point. Although some part of our "self" is biologically rooted (i.e. present regardless of culture), much is not; the process of downloading it fundamentally shapes the people we become. For example, culture is responsible for strengthening our sense of self (consider the "me"-focused Western mentality vs. the "we"-focused Eastern one) and laying the groundwork for morality (consider the way concepts like "respect" and "honor" fit into our web of associations).
In summary, we can see that our intelligence is dependent on our ability to download information from others, and that this process of downloading shapes the people we become. The rest of this post seeks to address what this means for artificial intelligence – to what degree will our AGI creations need to “stand on our shoulders”, and how will this shape the systems they become?
Looking at AI systems as they stand today, it may seem strange to ask this question. You could argue that today's systems "stand on our shoulders" because we built them and decided what inputs to give them, but that misses the essence of the statement. In reality, the AIs of today don't "stand on our shoulders" at all, as they don't acquire any of the meaning we associate with concepts. When we teach AlphaZero to play chess, we create a system structured in such a way that it updates toward better chess moves over the course of training, but it has no understanding of what chess is, or what it (the system) is, or why it makes a particular move. For these narrow systems, there's no need to download general knowledge from us – instead, we provide the required knowledge through the way we structure the system. AlphaZero doesn't need to know anything about chess, because we do, and we structure it so that it simply crunches the numbers within its narrow domain. The AI systems of today often act in surprising or unpredictable ways because they aren't rooted in the same concepts we are – they may have "concepts" within their narrow domain (e.g. they may "know" that certain patterns taken together represent a certain object), but they lack our global conceptions. To the extent today's systems share our culture (e.g. playing chess), they seem to do so only because we decide the problems they attack and the inputs they train on – beyond that, the systems aren't downloading any of our knowledge.
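To make this concrete, here is a hedged toy sketch of my own (not AlphaZero itself, which couples Monte Carlo tree search with a neural network) showing where the knowledge actually lives in a narrow self-play learner. All the game-specific knowledge sits in the structure the designer supplies – the rules in `legal_moves` and the win/loss signal – while the "learner" only nudges numbers toward moves that preceded wins. The toy game, Nim, and all numbers here are chosen purely for illustration.

```python
import random

# Toy game (Nim): players alternately take 1-3 objects from a pile;
# whoever takes the last object wins. The rules below are supplied by
# us, the designers -- this is where the "chess knowledge" analog lives.
def legal_moves(pile):
    return [take for take in (1, 2, 3) if take <= pile]

value = {}             # (pile, take) -> learned preference for that move
EPSILON, LR = 0.1, 0.2

def pick(pile):
    moves = legal_moves(pile)
    if random.random() < EPSILON:          # occasional exploration
        return random.choice(moves)
    return max(moves, key=lambda m: value.get((pile, m), 0.0))

def self_play(pile=21):
    histories, player = ([], []), 0
    while pile > 0:
        take = pick(pile)
        histories[player].append((pile, take))
        pile -= take
        if pile == 0:
            winner = player                # took the last object
        player = 1 - player
    return histories, winner

def train(games=20000):
    for _ in range(games):
        histories, winner = self_play()
        for player, moves in enumerate(histories):
            result = 1.0 if player == winner else -1.0
            for state_move in moves:
                v = value.get(state_move, 0.0)
                value[state_move] = v + LR * (result - v)   # generic numeric update

train()
EPSILON = 0.0
# The table tends toward the known strategy of leaving a multiple of 4,
# but the system holds nothing resembling an understanding of "games",
# "winning", or itself -- only numbers we arranged for it to fill in.
print({pile: pick(pile) for pile in range(1, 9)})
```

Nothing in the update rule refers to Nim (or chess) at all; swap in different designer-supplied rules and reward, and the same numeric machinery grinds away on a different narrow domain.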
The question becomes more interesting when we look toward the (hopefully) more general AI systems of the future. I'll avoid diving too deeply into postulating the structure these systems may have, as that would take a separate post (some initial thoughts here), but for our current purposes we can consider these systems to function much like the brain. Simplifying greatly, we can picture them as having a general learning algorithm (like the cortex of the brain), coupled with some degree of innate knowledge and motivations (like the subcortex of the brain, including the dopamine reward system) and some means of sensing and affecting the world. We haven't made much progress constructing these types of systems (as they're far more difficult to reason about than the narrow systems of today), so the brain is the best example we have for pumping intuitions (though it's important to point out that the innate knowledge and motivations we build into these systems will likely be quite different from those instilled in us by our genes, so we must be careful not to anthropomorphize).
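As a purely illustrative skeleton (the names and decomposition are my own, not a proposed design), the three-part picture above might be sketched like this: a general learning algorithm, a bundle of innate drives that score raw experience, and some way of sensing and affecting the world.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class GeneralAgentSketch:
    learn: Callable[[Any, float], Any]      # generic learner: (observation, reward) -> action
    innate_reward: Callable[[Any], float]   # built-in drives: the only hard-coded "values"
    sense: Callable[[], Any]                # reads the world
    act: Callable[[Any], None]              # affects the world

    def step(self) -> None:
        observation = self.sense()
        reward = self.innate_reward(observation)   # innate, not learned
        action = self.learn(observation, reward)   # everything else must be acquired
        self.act(action)
```

Notice that nothing domain-specific (no chess, no language) appears anywhere in this skeleton; as the next paragraph argues, there is nowhere to put it except the innate reward and whatever the learner later acquires on its own.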
The important difference between these systems and the narrow systems of today is that we will not be able to "build" any of our knowledge into them. As reviewed above, we can build our knowledge into the narrow systems of today, for example by structuring a system so that it is good at chess. With a general architecture, however, there's no room to build that knowledge in! Again, think of this architecture like a human brain; assuming we could artificially construct a working brain, we'd still have no idea how to "fit" chess knowledge inside. It's far easier to build a brain and teach it chess than to figure out from the start which neuronal connections are needed for chess ability (a bit more on this idea here). We will run into a similar situation with general AI systems: the more generality we seek, the less of our knowledge we can build in.
With that being said, it does seem these types of systems will form concepts (i.e. recognize regularities in the world) in much the same way we do (as that seems to be how general intelligence works), and through these concepts they will be able to understand the world and work toward their goals. The key question is the degree to which these systems' concepts will overlap with our own, and what that means for their motivations and goals.
First, we must address whether artificially intelligent systems will need to rely on our knowledge, or whether they can instead build up to this level of knowledge (and beyond) on their own, without guidance. It seems that a sufficiently intelligent system might be able to make deep sense of the world with no help, but it's not clear what level of intelligence is needed for that to occur (at a minimum, it would be far beyond human level). On the other hand, it seems it will be easier to build a less intelligent system that relies on "downloading" our knowledge (especially as we already have a working example in the human brain). Human history helps support these intuitions (though again, artificial intelligence may not be like us), as even the most intelligent humans have been limited in how far they can push the frontier of knowledge. Even the furthest outliers (e.g. Newton, von Neumann, Einstein) "only" moved a few fields a few steps forward, highlighting how much harder it is to create new knowledge than to learn existing knowledge (in some ways this feels reminiscent of the P vs. NP problem). It seems the best path forward for our AGI systems will be to have them, like Newton, "stand on the shoulders" of our knowledge base. This means these systems will need to share our language and concepts, at least to a significant enough degree to download our knowledge.
Assuming this is the case, what does it mean for AGI behavior? It's difficult to reason about how our culture will "play with" a system's innate drives and motivations. We can't rely much on humans to drive our intuitions, as human culture was created by humans and has evolved with us over our history. The AGI systems we create will have artificial motivations, detached from the evolutionary coupling between culture and drives. It may be that these systems can download our knowledge in high fidelity without much change to their motivations, or it may be that the bar is high enough that, in downloading it, they become much like us (though this extreme scenario seems vastly unlikely). While we can't say much for certain about how the influence of culture will pan out, we can make observations about what the reliance on cultural knowledge means for the AGI systems we will build. Primarily, this reliance will serve as a filter – only systems able to download our knowledge will achieve a high level of intelligence (at least initially). While these systems may not need to be quite as geared toward cultural learning as we are (for example, we may not need to go as far as building in face recognition and attention, as newborns have), they will need some innate tendency for it (perhaps simply building in "attention" to humans will be enough). In humans, language can develop because our brains innately treat human action and input as "special", and thus we're able to quickly zero in on the fact that certain human noises represent certain concepts. This likely couldn't happen if all inputs were considered of equal importance (e.g. if an infant cared as much about background noise as about its mother's voice). It seems AGI systems will be subject to similar constraints if they are to access our vast pool of existing knowledge.
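A small invented illustration of that last point, with made-up sources and salience numbers: if all inputs carry equal weight, human speech is just one more noise source, whereas an innate bias toward human-tagged input lets a generic learner zero in on it before any learning has happened.

```python
# Toy inputs: (source, utterance) pairs -- entirely made up for illustration.
inputs = [("human_voice", "mama"), ("background", "hum"),
          ("background", "traffic"), ("human_voice", "ball")]

def salience(source: str, innate_human_bias: float) -> float:
    # Innate weighting applied before any learning happens.
    return innate_human_bias if source == "human_voice" else 1.0

def human_attention_share(innate_human_bias: float) -> float:
    weights = [salience(src, innate_human_bias) for src, _ in inputs]
    human = sum(w for (src, _), w in zip(inputs, weights) if src == "human_voice")
    return human / sum(weights)

print(human_attention_share(1.0))    # 0.5   -- speech treated like any other noise
print(human_attention_share(10.0))   # ~0.91 -- speech dominates the learning signal
```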
Can you point to a well documented example of this?
Wikipedia has a number of examples (https://en.wikipedia.org/wiki/Feral_child), though to be transparent I haven’t done much research on the level of documentation of each.
Given that GPT-3 gives us an example of an AI that's much broader than the chess and Go AIs, it seems to me that it's a better model than either of those narrow domains.
Reading this post feels like following the mental process of a thoughtful and knowledgeable person. But I have trouble extracting any main point or insight from it. The style is partly to blame, but you also don't give references for the AGI part of the post, which makes it really hard to situate your ideas relative to the rest of the field.
Also, you give almost no arguments for the few points that appear throughout the text, like the need for AGI to be able to download our knowledge. In some sense, it's the whole point of Machine Learning, so it's trivially true. But it looks like you're attempting to say something deeper here, and I'm not sure I or others can get it without a more careful exploration.
Thanks, I appreciate that feedback. I agree with you that there was no clear main point, and I think the AGI part assumes a different type of architecture than what is used today, which as you point out makes it difficult to situate the ideas. I’ll put some thought in and see if I can come up with a better way to frame it.