Introducing AGI Safety in general, and my research in particular, to novices / skeptics, in 5 minutes, out loud
I might be interviewed on a podcast where I need to introduce AGI risk to a broad audience of people who mostly aren’t familiar with it and/or think it’s stupid. The audience is mostly neuroscientists plus some AI people. I wrote the following as a possible entry-point, if I get thrown some generic opening question like “Tell me about what you’re working on”:
The human brain does all these impressive things, such that humanity was able to transform the world, go to the moon, invent nuclear weapons, wipe out various species, etc. Human brains did all those things by running certain algorithms.
And sooner or later, people will presumably figure out how to run similar algorithms on computer chips.
Then what? That’s the million-dollar question. What happens when researchers eventually get to the point where they can run human-brain-like algorithms on computer chips?
OK, to proceed I need to split into two ways of thinking about these future AI systems: Like a tool or like a species.
Let’s start with the tool perspective. Here I’m probably addressing the AI people in the audience. You’re thinking, “Oh, you’re talking about AI, well pfft, I know what AI is, I work with AI every day, AI is kinda like language models and ConvNets and AlphaFold and so on. By the time we get future algorithms that are more like how the human brain works, they’re going to be more powerful, sure, but we should still think of them as in the same category as ConvNets; we should think of them like a tool that people will use.” OK, if that’s your perspective, then the goal is for these tools to do the things that we want them to do. And conversely, the concern is that these systems could go about doing things that the programmers didn’t want them to do, and that literally nobody wanted them to do, like try to escape human control. The technical problem here is called The Alignment Problem: if people figure out how to run human-brain-like algorithms on computer chips, and they want those algorithms to try to do X, how can they do that? It’s not straightforward. For example, humans have an innate sex drive, but it doesn’t work very reliably; some people choose to be celibate. OK, so imagine you have the source code for a human-like brain architecture and training environment, and you want it to definitely grow into an adult that really, deeply wants to do some particular task, like let’s say design solar cells, while also being honest and staying under human control. How would you do that? What exactly would you put into the source code? Nobody knows the answer. And when you dig into it, you find that it’s a surprisingly tricky technical problem, for pretty deep reasons. And that technical problem is something that I and others in the field are working on.
That was the tool perspective. But then there’s probably another part of the audience, maybe a lot of the neuroscientists, who are strenuously objecting here: if we run human-brain-like algorithms on computer chips, we shouldn’t think of that as like a tool for humans to use; instead we should think of it like a species, a new intelligent species that we have invited onto our planet, and indeed a species which will eventually think much faster than humans, be more insightful and creative than humans, and also probably eventually outnumber humans by a huge factor, and so on. In that perspective, the question is: if we’re going to invite this powerful new intelligent species onto our planet, how do we make sure that it’s a species that we actually want to share the planet with? And how do we make sure that they want to continue sharing the planet with us? Or more generally, how do we bring about a good future? There are some interesting philosophy questions here which we can get back to, but putting those aside, there’s also a technical problem to solve, which is: whatever properties we want this new intelligent species to have, we need to actually write source code such that those properties actually come about. For example, if we want this new species to feel compassion and friendship, we gotta put compassion and friendship into the source code. Human sociopaths are a case study here. Sociopaths exist; therefore it is possible to make an intelligent species that isn’t motivated by compassion and friendship. Not just possible, but strictly easier! I think maybe future programmers will want to put compassion and friendship into the source code, but they won’t know how, so they won’t do it. So I say, let’s try to figure that out ahead of time. Again, I claim this is a very tricky technical problem, when you start digging into it. We can talk about why. Anyway, that technical problem is also something that I’m working on.
So in summary, sooner or later people will figure out how to run human-brain-like algorithms on computer chips, and this is a very very big deal, it could be the best or worst thing that’s ever happened to humanity, and there’s work we can do right now to increase the chance that things go well, including, in particular, technical work that involves thinking about algorithms and AI and reading neuroscience papers. And that’s what I’m working on!
I’m open to feedback; e.g., where might skeptical audience-members fall off the boat? (I am aware that it’s too long for one answer; I expect that I’ll end up saying various pieces of this in some order depending on the flow of the conversation. But still, gotta start somewhere.)
I would prepare a shortened version (100 words max) that you could also give.
Yeah, I think I have a stopping point after the first three paragraphs (with minor changes).
Could you just say you’re working on safe design principles for brain-like artificial intelligence?