A rough summary is included in my footnote: [6] By “recursively self-improving AGI”, I’m specifically referring to an AGI that can complete an intelligence explosion within a year [or hours], at the end of which it will have found something like the optimal algorithms for intelligence per relevant unit of computation. (“Optimally optimized optimizer” is another way of putting it.)
I have a strong intuition that “optimal algorithms for intelligence per relevant unit of computation” don’t exist. There are lots of no-free-lunch theorems around this. Intelligence is contextual; as a concrete example, children are better than adults in novel situations with unusual causal factors (https://cocosci.berkeley.edu/tom/papers/LabPublications/GopnicketalYoungLearners.pdf). In AI, the explore-exploit tradeoff is quite fundamental and it seems unlikely that you can find a fully general solution to it.
I still don’t know what “intelligence explosion within a year” means; is it relative to human intelligence? The intelligence of the previous AGI? Along what metric are you measuring intelligence? If I consider the “reasonable view” of what these terms mean, I expect that there will never be an intelligence explosion that would be considered “fast” (in the way that an AGI intelligence explosion in a year would be “fast” to us) by the next-most intelligent system that exists.
You could imagine analogizing the first AGI we build to the first dynamite we ever build. You could analogize a foomed AGI to a really big dynamite, but I think it’s more accurate to analogize it to a nuclear bomb, given the positive feedback loops involved.
I’m not sure what I’m supposed to get out of the analogy. If you’re saying that a foomed AGI is way more powerful than the first AGI, sure. If you’re saying they can do qualitatively different things, sure.
I expect the intelligence differential between our first AGI and a foomed AGI to be numerous orders of magnitude larger than the intelligence differential between a chimp and a human.
I don’t know if I’d say I expect this, but I do consider this scenario often so I’m happy to talk about it, and I have been assuming that during this discussion.
In this “nuclear explosion” of intelligence, I expect the equivalent of millions of years of human cognitive labor to elapse, if not many more.
I’m still very unclear on how you’re operationalizing an intelligence explosion. If an intelligence explosion happens only after a million iterations of AGI systems improving themselves, then this seems true to me, but also the humans will have AGI systems that are way smarter than them to assist them during this time.
I imagine you either having a different picture of takeoff, or thinking something like “Just don’t build a foomed AGI. Just like it’s way too hard to build AGIs that competently optimize for our values for 1,000,000,000 years, it’s way too hard to build a safe foomed AGI, so let’s just not do it”.
I think it’s the first. I’m much more sympathetic to the picture of “slow” takeoff in “Will AI See Sudden Progress?” and “Takeoff speeds”. I don’t imagine ever building a very capable AI that explicitly optimizes a utility function, since a multiagent system (i.e. humanity) is unlikely to have a utility function. However, I can imagine building a safe foomed AGI.
And my position is something like “It’s probably inevitable, and I think it will turn out well if we make a lot of intellectual progress (probably involving solutions to metaphilosophy and zero-shot reasoning, which I think are deeply related). In the meantime, let’s do what we can to ensure that nation-states and individual actors will understand this point well enough to coordinate around not doing it until the time is right.”
It would be quite surprising to me if the right thing to do to ensure that nation-states and individual actors understand this point would be to formalize zero-shot reasoning.
In addition, I could imagine building a safe foomed AGI that is corrigible and so does not require a solution to metaphilosophy. But I’m happy to consider the case where that is necessary (which seems decently likely to me); in those worlds I expect that we will be able to use the first AGI systems to help us figure out metaphilosophy.
I’m happy to delve into your individual points, but before I do so, I’d like to get your sense of what you think our remaining disagreements are, and where you think we might still be talking about different things.
What takeoff looks like, what the notion of “intelligence” is, what an “intelligence explosion” consists of, the usefulness of initial AI systems in aligning future, more powerful AI systems, what daemons are.
Also, on a more epistemic note, how much weight to put on long chains of reasoning that rely on soft, intuitive concepts, and how much to trust intuitions about tasks longer than ~100 years.