I’ve started commenting on this discussion on a Google Doc. Here are some excerpts:
During this step, if humanity is to survive, somebody has to perform some feat that causes the world to not be destroyed in 3 months or 2 years when too many actors have access to AGI code that will destroy the world if its intelligence dial is turned up.
Contains implicit assumptions about takeoff that I don’t currently buy:
Well-modelled as a binary “has-AGI?” predicate;
(I am sympathetic to the microeconomics of an intelligence explosion working out in a way where the “well-modelled as a binary ‘has-AGI?’ predicate” assumption is true, but I feel uncertain about the prospect.)
Somehow rules out situations like: we have somewhat aligned AIs that push the world to make future unaligned AIs slightly less likely, which makes the AI population more aligned on average; this cycle compounds until we’re descending very fast into the basin of alignment and goodness (a toy version of this compounding is sketched below).
This isn’t my mainline or anything, but I note that it’s ruled out by Eliezer’s model as I understand it.
Some other internal objections are arising and I’m not going to focus on them now.
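To make “compounds” concrete, here is a toy recurrence of my own (the symbols $a_t$ and $k$ are mine, not anything from the discussion): let $a_t$ be the fraction of deployed AIs that are adequately aligned, and suppose aligned systems reduce the rate at which new unaligned systems get built, so that

$$a_{t+1} = a_t + k\,a_t(1 - a_t), \qquad 0 < k \le 1.$$

For small $a_t$ this is roughly exponential growth, $a_{t+1} \approx (1 + k)\,a_t$, so early gains compound, and the population then converges to the fully aligned fixed point $a = 1$. That trajectory is the “descending very fast into the basin” picture, and it’s the kind of gradual-but-compounding story that a binary “has-AGI?” takeoff model doesn’t represent.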
Every AI output effectuates outcomes in the world.
Right, but the likely domain of cognition matters. Pac-Man agents effectuate outcomes in the world, but their optimal policies are harmless. So the question seems to hinge on when the domain of cognition shifts to put us in the crosshairs of performant policies.
This doesn’t mean Eliezer is wrong here about the broader claim, but the distinction deserves mentioning for the people who weren’t tracking it. (I think EY is obviously aware of this.)
If you knew about the things that humans are using to reuse their reasoning about chipped handaxes and other humans, to prove math theorems, you would see it as more plausible that proving math theorems would generalize to chipping handaxes and manipulating humans.
Could we have observed it any other way? Since we surely weren’t selected for proving math theorems, we wouldn’t have a native cortex specializing in math. So, conditional on our being able to consider things like theorem-proving at all, that ability has to reuse other native capabilities.
More precisely, one possible mind design that solves theorems also reasons about humans. This is some update from whatever prior, towards EY’s claim. I’m considering whether we know enough about the common cause (evolution giving us a general-purpose reasoning algorithm) to screen off, or at least reduce, the Theorems → Human-modelling update.
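To spell out the screening-off question in symbols (my notation, not anything from the discussion): write $G$ for “evolution gave this mind a general-purpose reasoning algorithm”, $T$ for “this mind proves theorems”, and $H$ for “this mind models and can manipulate humans”. The update EY points at is $P(H \mid T) > P(H)$. The question is whether the common cause screens the evidence off, i.e. whether

$$P(H \mid T, G) \approx P(H \mid G).$$

If that approximation holds for the relevant class of minds, then observing that humans prove theorems adds little beyond what we already believed about general-purpose reasoners; if it fails, the Theorems → Human-modelling update stands.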
So here’s one important difference between humans and neural networks: humans face the genomic bottleneck, which means that each individual has to rederive all the knowledge about the world that their parents already had. If this genomic bottleneck hadn’t been so tight, then individual humans would have been significantly less capable of performing novel tasks.
Thanks, Richard—this is a cool argument that I hadn’t heard before.
You will systematically overestimate how much easier the science part is, or how far you can push it without getting the taking-over-the-world part, for as long as your model is ignorant of what they have in common.
OK, it’s a valid point and I’m updating a little, under the apparent model of “here’s a set of AI capabilities, linearly ordered in terms of deep-problem-solving, and if you push too far you get taking-over-the-world.” But I don’t see how we get to that model to begin with.