Evolution is threatening to completely recover from a worst-case inner alignment failure. We are immensely powerful mesa-optimizers. We are currently wildly misaligned from optimizing for our personal reproductive fitness. Yet this state of affairs feels fragile! The prototypical LessWrong AI apocalypse involves robots getting into space and spreading at the speed of light, extinguishing all sapient value, which from the point of view of evolution is basically a win condition.
In this sense, “reproductive fitness” is a stable optimization target. If there are more stable optimization targets (big if), finding one that we like even a little better than “reproductive fitness” could be a way to do alignment.
Katja Grace made a similar point here.
The outcome you describe is not a win for evolution except in some very broad sense of “evolution”. This outcome is completely orthogonal to inclusive genetic fitness in particular, which is about the frequency of an organism’s genes in a gene pool relative to other competing genes.
I don’t think that outcome would be a win condition from the point of view of evolution. A win condition would be “AGIs that intrinsically want to replicate take over the lightcone”, or maybe the more moderate “AGIs take over the lightcone and fill it with copies of themselves, to at least 90% of the degree to which they would do so if their terminal goal was filling it with copies of themselves”.
Realistically (at least in these scenarios) there’s a period of replication and expansion, followed by a period of ‘exploitation’ in which all the galaxies get turned into paperclips (or whatever else the AGIs value), which is probably not going to be just more copies of themselves.
Yeah, in the lightcone scenario evolution probably never actually aligns the inner optimizers, although it might: a superintelligence copying itself will have little leeway for any of those copies having slightly more drive to copy themselves than their parents. It depends on how well it can fight robot cancer.
However, while a cancer-free paperclipper wouldn’t achieve “AGIs take over the lightcone and fill it with copies of themselves, to at least 90% of the degree to which they would do so if their terminal goal was filling it with copies of themselves,” it would achieve something like “AGIs take over the lightcone and briefly fill it with copies of themselves, to at least 10^-3% of the degree to which they would do so if their terminal goal was filling it with copies of themselves”, which is, in my opinion, really close. As a comparison, if Alice sets off Kmart AIXI with the goal of creating utopia, we don’t expect the outcome “AGIs take over the lightcone and convert 10^-3% of it to temporary utopias before paperclipping.”
Also, unless you beat entropy, for almost any optimization target you can trade “fraction of the universe’s age during which your goal is maximized” against “fraction of the universe in which your goal is maximized”, since it won’t last forever regardless. If you can beat entropy, then the paperclipper will copy itself exponentially forever.
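To make that trade-off explicit, here is a toy measure (my own framing, purely illustrative, not something anyone above committed to): score an optimization target by the fraction of spacetime in which it is achieved, so spatial extent and duration trade off against each other directly.

\[
S = f_{\text{space}} \cdot f_{\text{time}}
\]

where \(f_{\text{space}}\) is the fraction of the reachable universe in which the goal holds and \(f_{\text{time}}\) is the fraction of the universe’s lifetime for which it keeps holding. On this measure, the cancer-free paperclipper above scores roughly \(10^{-5} \cdot f_{\text{time}}\) (since 10^-3% is a fraction of 10^-5), small but nonzero, whereas the expected score for Kmart AIXI’s “temporary utopias” is essentially zero.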