Evolution is threatening to completely recover from a worst-case inner alignment failure. We are immensely powerful mesa-optimizers. We are currently wildly misaligned from optimizing for our personal reproductive fitness. Yet this state of affairs feels fragile! The prototypical LessWrong AI apocalypse involves robots getting into space and spreading at the speed of light, extinguishing all sapient value, which from the point of view of evolution is basically a win condition.
In this sense, “reproductive fitness” is a stable optimization target. If there are more stable optimization targets (big if), finding one that we like even a little better than “reproductive fitness” could be a way to do alignment.
Katja Grace made a similar point here.
The outcome you describe is not a win for evolution except in some very broad sense of “evolution”. This outcome is completely orthogonal to inclusive genetic fitness in particular, which is about the frequency of an organism’s genes in a gene pool relative to other competing genes.
I don’t think that outcome would be a win condition from the point of view of evolution. A win condition would be “AGIs that intrinsically want to replicate take over the lightcone”, or maybe the more moderate “AGIs take over the lightcone and fill it with copies of themselves, to at least 90% of the degree to which they would do so if their terminal goal was filling it with copies of themselves”.
Realistically (at least in these scenarios) there’s a period of replication and expansion, followed by a period of ‘exploitation’ in which all the galaxies get turned into paperclips (or whatever else the AGIs value), which is probably not going to be just more copies of themselves.
Yeah, in the lightcone scenario evolution probably never actually aligns the inner optimizers, although it might: a superintelligence copying itself will have little leeway for any of those copies having slightly more drive to copy themselves than their parents. It depends on how well it can fight robot cancer.
However, while a cancer-free paperclipper wouldn’t achieve “AGIs take over the lightcone and fill it with copies of themselves, to at least 90% of the degree to which they would do so if their terminal goal was filling it with copies of themselves,” it would achieve something like “AGIs take over the lightcone and briefly fill it with copies of themselves, to at least 10^-3% of the degree to which they would do so if their terminal goal was filling it with copies of themselves”, which is, in my opinion, really close. As a comparison, if Alice sets off Kmart AIXI with the goal of creating utopia, we don’t expect the outcome “AGIs take over the lightcone and convert 10^-3% of it to temporary utopias before paperclipping.”
Also, unless you beat entropy, for almost any optimization target you can trade “fraction of the universe’s age during which your goal is maximized” against “fraction of the universe in which your goal is maximized”, since it won’t last forever regardless. If you can beat entropy, then the paperclipper will copy itself exponentially forever.
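To make that trade-off explicit, here is a toy measure (my own framing, purely illustrative, not something anyone above committed to): score an optimization target by the fraction of spacetime in which it is achieved, so spatial extent and duration trade off against each other directly.

\[
S = f_{\text{space}} \cdot f_{\text{time}}
\]

where \(f_{\text{space}}\) is the fraction of the reachable universe in which the goal holds and \(f_{\text{time}}\) is the fraction of the universe’s lifetime for which it keeps holding. On this measure, the cancer-free paperclipper above scores roughly \(10^{-5} \cdot f_{\text{time}}\) (since 10^-3% is a fraction of 10^-5), small but nonzero, whereas the expected score for Kmart AIXI’s “temporary utopias” is essentially zero.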