Evolution has succeeded at aligning Homo sapiens brains to date
I’m guessing we agree on the following:
Evolution shaped humans to have various context-dependent drives (call them Shards) and the ability to mentally represent and pursue complex goals. Those Shards were good proxies for IGF in the EEA[1].
Those Shards were also good[2] enough to produce billions of humans in the modern environment. However, it is also the case that most modern humans spend at least part of their optimization power on things orthogonal to IGF.
I think our disagreement here maybe boils down to approximately the following question:
With what probability are we in each of the following worlds?
(World A) The Shards only work[2:1] conditional on the environment being sufficiently similar to the EEA, and humans not having too much optimization power. If the environment changes too far OOD, or if humans were to gain a lot of power[3], then the Shards would cease to be good[2:2] proxies.
In this world, we should expect the future to contain only a small fraction[4] of the “value” it would have, if humanity were fully “aligned”[2:3]. I.e. Evolution failed to “(robustly) align humanity”.
(World B) The Shards (in combination with other structures in human DNA/brains) are in fact sufficiently robust that they will keep humanity aligned[2:4] even in the face of distributional shift and humans gaining vast optimization power.
In this world, we should expect the future to contain a large fraction of the “value” it would have, if humanity were fully “aligned”[2:5]. I.e. Evolution succeeded in “(robustly) aligning humanity”.
(World C) Something else?
I think we’re probably in (A), and IIUC, you think we’re most likely in (B).
Do you consider this an adequate characterization?
If yes, the obvious next question would be:
What tests could we run, what observations could we make,[5] that would help us discern whether we’re in (A) or (B) (or (C))?
(For example: I think the kinds of observations I listed in my previous comment are moderate-to-strong evidence for (A); and the existence of some explicit-IGF-maximizing humans is weak evidence for (B).)
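(For concreteness, here is a minimal sketch, in Python, of how such observations could shift a probability estimate over Worlds A, B, and C. The priors and likelihoods below are purely hypothetical placeholders, not numbers taken from this discussion.)

# A toy Bayesian update over the three candidate worlds. All numbers here
# (priors and likelihoods) are made up for illustration only.

priors = {"A": 0.45, "B": 0.45, "C": 0.10}

# Hypothetical likelihoods P(observation | world): how expected each
# observation would be if that world were the true one.
likelihoods = {
    # e.g. "most humans spend much of their optimization power on non-IGF goals"
    "orthogonal_spending": {"A": 0.9, "B": 0.4, "C": 0.5},
    # e.g. "some humans explicitly try to maximize their IGF"
    "explicit_igf_maximizers": {"A": 0.5, "B": 0.6, "C": 0.5},
}

def update(prior, likelihood):
    """Bayes' rule: posterior(world) is proportional to prior(world) * P(observation | world)."""
    unnormalized = {w: prior[w] * likelihood[w] for w in prior}
    total = sum(unnormalized.values())
    return {w: p / total for w, p in unnormalized.items()}

posterior = priors
for name, likelihood in likelihoods.items():
    posterior = update(posterior, likelihood)
    print(name, {w: round(p, 3) for w, p in posterior.items()})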
I agree with your summary of what we agree on—that evolution succeeded at aligning brains to IGF so far. That was the key point of the OP.
Before getting into World A vs World B, I need to clarify again that my standard for “success at alignment” is a much weaker criterion than you may be assuming. You seem to consider success to require getting near the maximum possible utility (i.e. a large fraction of it), which I believe is uselessly unrealistic. By success I simply mean not a failure, i.e. not the doom scenario of extinction or near-zero utility.
So World A is still a partial success if there is some reasonable population of humans (say even just on the order of millions) in bio bodies or in detailed sims.
(World A) The Shards only work[2:1] conditional on the environment being sufficiently similar to the EEA, and humans not having too much optimization power
I don’t agree with this characterization—the EEA ended ~10k years ago and human fitness has exploded since then rather than collapsed to zero. It is a simple fact that according to any useful genetic fitness metric, human fitness has exploded with our exploding optimization power so far.
I believe this is the dominant evidence, and it indicates:
If tech evolution is similar enough to bio evolution then we should roughly expect tech evolution to have a similar level of success
Likewise, doom is unlikely unless the tech evolution process producing AGI has substantially different dynamics from the gene evolution process which produced brains
See this comment for more on the tech/gene evolution analogy and potential differences.
I don’t think your evidence from “opinions of people you know” is convincing, for the same reason I don’t think the opinions of humans circa 1900 were very useful evidence for predicting the future of 2023.
AFAIK, most of the many humans racing to build ASI are not doing so with the goal of increasing their IGF.
I don’t think “humans explicitly optimizing for the goal of IGF” is even the correct frame to think of how human value learning works (see shard theory).
As a concrete example, Elon Musk seems to be on track for high long term IGF, without consciously optimizing for IGF.
(Ah. Seems we were using the terms “(alignment) success/failure” differently. Thanks for noting it.)
In-retrospect-obvious key question I should’ve already asked:
Conditional on (some representative group of) humans succeeding at aligning ASI, what fraction of the maximum possible value-from-Evolution’s-perspective do you expect the future to attain?[1]
My modal guess is that the future would attain ~1% of maximum possible “Evolution-value”.[2]
If tech evolution is similar enough to bio evolution then we should roughly expect tech evolution to have a similar level of success
Seems like a reasonable (albeit very preliminary/weak) outside view, sure. So, under that heuristic, I’d guess that the future will attain ~1% of max possible “human-value”.
In general I think maximum values are weird because they are potentially nearly unbounded, but it sounds like we may then be in agreement apart from terminology.
But in general I do not think of anything “less than 1% of the maximum value” as a failure in most endeavors. For example, the maximum attainable wealth is perhaps $100T or something, but I don’t think it’d be normal or useful to describe the world’s wealthiest people as failures at being wealthy because they only have ~$100B or whatever (i.e. only ~0.1% of that maximum).
And regardless, the standard doom arguments from EY/MIRI etc. are very much “AI will kill us all!”, and not “AI will prevent us from attaining over 1% of maximum future utility!”
Footnotes to the first comment:
[1] Environment of evolutionary adaptedness. For humans: hunter-gatherer tribes on the savanna, or maybe primitive subsistence agriculture societies.
[2] In the sense of optimizing for IGF, or whatever we’re imagining Evolution to “care” about.
[3] E.g. the ability to upload their minds, construct virtual worlds, etc.
[4] Possibly (but not necessarily) still a large quantity in absolute terms.
[5] Without waiting a possibly-long time to watch how things in fact play out.
Footnotes to the final comment:
[1] Setting completely aside whether to consider the present “success” or “failure” from Evolution’s perspective.
[2] I’d call that failure on Evolution’s part, but IIUC you’d call it partial success? (Since the absolute value would still be high?)