So it seems premature to say that Evolution succeeded at aligning humanity.
Evolution has succeeded at aligning Homo sapiens brains to date[1] - that is the historical evidence we have.
I don’t think most transhumanists explicitly want to end biological life, and most would find that abhorrent. Transcending to a postbiological state probably doesn’t end biology any more/less than biology ended geology.
There are many other scenarios where DNA flourishes even after a posthuman transition.
Interesting. Could you list a few of those scenarios?
The future is complex and unknown. Is the ‘DNA’ we are discussing the information content or the physical medium? Seems rather obvious it’s the information that matters, not the medium. Transcendence to a posthuman state probably involves vast computation, some of which is applied to ancestral simulations (which we may already be in), which enormously preserves and multiplies the info content of the DNA.
Not perfectly, of course, but evolution doesn’t do anything perfectly. It aligned brains well enough that the extent of any misalignment was insignificant compared to the enormous utility our brains provided.
Evolution has succeeded at aligning Homo sapiens brains to date
I’m guessing we agree on the following:
Evolution shaped humans to have various context-dependent drives (call them Shards) and the ability to mentally represent and pursue complex goals. Those Shards were good proxies for IGF (inclusive genetic fitness) in the EEA[1].
Those Shards were also good[2] enough to produce billions of humans in the modern environment. However, it is also the case that most modern humans spend at least part of their optimization power on things orthogonal to IGF.
I think our disagreement here maybe boils down to approximately the following question:
With what probability are we in each of the following worlds?
(World A) The Shards only work[2:1] conditional on the environment being sufficiently similar to the EEA, and humans not having too much optimization power. If the environment changes too far OOD, or if humans were to gain a lot of power[3], then the Shards would cease to be good[2:2] proxies.
In this world, we should expect the future to contain only a small fraction[4] of the “value” it would have, if humanity were fully “aligned”[2:3]. I.e. Evolution failed to “(robustly) align humanity”.
(World B) The Shards (in combination with other structures in human DNA/brains) are in fact sufficiently robust that they will keep humanity aligned[2:4] even in the face of distributional shift and humans gaining vast optimization power.
In this world, we should expect the future to contain a large fraction of the “value” it would have, if humanity were fully “aligned”[2:5]. I.e. Evolution succeeded in “(robustly) aligning humanity”.
(World C) Something else?
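(One way to make the stakes concrete, purely as an illustrative sketch of my own framing rather than anything we’ve pinned down: writing $p_A, p_B, p_C$ for the probabilities of the three worlds and $f_A, f_B, f_C$ for the expected fraction of maximum “value” attained in each, the overall expectation is

$$\mathbb{E}[f] = p_A f_A + p_B f_B + p_C f_C,$$

so the disagreement is mostly about whether $p_A$ or $p_B$ dominates, and about how far $f_A$ falls below 1.)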
I think we’re probably in (A), and IIUC, you think we’re most likely in (B).
Do you consider this an adequate characterization?
If yes, the obvious next question would be:
What tests could we run, what observations could we make,[5] that would help us discern whether we’re in (A) or (B) (or (C))?
(For example: I think the kinds of observations I listed in my previous comment are moderate-to-strong evidence for (A); and the existence of some explicit-IGF-maximizing humans is weak evidence for (B).)
[1] Environment of evolutionary adaptedness. For humans: hunter-gatherer tribes on the savanna, or maybe primitive subsistence agriculture societies.
[2] In the sense of optimizing for IGF, or whatever we’re imagining Evolution to “care” about.
[3] E.g. ability to upload their minds, construct virtual worlds, etc.
[4] Possibly (but not necessarily) still a large quantity in absolute terms.
[5] Without waiting a possibly-long time to watch how things in fact play out.
I agree with your summary of what we agree on—that evolution succeeded at aligning brains to IGF so far. That was the key point of the OP.
Before getting into World A vs World B, I need to clarify again that my standard for “success at alignment” is a much weaker criterion than you may be assuming. You seem to consider success to require getting near the maximum possible utility (i.e. a large fraction of it), which I believe is uselessly unrealistic. By success I simply mean not a failure, as in not the doom scenario of extinction or near-zero utility.
So World A is still a partial success if there is some reasonable population of humans (say even just on the order of millions) in bio bodies or in detailed sims.
(World A) The Shards only work[2:1] conditional on the environment being sufficiently similar to the EEA, and humans not having too much optimization power
I don’t agree with this characterization: the EEA ended ~10k years ago, and human fitness has exploded since then rather than collapsing to zero. It is a simple fact that, according to any useful genetic fitness metric, human fitness has exploded with our exploding optimization power so far.
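As a crude back-of-envelope illustration of the scale involved (using population as one rough proxy for realized fitness; the numbers below are rough, commonly cited estimates assumed for illustration, not figures from this discussion):

```python
# Back-of-envelope: world population as a crude proxy for aggregate human
# genetic fitness. Both figures are rough, commonly cited estimates.
population_end_of_eea = 5e6   # ~5 million humans around the dawn of agriculture (~10k years ago)
population_today = 8e9        # ~8 billion humans in the early 2020s

growth_factor = population_today / population_end_of_eea
print(f"Population growth since the EEA: ~{growth_factor:,.0f}x")
# -> ~1,600x, i.e. an explosion rather than a collapse toward zero
```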
I believe this is the dominant evidence, and it indicates:
If tech evolution is similar enough to bio evolution then we should roughly expect tech evolution to have a similar level of success
Likewise, doom is unlikely unless the tech evolution process producing AGI has substantially different dynamics from the gene evolution process which produced brains
See this comment for more on the tech/gene evolution analogy and potential differences.
I don’t think your evidence from “opinions of people you know” is convincing, for the same reasons I don’t think the opinions of humans circa 1900 were much use as evidence for predicting the future of 2023.
AFAIK, most of the many humans racing to build ASI are not doing so with the goal of increasing their IGF.
I don’t think “humans explicitly optimizing for the goal of IGF” is even the correct frame for thinking about how human value learning works (see shard theory).
As a concrete example, Elon Musk seems to be on track for high long-term IGF without consciously optimizing for IGF.
(Ah. Seems we were using the terms “(alignment) success/failure” differently. Thanks for noting it.)
In-retrospect-obvious key question I should’ve already asked:
Conditional on (some representative group of) humans succeeding at aligning ASI, what fraction of the maximum possible value-from-Evolution’s-perspective do you expect the future to attain?[1]
My modal guess is that the future would attain ~1% of maximum possible “Evolution-value”.[2]
If tech evolution is similar enough to bio evolution then we should roughly expect tech evolution to have a similar level of success
Seems like a reasonable (albeit very preliminary/weak) outside view, sure. So, under that heuristic, I’d guess that the future will attain ~1% of max possible “human-value”.
[1] Setting completely aside whether to consider the present “success” or “failure” from Evolution’s perspective.
[2] I’d call that failure on Evolution’s part, but IIUC you’d call it partial success? (Since the absolute value would still be high?)
In general I think maximum values are weird because they are potentially nearly unbounded, but it sounds like we may then be in agreement, apart from terminology.
But in general I do not think of anything “less than 1% of the maximum value” as failure in most endeavors. For example, the maximum attainable wealth is perhaps $100T or something, but I don’t think it’d be normal/useful to describe the world’s wealthiest people as failures at being wealthy because they only have ~$100B or whatever (roughly 0.1% of that maximum).
And regardless, the standard doom arguments from EY/MIRI etc. are very much “AI will kill us all!”, not “AI will prevent us from attaining over 1% of maximum future utility!”
vast computation some of which is applied to ancestral simulations
I agree that a successful post-human world would probably involve a large amount[1] of resources spent on simulating (or physically instantiating) things like humans engaging in play, sex, adventure, violence, etc. IOW, engaging in the things for which Evolution installed Shards in us. However, I think that is not the same as [whatever Evolution would care about, if Evolution could care about anything]. For the post-human future to be a success from Evolution’s perspective, I think it would have to be full of something more like [programs (sentient or not, DNA or digital) striving to make as many copies of themselves as possible].
(If we make the notion of “DNA” too broad/vague, then we could interpret almost any future outcome as “success for Evolution”.)
[1] A large absolute amount, but maybe not a large relative amount.
For the post-human future to be a success from Evolution’s perspective, I think it would have to be full of something more like [programs (sentient or not, DNA or digital) striving to make as many copies of themselves as possible].
Any ancestral simulation will naturally be full of that, so it boils down to the simulation argument.
The natural consequence of “postbiological humans” is effective disempowerment, if not extinction, of humanity as a whole.
Such “transhumanists” clearly do not find the eradication of biology abhorrent, any more than any normal person would find the idea of “substrate independence” (death of all love and life) to be abhorrent.