I agree with your summary of what we agree on—that evolution succeeded at aligning brains to IGF so far. That was the key point of the OP.
Before getting into World A vs. World B, I need to clarify again that my standard for “success at alignment” is a much weaker criterion than you may be assuming. You seem to consider success to require getting near the maximum possible utility (i.e. a large fraction of it), which I believe is uselessly unrealistic. By success I simply mean not a failure, i.e. not the doom scenario of extinction or near-zero utility.
So World A is still a partial success if there is some reasonable population of humans (say even just on the order of millions) in bio bodies or in detailed sims.
> (World A) The Shards only work[2:1] conditional on the environment being sufficiently similar to the EEA, and humans not having too much optimization power
I don’t agree with this characterization—the EEA ended ~10k years ago and human fitness has exploded since then rather than collapsing to zero. It is a simple fact that, by any useful metric of genetic fitness, human fitness has exploded along with our growing optimization power so far.
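As a crude back-of-the-envelope sketch of what “exploded” means here, take total population as a rough proxy for aggregate genetic fitness; the specific figures below are loose ballpark assumptions, not numbers from this thread:

```python
# Crude proxy: total human population as a stand-in for aggregate genetic fitness.
# Both figures are loose ballpark assumptions, not data from this thread.
pop_end_of_eea = 5e6   # ~5 million humans around the end of the EEA (~10k years ago)
pop_today = 8e9        # ~8 billion humans today

growth_factor = pop_today / pop_end_of_eea
print(f"Population (fitness proxy) grew ~{growth_factor:,.0f}x since the EEA ended")
# Prints: Population (fitness proxy) grew ~1,600x since the EEA ended
```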
I believe this fitness explosion is the dominant evidence, and it indicates:
- If tech evolution is similar enough to bio evolution then we should roughly expect tech evolution to have a similar level of success
- Likewise doom is unlikely unless the tech evolution process producing AGI has substantially different dynamics from the gene evolution process which produced brains
See this comment for more on the tech/gene evolution analogy and potential differences.
I don’t think your evidence from “opinions of people you know” is convincing, for the same reason that I don’t think the opinions of humans circa 1900 were of much use for predicting the future of 2023.
> AFAIK, most of the many humans racing to build ASI are not doing so with the goal of increasing their IGF.
I don’t think “humans explicitly optimizing for the goal of IGF” is even the correct frame for thinking about how human value learning works (see shard theory).
As a concrete example, Elon Musk seems to be on track for high long-term IGF without consciously optimizing for IGF.
(Ah. Seems we were using the terms “(alignment) success/failure” differently. Thanks for noting it.)
In-retrospect-obvious key question I should’ve already asked:
Conditional on (some representative group of) humans succeeding at aligning ASI, what fraction of the maximum possible value-from-Evolution’s-perspective do you expect the future to attain? [1]
My modal guess is that the future would attain ~1% of maximum possible “Evolution-value”.[2]
> If tech evolution is similar enough to bio evolution then we should roughly expect tech evolution to have a similar level of success
Seems like a reasonable (albeit very preliminary/weak) outside view, sure. So, under that heuristic, I’d guess that the future will attain ~1% of max possible “human-value”.
[1] Setting completely aside whether to consider the present “success” or “failure” from Evolution’s perspective.

[2] I’d call that failure on Evolution’s part, but IIUC you’d call it partial success? (Since the absolute value would still be high?)
In general I think maximum values are weird because they are potentially nearly unbounded, but it sounds like we may then be in agreement, terminology aside.
But I do not think of “less than 1% of the maximum value” as failure in most endeavors. For example, the maximum attainable wealth is perhaps $100T or so, but I don’t think it’d be normal or useful to describe the world’s wealthiest people as failures at being wealthy because they only have ~$100B or whatever.
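To make the fraction-of-maximum arithmetic in that analogy explicit (the $100T and $100B figures are the same rough stand-ins as above, not precise estimates):

```python
# Fraction-of-maximum arithmetic for the wealth analogy; figures are rough stand-ins.
max_attainable_wealth = 100e12    # ~$100T, a loose ceiling on attainable wealth
wealthiest_individual = 100e9     # ~$100B, roughly today's wealthiest individuals

fraction_of_max = wealthiest_individual / max_attainable_wealth
print(f"The wealthiest hold ~{fraction_of_max:.1%} of the 'maximum attainable' wealth")
# Prints: The wealthiest hold ~0.1% of the 'maximum attainable' wealth
# That is well under 1% of the maximum, yet we don't call them failures at being wealthy.
```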
And regardless, the standard doom arguments from EY/MIRI etc. are very much “AI will kill us all!”, and not “AI will prevent us from attaining over 1% of maximum future utility!”