Value drift due to cultural evolution (Hanson’s cheerfully delivered horror stories) seems like a necessary issue to address, no less important than unintended misalignment through change of architecture. Humans are barely capable of noticing that this is a problem, so at some still-aligned level of advancement there needs to be sufficient comprehension and coordination to do something about this.
For example, I don’t expect good things by default from every human getting uploaded and then let loose on the Internet, even if synthetic AGIs are somehow rendered impossible. A natural equilibrium this settles into could be pretty bad, and humans won’t be able to design and enforce a better equilibrium before it’s too late.
humans won’t be able to design and enforce a better equilibrium before it’s too late
Won’t the uploaded humans be able to do this? If you think the current world isn’t already on an irreversible course towards unacceptable value drift (such that a singleton is needed to repair things), I don’t see how the mass upload scenario is much worse, since any agreement we could negotiate to mitigate drift, the uploads could make too. The uploads would now have the ability to copy themselves and run at different speeds, but that doesn’t obviously make coordination much harder.
The problem is that willingness to change gives power, leaving those most concerned with current value drift behind, hence “evolution”. With enough change, original stakeholders become disempowered, and additionally interventions that suffice to stop or reverse this process become more costly.
So interventions are more plausible to succeed if done in advance, before the current equilibrium is unmoored. Which requires apparently superhuman foresight.
Yes, although I don’t think a full solution is needed ahead of time. Just some mechanism to slow things down, buy us time to reflectively contemplate. I’m much less worried about gradual cultural changes over centuries than I am over a sudden change over a few years.
Human brains are very slow. Centuries of subjective time for people running at a 1000x speedup pass in months of wall-clock time. The rest of us could get the option of uploading only after enough research is done, with AGI society values possibly having drifted far. (Fast takeoff, that is, getting a superintelligence so quickly that there is no time for AGI value drift, could resolve the issue. But then its alignment might be more difficult.)
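To make the speedup arithmetic explicit (a rough sketch; the 1000x figure and the span of a few subjective centuries are illustrative assumptions, not predictions):

$$\text{wall-clock time} = \frac{\text{subjective time}}{\text{speedup}} = \frac{300\ \text{years}}{1000} = 0.3\ \text{years} \approx 3.6\ \text{months}$$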
I’m much less worried about gradual cultural changes over centuries than I am over a sudden change over a few years.
But why is that distinction important? The Future is astronomically longer. A bad change that slowly overtakes the world is still bad: dread instead of panic.
Yes, I quite agree that a slow inevitable change is just about as bad as a quick inevitable change. But a slow change which can be intervened against and halted is much less bad than a fast change which could theoretically be intervened against, but where you would likely miss the chance.
Like, if someone were to strap a bomb to me and say, “This will go off in X minutes” I’d much rather that the X be thousands of minutes rather than 5. Having thousands of minutes to defuse the bomb is a much better scenario for me.
Value drift is the kind of thing that naturally happens gradually and in an unclear way. It’s hard to intervene against without novel coordination tech/institutions, especially if it leaves people unworried, so that the tech/institutions remain undeveloped.
This seems very similar to not worrying about AGI because it’s believed to be far away, systematically not considering the consequences of whenever it does arrive, and therefore not working on solutions. And then suddenly starting to see the consequences only as it gets closer, when it’s too late to develop solutions or to put in place institutions that would stop its premature development. As if anything about the way it’s getting closer substantially informs the shape of the consequences, and they couldn’t have been imagined well in advance. Except that fire alarms for value drift might be even less well-defined than for AGI.