That being said, from an orthogonality perspective, I don’t have any intuition (let alone reasoning) that says that this compassionate breed of LDT is necessary for any particular level of universe-remaking power, including the level needed for a decisive strategic advantage over the rest of Earth’s biosphere. If being a compassionate-LDT agent confers advantages over standard-FDT agents from a Darwinian selection perspective, it would have to be via group selection, but our default trajectory is to end up with a singleton, in which case standard-FDT might be reflectively stable. Perhaps eventually some causal or acausal interaction with non-earth-originating superintelligence would prompt a shift, but, as Nate says,
But that’s not us trading with the AI; that’s us destroying all of the value in our universe-shard and getting ourselves killed in the process, and then banking on the competence and compassion of aliens.
So, if some kind of compassionate-LDT is a source of hope about not destroying all the value in our universe-shard and getting ourselves killed, then it must be hope about us figuring out such a theory and selecting for AGIs that implement it from the start, rather than hope that an AGI would convergently become that way on its own before taking over the world.
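To make the standard-vs-compassionate distinction concrete, here is a toy one-shot Prisoner’s Dilemma sketch. It is purely illustrative: the payoff numbers, the `compassion` weight, and the function names are my own assumptions, not anything from Nate’s post or a canonical formalization of FDT.

```python
# Toy one-shot Prisoner's Dilemma contrasting "standard-FDT" with a
# hypothetical "compassionate-LDT". Everything here (payoff numbers,
# the `compassion` weight, the function names) is an illustrative
# assumption, not a canonical formalization.

PAYOFFS = {  # (my_move, their_move) -> (my_payoff, their_payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def standard_fdt_move(correlated: bool) -> str:
    # FDT reasons over its decision *policy*: if the counterparty runs a
    # logically correlated policy, choosing C means both play C, which is
    # better for it than mutual defection. Against an uncorrelated agent
    # its choice exerts no such influence, so it simply defects.
    return "C" if correlated else "D"

def compassionate_ldt_move(correlated: bool, compassion: float = 0.5) -> str:
    # Same policy-level reasoning, but the utility being maximized also
    # puts weight `compassion` on the counterparty's payoff.
    def utility(me: str, them: str) -> float:
        mine, theirs = PAYOFFS[(me, them)]
        return mine + compassion * theirs

    if correlated:
        # A correlated counterparty mirrors whatever policy this agent picks.
        return max("CD", key=lambda m: utility(m, m))
    # An uncorrelated counterparty is assumed to defect (worst case).
    return max("CD", key=lambda m: utility(m, "D"))

for correlated in (True, False):
    print(
        f"correlated={correlated}: standard-FDT plays "
        f"{standard_fdt_move(correlated)}, compassionate-LDT plays "
        f"{compassionate_ldt_move(correlated)}"
    )
# correlated=True:  both play C -- cooperation falls out of correlation alone.
# correlated=False: standard-FDT defects; compassionate-LDT cooperates only
#                   because of the extra weight on the other agent's payoff.
```

The only point of the sketch is that standard FDT’s “cooperation” comes from logical correlation, so anything that looks like niceness toward uncorrelated, much weaker agents has to be put in as an extra term in the utility function from the start.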
if some kind of compassionate-LDT is a source of hope about not destroying all the value in our universe-shard and getting ourselves killed, then it must be hope about us figuring out such a theory and selecting for AGIs that implement it from the start, rather than hope that an AGI would convergently become that way on its own before taking over the world.
I weakly disagree here, mainly because Nate’s argument for very high levels of risk goes through strong generalization/a “sharp left turn” towards being much more coherent + goal-directed. So I find it hard to evaluate whether, if LDT does converge towards compassion, the sharp left turn would get far enough to reach it (although the fact that humans are fairly close to having universe-remaking power without having any form of compassionate LDT is of course a strong argument weighing the other way).
(Also FWIW I feel very skeptical of the “compassionate moral realism” book, based on your link.)
I’m confused by the claim that humans do not have compassionate LDT. It seems to me that a great many humans learn a significant approximation to compassionate LDT; however, it doesn’t seem to be built in by default, and it probably mostly comes from the training data.