If some kind of compassionate LDT is a source of hope about not destroying all the value in our universe-share and not getting ourselves killed, then it must be hope that we figure out such a theory and select for AGIs that implement it from the start, rather than hope that an AGI would convergently become that way on its own before taking over the world.
I weakly disagree here, mainly because Nate’s argument for very high levels of risk goes through strong generalization/a “sharp left turn” towards being much more coherent + goal-directed. So I find it hard to evaluate whether, if LDT does converge towards compassion, the sharp left turn would get far enough to reach it (although the fact that humans are fairly close to having universe-remaking power without having any form of compassionate LDT is of course a strong argument weighing the other way).
(Also FWIW I feel very skeptical of the “compassionate moral realism” book, based on your link.)
I’m confused by the claim that humans do not have compassionate LDT. It seems to me that a great many humans learn a significant approximation to compassionate LDT. However, it doesn’t seem to be built in by default; it probably mostly comes from the training data.