As one relevant consideration, I think “will AI kill all humans” is a question whose answer relies in substantial part on TDT-ish considerations, and is something that I think a bunch of value systems reasonably care a lot about. Also, I think what superintelligent systems will do depends a lot on decision-theoretic considerations that seem very hard to answer from a CDT- vs. EDT-ish frame.
I think I speak for many when I ask you to please elaborate on this!
Oh, I thought this was relatively straightforward and had been discussed a bunch. There are two lines of argument I know of for why a superintelligent AI, even if unaligned, might not literally kill everyone, but keep some humans alive:
The AI might care a tiny bit about our values, even if it mostly doesn’t share them
The AI might want to coordinate with other AI systems that reached superintelligence (including ones in worlds where alignment succeeded) to jointly optimize the universe. So in a world where there is only a 1% chance that we align AI systems to our values, even in the unaligned worlds we might end up with AI systems that adopt our values as a 1% mixture in their utility function (and, correspondingly, in the 1% of worlds where alignment succeeds, we might still want to trade away 99% of the universe to the values that the counterfactual unaligned AI systems would have had). A toy expected-value sketch of this trade follows below.
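To make the arithmetic in that second argument concrete, here is a minimal sketch of my own (not from the original comment) under toy assumptions: a single scalar “share of the universe optimized for human values”, a 1% chance that alignment succeeds, and a trade in which each side receives a slice proportional to its probability of ending up in control. The point it illustrates is that the trade leaves the expected share unchanged, but converts “everything in 1% of worlds” into “a 1% slice in roughly all worlds”:

```python
# Toy expected-value sketch of the "1% mixture" trade described above.
# All numbers here are illustrative assumptions, not anything from the
# original comment beyond the 1% / 99% figures it mentions.

p_aligned = 0.01  # assumed probability that we succeed at aligning AI

# No cross-world trade: humans get the whole universe in aligned worlds
# and nothing in unaligned worlds.
no_trade_expected_share = p_aligned * 1.0 + (1 - p_aligned) * 0.0
no_trade_survival_chance = p_aligned  # humans only get anything if alignment works

# With the trade: in every world the winning AI optimizes a mixture
# weighted by each side's probability of winning, so human values get a
# 1% slice of the universe whether or not alignment succeeded.
trade_expected_share = p_aligned * p_aligned + (1 - p_aligned) * p_aligned
trade_survival_chance = 1.0  # a nonzero slice in (roughly) all worlds

print(f"expected share, no trade:   {no_trade_expected_share:.2%}")
print(f"expected share, with trade: {trade_expected_share:.2%}")
print(f"chance of any slice, no trade:   {no_trade_survival_chance:.0%}")
print(f"chance of any slice, with trade: {trade_survival_chance:.0%}")
```

Both expected shares come out to 1%; what changes is how that expectation is distributed across worlds, which is what matters if you care a lot about humans not being killed in the modal outcome.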
Some places where the second line of argument has been discussed:
This comment by Ryan Greenblatt: https://www.lesswrong.com/posts/tKk37BFkMzchtZThx/miri-2024-communications-strategy?commentId=xBYimQtgASti5tgWv
This comment by Paul Christiano: https://www.lesswrong.com/posts/2NncxDQ3KBDCxiJiP/cosmopolitan-values-don-t-come-free?commentId=ofPTrG6wsq7CxuTXk
See also: https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice

Note that in this comment I’m not touching on acausal trade (with successful humans) or ECL. I think those are very relevant to whether AI systems kill everyone, but they are less related to the implicit claim about kindness that comes across in your parables (since acausally trading AIs are basically analogous to the ants who don’t kill us because we have power).