What are the standard doomy “lol no” responses to “Any AGI will have a smart enough decision theory to not destroy the intelligence that created it (i.e. us), because we’re only willing to build AGI that won’t kill us”?
(I suppose it isn’t necessary to give a strong reason why acausality will show up in AGI decision theory, but one good one is that it has to be smart enough to cooperate with itself.)
Some responses that I can think of (but I can also counter, with varying success):
A. Humanity is racing to build AGI anyway; this “decision” is not really under our control enough to exert substantial acausal influence.
B. It might not destroy us, but it will likely permanently lock away our astronomical endowment, which is basically just as bad, so the argument is mostly irrelevant.
C. We don’t particularly care to preserve what our genes may “wish” to preserve, not even the acausal-pilled among us.
D. Runaway, reckless consequentialism is likely to emerge long before a sophisticated decision theory that incorporates human values/agency, so if there is such a trade to be had, it will likely already be too late.
E. There is nothing natural about the categories carved up for this “trade”, so we wouldn’t expect it to take place. If we can’t even tell it what a diamond is, we certainly wouldn’t share enough context for this particular acausal trade to snap into place.
F. The correct decision theory will actually turn out to one-box only in standard Newcomb’s and not in Transparent Newcomb’s, and this is Transparent Newcomb’s (a payoff sketch follows below).
G. There will be no “agent” or “decision theory” to speak of; we just go out with a whimper via increasingly lowered fidelity to values in the machines we end up designing.
This is from ten minutes of brainstorming; I’m sure it misses some important ones. Obviously, if no good ones exist (ones without counters), that gives us reason to believe in alignment by default!
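For concreteness on F, here’s a minimal sketch of the standard Newcomb’s payoff calculation showing why one-boxing wins in expectation when the predictor is reliable. The payoff values and the 0.99 predictor accuracy are illustrative assumptions, not from any source. In Transparent Newcomb’s the big box’s contents are visible before you choose, which is exactly where decision theories that one-box in the opaque version can come apart.

```python
# Illustrative sketch of standard Newcomb's problem payoffs.
# All numbers are assumptions chosen for illustration.

PREDICTOR_ACCURACY = 0.99  # assumed probability the predictor guesses your choice
BIG_BOX = 1_000_000        # opaque box: filled only if one-boxing was predicted
SMALL_BOX = 1_000          # transparent box: always contains this amount

def expected_payoff(one_box: bool) -> float:
    """Expected payoff given the predictor's assumed accuracy."""
    if one_box:
        # Predictor likely foresaw one-boxing and filled the big box.
        return PREDICTOR_ACCURACY * BIG_BOX
    # Two-boxing: the big box is likely empty, but the small box is guaranteed.
    return (1 - PREDICTOR_ACCURACY) * BIG_BOX + SMALL_BOX

if __name__ == "__main__":
    print("one-box :", expected_payoff(True))   # ~990,000
    print("two-box :", expected_payoff(False))  # ~11,000
```

The point of F is that this expected-value argument is computed before you see the boxes; once the big box’s contents are visible (the transparent variant), it takes a decision theory that reasons about its own policy, not just the visible state, to keep one-boxing.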
Keen to hear your responses.
Here’s a more detailed writeup about this: https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice