Setting that aside, it reads to me like the frame-clash happening here is (loosely) between “50% extinction, 50% not-extinction” and “50% extinction, 50% utopia”
Yeah, I think this is a factor. Paul talked a lot about “1/trillion kindness” as the reason for non-extinction, but 1/trillion kindness seems to directly imply a small utopia where existing humans get to live out long and happy lives (even better/longer lives than without AI). So it seemed to me like he was (maybe unintentionally) giving the reader a “50% extinction, 50% small utopia” frame, while still writing other things under the “50% extinction, 50% not-extinction” frame himself.
1/trillion kindness seems to directly imply a small utopia where existing humans get to live out long and happy lives
Not a direct implication, because the AI might have other human-concerning preferences that are larger than 1/trillion. Cf. the top-level comment: “I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival, I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer to just use the humans for atoms.”
I’d guess “most humans survive” vs. “most humans die” probabilities don’t correspond super closely to “presence of small pseudo-kindness”, because other preferences could outweigh it, and because cooperation/bargaining is a big reason why humans might survive aside from intrinsic preferences.
Yeah, I think that:
“AI doesn’t care about humans at all so kills them incidentally” is not most of the reason that AIs may kill humans, and my bottom-line 50% probability of AI killing us also includes the other paths (AI caring a bit but failing to coordinate to avoid killing humans, conflict during takeover leading to killing lots of humans, AI having scope-sensitive preferences for which not killing humans is a meaningful cost, preserving humans being surprisingly costly, AI having preferences about humans, like spite, for which human survival is a cost...).
To the extent that it’s possible to distinguish “intrinsic pseudokindness” from decision-theoretic considerations leading to pseudokindness, I think that decision-theoretic considerations are more important. (I don’t have a strong view on the relative importance of ECL (evidential cooperation in large worlds) and acausal trade, and I think these are hard to disentangle from fuzzier psychological considerations and it all tends to interact.)
AI having scope-sensitive preferences for which not killing humans is a meaningful cost
Could you say more about what you mean? If the AI has no discount rate, leaving Earth to the humans may require only something within a few orders of magnitude of 1/trillion kindness. However, if the AI does have a significant discount rate, then delays could be costly to it. Still, the AI could make much more progress in building a Dyson swarm from the moon/Mercury/asteroids, with their lower gravity and no atmosphere allowing it to launch material very quickly. My very rough estimate indicates sparing Earth might only delay the AI a month from taking over the universe. Even that could require a lot of kindness if it has a very high discount rate. So maybe training should emphasize the superiority of low discount rates?
Sorry, I meant “scope-insensitive,” and really I just meant an even broader category of like “doesn’t care 10x as much about getting 10x as much stuff.” I think discount rates or any other terminal desire to move fast would count (though for options like “survive in an unpleasant environment for a while” or “freeze and revive later” the required levels of kindness may still be small).
(A month seems roughly right to me as the cost of not trashing Earth’s environment to the point of uninhabitability.)
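To make the discount-rate point above concrete, here is a rough sketch (my own illustration, not from the thread) of the arithmetic being gestured at: under exponential discounting at annual rate r, delaying all resource acquisition by Δt years forfeits a fraction 1 − e^(−rΔt) of total discounted value, and the AI’s “kindness” weight on humans would need to exceed that fraction for a one-month delay to be worth it.

```python
import math

def required_kindness(annual_discount_rate: float, delay_years: float = 1 / 12) -> float:
    """Fraction of total discounted value lost by delaying all resource
    acquisition by `delay_years`, assuming simple exponential discounting.
    The AI's weight on human survival must exceed this fraction for
    sparing Earth (at the cost of the delay) to be worth it."""
    return 1.0 - math.exp(-annual_discount_rate * delay_years)

# How much kindness does a one-month delay demand at various discount rates?
for r in [0.0, 1e-9, 0.01, 1.0]:
    print(f"annual discount rate {r:g}: required kindness ~ {required_kindness(r):.3e}")
```

With no discounting the required kindness is zero; at a 1% annual rate it is still only ~1e-3, far above 1/trillion but arguably small; only very aggressive discount rates make a month genuinely expensive, which is the intuition behind preferring low discount rates.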