Yeah, I think “no control over the future, 50% you die” is like 70% as alarming as “no control over the future, 90% you die.” Even if it were only 50% as concerning, all of these differences seem tiny in practice compared to other sources of variation in “do people really believe this could happen?” or other inputs into decision-making. I think it’s correct to summarize it as “practically as alarming.”
I’m not sure what you want engagement with. I don’t think the much worse outcomes are closely related to unaligned AI, so I don’t think they seem super relevant to my comment or Nate’s post. The same goes for lots of other reasons the future could be scary or disorienting. I do explicitly flag the loss of control over the future in that same sentence. I think the 50% chance of death is probably in the right ballpark from the perspective of selfish concern about misalignment.
Note that the 50% probability of death includes the possibility of AI having preferences about humans that are incompatible with our survival. I think the selection pressure for things like spite is radically weaker for the kinds of AI systems produced by ML than for humans (for simple reasons: where is the upside to the AI from spite during training? It seems like if you get things like threats, they will primarily be instrumental rather than a learned instinct), but I didn’t really want to get into that in the post.
I do explicitly flag the loss of control over the future in that same sentence.
In your initial comment you talked a lot about AI respecting the preferences of weak agents (using 1/trillion of its resources), which implies handing back control of a lot of resources to humans. From the selfish or scope-insensitive perspective of typical humans, that probably seems almost as good as not losing that control in the first place.
I don’t think the much worse outcomes are closely related to unaligned AI so I don’t think they seem super relevant to my comment or Nate’s post.
If people think that (conditional on unaligned AI) in 50% of worlds everyone dies, and the other 50% of worlds typically look like small utopias where existing humans get to live out long and happy lives (because of 1/trillion kindness), then they’re naturally going to think that aligned AI can only be better than that. So even if s-risks apply almost equally to both aligned and unaligned AI, I still want people to talk about them when talking about unaligned AIs, or take some other measure to ensure that people aren’t potentially misled like this.
(It could be that I’m just worrying too much here, and that empirically people who read your top-level comment won’t get the impression that close to 50% of worlds with unaligned AIs will look like small utopias. If that’s what you think, I guess we could try to find out, or just leave the discussion here.)
where is the upside to the AI from spite during training?
Maybe the AI develops it naturally from multi-agent training (intended to make the AI more competitive in the real world), or the AI developer tries to train some kind of morality (e.g., a sense of fairness or justice) into the AI.
I think “50% you die” is more motivating to people than “90% you die,” because in the former case people are likely able to increase the absolute chance of survival more; at 90%, extinction is overdetermined.