I don’t think that’s necessarily true. There are two ways in which I think it can compound:
If the AGI self-upgrades, or designs a more advanced AGI, the problem repeats. The AGI can make mistakes, same as us, though probably less obvious ones.
It is possible to imagine an AGI that stays generally aligned but has some probability of being triggered into a runaway loop in which it loses its alignment. It would come up with fairly aligned solutions most of the time, but some problem or situation might be so out-of-domain that it sends the AGI down the path of insanity, unrecoverably, and we don’t know how or when that might occur.
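To make the compounding concrete, here is a minimal sketch under loudly stated assumptions that are not from the argument above: each hand-off (a self-upgrade, a newly designed successor, or a novel out-of-domain situation) independently preserves alignment with the same probability `q`. Under those assumptions the chance that alignment survives `n` hand-offs is `q ** n`. The function name `p_aligned_after` is hypothetical.

```python
# Hypothetical illustration, assuming every hand-off (self-upgrade, successor
# design, or out-of-domain trigger) is independent and preserves alignment
# with the same probability q. Neither assumption comes from the argument.
def p_aligned_after(generations: int, q: float) -> float:
    """Probability that alignment survives `generations` independent hand-offs."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return q ** generations


if __name__ == "__main__":
    # Even a 99% per-step success rate erodes quickly over many steps.
    for n in (1, 10, 100):
        print(n, round(p_aligned_after(n, 0.99), 3))
```

With `q = 0.99`, survival drops to roughly 0.90 after 10 hand-offs and roughly 0.37 after 100, which is the sense in which a small per-step risk compounds.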
Also, it might simply be probabilistic: any AGI that isn’t fully deterministic probably wouldn’t be literally incapable of choosing non-aligned strategies, but would merely assign them very small logits. So in theory there is still a finite, non-zero probability that it goes down some kind of “kill all humans” strategy path. And even if you interpret alignment as one-shot (did you align it right or not on creation?), the effects might not be visible right away.
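A rough sketch of the “very small logits” point, assuming (hypothetically) that strategy choice works like sampling from a softmax over two options and that decisions are independent draws. The logit values and the helper `p_ever_bad` are illustrative inventions, not a model of any real system.

```python
import math


def softmax(logits):
    """Numerically stable softmax: converts logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: an aligned strategy strongly preferred over a
# catastrophic one that still gets a finite (very negative) logit.
logits = [0.0, -20.0]
p_bad = softmax(logits)[1]  # tiny, but strictly greater than zero


def p_ever_bad(p: float, decisions: int) -> float:
    """Chance of sampling the bad strategy at least once in `decisions`
    independent draws, i.e. 1 - (1 - p) ** decisions, computed with
    log1p/expm1 to stay accurate when p is very small."""
    return -math.expm1(decisions * math.log1p(-p))


if __name__ == "__main__":
    print(f"per-decision probability: {p_bad:.3e}")
    print(f"over a billion decisions: {p_ever_bad(p_bad, 10**9):.3f}")
```

A logit gap of 20 gives a per-decision probability of about 2e-9, yet across a billion independent decisions the chance of hitting that path at least once is well above one half. Using `log1p`/`expm1` instead of computing `(1 - p) ** n` directly avoids losing precision when `p` is close to zero.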