Well, to flesh that out, we could have an ASI that seems value-aligned and controllable... until it isn’t.
Or the social effects (deep fakes, for example) could ruin the world or land us in a dystopia well before actual AGI.
But that might be a bit orthogonal and in the weeds (specific examples of how we end up with x-risk or s-risk end scenarios without attributing magic powers to the ASI).
Well, to flesh that out, we could have an ASI that seems value-aligned and controllable... until it isn’t.
I think that scenario falls under the “worlds where iterative approaches fail” bucket, at least if prior to that we had a bunch of examples of AGIs that seemed and were value aligned and controllable, and the misalignment only showed up in the superhuman domain.
There is a different failure mode, which is “we see a bunch of cases of deceptive alignment in sub-human-capability AIs causing minor to moderate disasters, and we keep scaling up despite those disasters”. But that’s not so much “iterative approaches cannot work” as “iterative approaches do not work if you don’t learn from your mistakes”.
It’s not a moot point, because a lot of the difficulty of the problem as stated here is the “iterative approaches cannot work” bit.