Note, to be entirely clear: I'm not saying this is anywhere near sufficient to align an AGI completely. Mostly it's just a mechanism for decreasing the chance of totally catastrophic misalignment, encouraging the AGI to be merely really, really destructive instead. I don't think curiosity alone is enough to prevent it from wreaking havoc, but I think it would meet the technical definition of alignment, which is that at least one human remains alive.