Interesting idea, but it seems risky. Would life be the only complex system such an AI would avoid interfering with, or, for that matter, even the primary one?
Further, it seems likely that a curiosity-based AI might intentionally create or seek out complexity, which could be risky. Think of how kids love to say “I want to go to the moon!” or “I want to go to every country in the world!” I mean, I do too, and I’m an adult. Surely a curiosity-based AI would go to fairly extreme lengths to satiate its own curiosity, at the expense of other values.
Maybe such an AGI could have, like… an allowance? “Never spend more than 1% of your resources on a single project,” or something? But I have absolutely no idea how you could define a consistent notion of a “single project”.
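To make that concrete, here’s a rough Python sketch of what such an allowance might look like, assuming resources can be metered as a single number and that a “project” is just whatever label the agent supplies (all names and numbers here are made up for illustration). The last line is exactly the problem: relabel the project and the cap does nothing.

```python
# A minimal sketch of the "allowance" idea, assuming resources are a single
# scalar and a "project" is just an arbitrary string label the agent picks.
# The hard part -- deciding what actually counts as one project -- is exactly
# what this string key glosses over.

class ResourceAllowance:
    def __init__(self, total_resources: float, cap_fraction: float = 0.01):
        self.cap = cap_fraction * total_resources          # e.g. 1% of the total
        self.spent_per_project: dict[str, float] = {}

    def request(self, project: str, amount: float) -> bool:
        """Approve spending only if this project stays under its cap."""
        spent = self.spent_per_project.get(project, 0.0)
        if spent + amount > self.cap:
            return False  # refuse: this "single project" would exceed the allowance
        self.spent_per_project[project] = spent + amount
        return True


# Example: a curious agent trying to pour everything into one moonshot.
allowance = ResourceAllowance(total_resources=1_000_000)
print(allowance.request("go_to_the_moon", 5_000))      # True: within the 10,000 cap
print(allowance.request("go_to_the_moon", 6_000))      # False: would push past the cap
print(allowance.request("go_to_the_moon_v2", 6_000))   # True: trivially relabeled, cap dodged
```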
Note, to be entirely clear, I’m not saying this is anywhere near sufficient to align an AGI completely. Mostly it’s just a mechanism for decreasing the chance of totally catastrophic misalignment and encouraging it to be merely really, really destructive instead. I don’t think curiosity alone is enough to prevent it from wreaking havoc, but I do think it would meet the technical definition of alignment, which is that at least one human remains alive.