I think that by the time we have a moderately superhuman, agentive, unaligned AGI, we're in deep trouble. I think it's more interesting and useful to focus on the period leading up to that point, the window we have to cross through before the deep trouble arrives.
In particular, I'm hopeful that there will be some level of sub-human AGI (sub-human in either intelligence or speed) that tries some of the things we predict might happen, like a deceptive turn, but underestimates us, and we catch it in time. Or we notice the first crazy, weird, bad thing happen and immediately shut down the data centers, because the model mistimed how quickly we'd be able to do that.
Perhaps one of the things the AI safety governance people should work on is setting up an ‘off switch’ for cloud computing, and a set of clear guidelines for when to use it, just in case we do get that window of opportunity.