I don’t understand precisely what question you’re asking. I think it’s unlikely we will happen to solve alignment by any method in the time frame between an AGI going substantially superhuman and the AGI causing doom.
I think we’ve gotten to the bottom of the disagreement. You think that an AGI would be capable of killing humans in days or weeks, and I think it wouldn’t. Since I think it would take at least months (but more likely years) for an AGI to get into a position where it can kill humans, I think it is possible to make other AGIs in the meantime and coerce some of them into solving the alignment problem/fighting rogue AGIs.
So now we can discuss why I think it would take years rather than days. My model of the world is one where you CAN cause great harm in a short amount of time, but I don’t think it is possible, and I haven’t seen any evidence so far, that we live in a world where an entity with bounded computational capabilities can successfully implement a plan that kills all humans without incurring great risks to itself. I am sorry I can’t give more details, but I cannot really prove a negative. I can only come up with examples like: if you told me you had a plan to make Chris Rock go have a threesome with Will Smith and Donald Trump, I wouldn’t tell you it is physically impossible, but I would be automatically skeptical.
Even if it takes years, the “make another AGI to fight them” step would… require solving the alignment problem? So it would just buy us some more time, and probably not nearly enough.
We could shut off the internet/all our computers during those years. That would work fine.
I see, we have a crux then. How quickly do you think an AGI would need to solve the alignment problem?
I am deducing that you think:
Time(alignment) > Time(doom)