I recently came back to this post because I remembered it having examples of what influence-seeking agents might look like, and wanted to quote them. But now that I'm rereading it in detail, it's all very vague. E.g.:
A few automated systems go off the rails in response to some local shock. As those systems go off the rails, the local shock is compounded into a larger disturbance; more and more automated systems move further from their training distribution and start failing.
This doesn’t constrain my expectations about what the automated systems are doing in any way; nor does it distinguish between recoverable and irrecoverable shocks. Is AI control over militaries necessary for a correlated automation failure to be irrecoverable? Or control over basic infrastructure? How well do AIs need to cooperate with each other to prevent humans from targeting them individually?
Overall I’m downgrading my credence in this scenario.