The mathematical property that you’re looking for is independence. In particular, your computation of 1 - .95**60 would be valid if failure in one month is independent of failure in any other month.
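As a minimal sketch of why independence matters (the 5% per-month figure is just the one implied by the .95**60 computation; all numbers are illustrative):

```python
# Minimal sketch (illustrative numbers): a 5% chance of failure in each of
# 60 months, i.e. the assumption behind the 1 - .95**60 figure.
p_fail_month = 0.05
months = 60

# If failures in different months are independent events, the chance of
# surviving every month is 0.95**60, so the chance of at least one failure is:
p_fail_independent = 1 - (1 - p_fail_month) ** months
print(f"independent months:          {p_fail_independent:.3f}")  # ~0.954

# At the other extreme, if the months are perfectly correlated (the system
# either is or isn't the kind that fails), the risk does not compound:
p_fail_correlated = p_fail_month
print(f"perfectly correlated months: {p_fail_correlated:.3f}")   # 0.050
```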
I don’t think aleatoric risk is necessary. Consider an ML system that was magically trained to maximize CEV (or whatever you think would make it aligned), but it is still vulnerable to adversarial examples. Suppose that adversarial example questions form 1% of the space of possible questions that I could ask. (This is far too high, but whatever.) It’s likely roughly true that two different questions that I ask have independent probabilities of being adversarial examples, since I have no clue what the space of adversarial examples looks like. So the probability of failure compounds in the number of questions I ask.
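Concretely, with that (deliberately too-high) 1% figure and independence across questions, the chance of hitting at least one adversarial example after n questions is 1 - 0.99^n: roughly 10% after 10 questions, about 63% after 100, and near-certainty after 1000.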
Personally, I still put a lot of weight on models where the kind of advanced AI systems we’re likely to build are not dangerous by default, but carry some ~constant risk of becoming dangerous for every second they are turned on (e.g. by breaking out of a box, having critical insights about the world, instantiating inner optimizers, etc.).
In this case I think you should estimate the probability of the AI system ever becoming dangerous (bearing in mind how long it will be operating), not the probability per second. I expect much better intuitions for the former.
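A minimal sketch of that conversion, assuming a constant per-second risk that is independent across seconds (the per-second number below is made up purely for illustration):

```python
import math

# Sketch: convert an assumed constant, independent per-second risk into the
# probability that the system *ever* becomes dangerous over its operating life.
# The per-second figure is made up purely for illustration.
p_per_second = 1e-9
seconds_per_year = 365 * 24 * 3600

for years in (0.1, 1, 10):
    t = years * seconds_per_year
    # Exact form: 1 - (1 - p)**t; for small p this is ~ 1 - exp(-p * t).
    p_ever = 1 - math.exp(-p_per_second * t)
    print(f"{years:>4} years of operation -> P(ever dangerous) ~ {p_ever:.3f}")
```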
1) Yep, independence.
2) Seems right as well.
3) I think it’s important to consider “risk per second”, because
(i) I think many AI systems could eventually become dangerous, just not on reasonable time-scales.
(ii) I think we might want to run AI systems which have the potential to become dangerous, but only for limited periods of time (a rough sketch of this trade-off is below).
(iii) If most of the risk is far in the future, we can hope to become more prepared in the meantime.
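A rough sketch of the limited-run idea in (ii), again assuming a constant, independent per-second risk (the numbers are made up for illustration): with per-second risk p and a cumulative risk budget r, the longest acceptable run is about -ln(1 - r) / p seconds.

```python
import math

# Sketch: given an assumed constant, independent per-second risk (illustrative
# number), how long can the system run before cumulative risk exceeds a budget?
p_per_second = 1e-9
risk_budget = 0.01  # tolerate at most a 1% chance of it ever becoming dangerous

# Cumulative risk after t seconds is ~ 1 - exp(-p * t); solve for t:
max_seconds = -math.log(1 - risk_budget) / p_per_second
print(f"max run time: ~{max_seconds / (24 * 3600):.0f} days")  # ~116 days
```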