Ah, thanks for pointing out the typo.
I’ll probably create a post soon-ish with more visualizations covering cases like the ones you suggested.
You’re right that the model is most pertinent to cases where we’ve already solved the alignment problem pretty well but want to add other safety measures. I’m thinking in particular of cases where the AIs are so advanced that humans can’t supervise them well, so the AIs must supervise each other. In that case, I’m not sure how p would behave as a function of AI capability. Maybe it’s best to assume that p increases with capability, just so we’re aware of what the worst case could be?