It’s lots of saving throws, you know? And you multiply the saving throws together and things look better. And they interact better than that because– well, in one way worse because it’s correlated: If you’re incompetent, you’re more likely to fail to solve the problem and more likely to fail to coordinate not to destroy the world. In some other sense, it’s better than interacting multiplicatively because weakness in one area compensates for strength in the other. I think there are a bunch of saving throws that could independently make things good, but then in reality you have to have a little bit here and a little bit here and a little bit here, if that makes sense.
I don’t understand this part. Translating to math, I think it’s saying something like, if $p_i$ is the probability that saving throw $i$ works, then the probability that at least one of them works is $1 - \prod_i (1 - p_i)$ (assuming the saving throws are independent), which is higher the more saving throws there are; but due to correlation, the saving throws are not independent, so we effectively have fewer saving throws. I don’t understand what “weakness in one area compensates for strength in the other” or “a little bit here and a little bit here and a little bit here” mean.
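For concreteness, here is a minimal sketch of that independence formula; the probabilities below are placeholders rather than anyone’s actual estimates.

```python
from math import prod

def p_at_least_one(ps):
    """Chance that at least one of several independent saving throws works."""
    return 1 - prod(1 - p for p in ps)

print(p_at_least_one([0.25, 0.25]))        # ~0.44
print(p_at_least_one([0.25, 0.25, 0.25]))  # ~0.58: more independent throws, better odds
```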
Correlations don’t necessarily raise or lower the joint probability of several events. Suppose there are two events:
1. We build AGI
2. We align AGI
and both are monotone functions of another variable, our competence. Then if we’re not competent enough to align AGI, maybe we’re also not competent enough to build AGI at all, so there is no problem. Here the events correlate in a helpful way. This example illustrates what I think Paul means by “weakness in one area compensates for strength in the other”.
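To see the direction of that effect concretely, here is a small simulation; the threshold model (both events driven by a single “competence” draw) and the specific numbers are my own illustration, not anything from the discussion.

```python
import random

random.seed(0)
N = 100_000
bad = 0
for _ in range(N):
    competence = random.random()   # one shared driver of both events
    build = competence > 0.3       # competent enough to build AGI
    align = competence > 0.6       # aligning demands more competence than building
    if build and not align:        # the bad outcome: built but not aligned
        bad += 1

p_build, p_align = 0.7, 0.4        # the marginals implied by the thresholds above
print(bad / N)                     # ~0.30: bad outcome under the correlated model
print(p_build * (1 - p_align))     # 0.42: what independence would predict, i.e. worse
```

The correlation concentrates failure-to-align in worlds where we also fail to build, so the joint bad outcome is rarer than the marginals alone would suggest.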
Your model could also be that there are two events:
1. We align AGI
2. We coordinate to not build AGI if we can’t solve 1.
Here, if we’re not competent enough to solve 1, that’s some evidence that we won’t be competent enough to solve 2. So the events correlate in an unhelpful way.
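The same kind of threshold sketch, with numbers again purely illustrative, shows the probability of losing both saving throws going up rather than down:

```python
import random

random.seed(0)
N = 100_000
neither = 0
for _ in range(N):
    competence = random.random()       # shared driver again
    align = competence > 0.7           # saving throw 1: solve alignment
    coordinate = competence > 0.6      # saving throw 2: coordinate not to build
    if not align and not coordinate:   # both saving throws fail
        neither += 1

p_align, p_coord = 0.3, 0.4
print(neither / N)                     # ~0.60: the failures come bundled together
print((1 - p_align) * (1 - p_coord))   # 0.42: what independence would predict, i.e. better
```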
I think the need for a nuanced view of the correlations between the events in your model of the future is what Paul means when he says “a little bit here and a little bit here and a little bit here”.
Maybe you have a 30% chance of solving the clean theoretical problem. And a 30% chance that you could wing AI alignment with no technical solution. If they were independent, you would have a 50% probability of being able to do one or the other.
But things are worse than this, because both of them are more likely to work if alignment turns out to be easy. So maybe it’s more like a 40% probability of being able to do one or the other.
But in reality, you don’t need to solve the full theoretical problem or wing the problem without understanding anything more than we do today. You can have a much better theoretical understanding than we currently do, but not good enough to solve the problem. And you can be pretty prepared to wing it; even if that’s not good enough to solve the problem without knowing anything more, it might be good enough combined with a reasonable theoretical picture.
(Similarly for coordination.)
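To put rough numbers on the 30%/30% passage above: independence gives 1 − 0.7² ≈ 51%, and any shared “alignment turns out to be easy” factor pulls that down. The toy correlated model below is only meant to show the direction of the adjustment; the 40% figure itself is Paul’s rough estimate, not the output of any particular calculation.

```python
from math import prod

p_theory, p_wing = 0.3, 0.3

# Independent saving throws: 1 - (1 - 0.3) * (1 - 0.3) = 0.51
print(1 - prod([1 - p_theory, 1 - p_wing]))   # 0.51

# A toy way to induce positive correlation: both approaches only have a real chance
# if alignment happens to be easy (probability q), and in that case each works with
# probability r, with q * r = 0.3 so the marginals stay at 30%.
q, r = 0.5, 0.6
print(q * (1 - (1 - r) ** 2))                 # 0.42: lower than the independent 0.51
```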