Note that even in the extreme case where our approach to AI alignment would be completely different for different values of some unknown details, the speedup from knowing them in advance is at most 1/(probability of most likely possibility).
Shouldn’t this be ‘probability of least likely possibility’?
Suppose we have an unknown detail X with n possible values. What fraction of the total effort should we spend on the approach for X=i? I guess Pr(X=i) if we work on all approaches in parallel. For example, if value x_2 has probability 1/3, we should spend 1/3 of the total effort on the approach for value x_2.
What happens when we find out the true value of X? Let the true value be x∗. Then we can concentrate all the effort on the approach for x∗. Whereas previously the fraction of the effort for value x∗ was Pr(X=x∗), it’s now 1. Therefore the speedup is 1/Pr(X=x∗). When is this greatest? When Pr(X=x∗) is least.
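A quick numerical sketch of this reading, with a made-up distribution over X (the numbers and the helper function are just for illustration):

```python
# Parallel-effort reading: effort is split across approaches in proportion to
# Pr(X = i); learning the true value x* lets all effort move onto that approach.
probs = {"x1": 0.6, "x2": 0.3, "x3": 0.1}  # made-up distribution over X

def speedup(x_star):
    """Effort on the correct approach goes from Pr(X = x*) to 1,
    so the speedup factor is 1 / Pr(X = x*)."""
    return 1.0 / probs[x_star]

for x, p in probs.items():
    print(f"Pr(X = {x}) = {p:.1f}  ->  speedup {speedup(x):.2f}")
# The speedup is largest (10x) when the true value is the least likely one.
```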
I’m imagining the first marginal unit of effort, which you’d apply to the most likely possibility. Its expected impact is reduced by that highest probability.
If you get unlucky, then your actual impact might be radically lower than if you had known what to work on.
Thanks for the clarification! Now I understand where the value comes from.
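To check my understanding, here is the marginal-unit version with the same made-up numbers:

```python
# Marginal-unit reading: without knowing X, the first unit of effort goes on the
# most likely possibility, so its *expected* impact is Pr(most likely) * impact.
# Knowing X in advance recovers the full impact, so the expected speedup is
# 1 / Pr(most likely possibility), matching the quoted claim.
probs = [0.6, 0.3, 0.1]        # same made-up distribution over X
impact_if_correct = 1.0        # normalised impact of the unit when the bet pays off

expected_without_knowing = max(probs) * impact_if_correct   # 0.6
expected_knowing = impact_if_correct                        # 1.0
print(expected_knowing / expected_without_knowing)          # ~1.67 = 1 / 0.6
```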
However, in such a situation, would we only work on the most likely possibility? I agree that a single person does better by concentrating. But a group of researchers would work on each of the more likely approaches in order to mitigate risk.
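As a toy illustration of the risk-mitigation point (assuming diminishing, square-root returns to effort, which is my assumption rather than anything from the post):

```python
import math

# Toy model: the group splits one unit of effort across approaches, and the
# payoff from approach i is sqrt(effort_i) if X = i, else 0 (the diminishing
# returns are my assumption). Compare concentrating on the most likely value
# with splitting effort in proportion to the probabilities.
probs = [0.6, 0.3, 0.1]  # made-up distribution over X

concentrate = max(probs) * math.sqrt(1.0)            # 0.60
split = sum(p * math.sqrt(p) for p in probs)         # ~0.66
print(f"concentrate: {concentrate:.2f}, proportional split: {split:.2f}")
# With concave returns, spreading effort over the likelier values has higher
# expected payoff, which is the diversification argument above.
```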