I can think of a few reasons someone might think AI Control research should receive very high priority, apart from what is mentioned in the post or in Buck’s comment:
- You hope/expect early transformative AI to be used for provable safety approaches, using formal verification methods.
- You think AI control research is more tractable than other research agendas, or will produce useful results faster, before it is too late to apply them.
- You think our only chance of aligning a superintelligence is to delegate the problem to AIs, either because the problem is too hard for humans or because superintelligence will arrive before proper alignment techniques can feasibly be developed.
- You expect a significant fraction of total AI safety research over all time to be done by early transformative AI, so control research has high leverage: it improves the probability of successfully getting those AIs to do valuable safety research, even if slop is quite likely.
I agree with basically everything in the post but put enough probability on these points to think that control research has really high expected value anyway.
Even if all of those are true, the argument in the post would still imply that control research (at least of the sort people do today) cannot have very high expected value. Like, sure, let’s assume for sake of discussion that most total AI safety research will be done by early transformative AI, that the only chance of aligning superintelligent AIs is to delegate, that control research is unusually tractable, and that for some reason we’re going to use the AIs to pursue formal verification (not a good idea, but whatever).
Even if we assume all that, we still have the problem that control research of the sort people do today does basically-nothing to address slop; it is basically-exclusively focused on intentional scheming. Insofar as intentional scheming is not the main thing which makes outsourcing to early AIs fail, all that control research cannot have very high expected value. None of your bullet points address that core argument at all.
I think my points argue more that control research might have higher expected value than some other approaches that don’t address delegation at all or are much less tractable. But I agree that if slop is the major problem, most current control research doesn’t address it, though it’s nice to see that this might change if Buck is right.
And my point about formal verification was to work around the slop problem by verifying the safety approach to a high degree of certainty. I don’t know whether that is feasible, but some seem to think so. Why do you think it’s a bad idea?