Model 1. Your new paper produces c units of capabilities and a units of alignment. When C units of capabilities are reached, an AI is produced, and it is aligned iff A units of alignment have been produced by then. The rest of the world produces, and will continue to produce, alignment and capabilities research in a fixed ratio R (units of alignment per unit of capabilities). You are highly uncertain about A and/or C, but have a good guess at a, c, and R.
In this model, if A/C >> R we are screwed whatever you do, and if A/C << R we win whatever you do. Your paper only makes a difference in those worlds where A/C ≈ R, and in those worlds it helps iff a/c > R.
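A minimal Monte Carlo sketch of Model 1 (the log-uniform priors over the thresholds A and C are an assumption made purely for illustration, not part of the model): sample many (A, C) worlds and count in how many of them adding your paper's (a, c) flips the outcome.

```python
import random

def aligned(A, C, R, extra_a=0.0, extra_c=0.0):
    # The world produces alignment at ratio R per unit of capabilities. With
    # your paper published, only C - extra_c more capabilities are needed to
    # hit the threshold, during which the world produces R * (C - extra_c)
    # alignment, on top of your extra_a.
    return R * (C - extra_c) + extra_a >= A

def simulate(a, c, R, n=100_000, seed=0):
    rng = random.Random(seed)
    flipped_good = flipped_bad = 0
    for _ in range(n):
        A = 10 ** rng.uniform(-2, 2)   # illustrative prior: highly uncertain A
        C = 10 ** rng.uniform(-2, 2)   # illustrative prior: highly uncertain C
        without = aligned(A, C, R)
        with_paper = aligned(A, C, R, extra_a=a, extra_c=c)
        flipped_good += (not without) and with_paper
        flipped_bad += without and (not with_paper)
    return flipped_good, flipped_bad

# The paper only matters in marginal worlds where A/C ≈ R, and it helps
# there iff a/c > R.
print(simulate(a=0.02, c=0.01, R=1.0))   # a/c > R: flips some worlds to "win"
print(simulate(a=0.005, c=0.01, R=1.0))  # a/c < R: flips some worlds to "lose"
```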
This model treats alignment and capabilities as continuous, fungible quantities that slowly accumulate. This is a dubious assumption. It also assumes that, conditional on us being in the marginal world (the world where good and bad outcomes are both very close), your mainline probability involves research continuing at its current ratio.
If, for example, you were extremely pessimistic and thought that the only way we have any chance is if a portal to Dath ilan opens up, then the goal would largely be to hold off all research for as long as possible, to maximize the time in which a deus ex machina can happen. Other goals might include publishing the sort of research most likely to encourage a massive global “take AI seriously” movement.
So, the main takeaway is that we need some notion of fungibility/additivity of research progress (for both alignment and capabilities) in order for the “ratio” model to make sense.
Some places fungibility/additivity could come from:
- research reducing time-until-threshold-is-reached additively and approximately independently of other research (see the sketch just after this list)
- probabilistic independence in general
- a set of rate-limiting constraints on capabilities/alignment strategies, each of which must be solved independently of the others (i.e. solving one does not help much with solving the others)
- ???
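As a sketch of the first of these (purely illustrative; the function and numbers are made up here, not taken from the model above): if each published result independently shaves some amount of time off reaching a threshold, and those reductions simply add, then only the running totals matter, not which particular results produced them. That is the kind of fungibility the ratio model needs.

```python
# Sketch: each result independently shaves time off reaching a threshold,
# and the reductions add. "Units of progress" are then fungible: only the
# totals matter, not which specific papers produced them.

def years_until_threshold(baseline_years, time_savings):
    """time_savings: per-result reductions (in years) toward one threshold."""
    return max(0.0, baseline_years - sum(time_savings))

# Two very different portfolios with the same total contribution reach the
# threshold at the same time under this assumption.
print(years_until_threshold(20.0, [3.0, 3.0, 3.0]))  # -> 11.0
print(years_until_threshold(20.0, [7.0, 1.0, 1.0]))  # -> 11.0
```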
Fungibility is necessary, but not sufficient, for the rule “if your work has a better ratio than average research, publish”. You also need your uncertainty to be in the right place.
If instead you were certain of A and C, and uncertain what ratio future research might have, you would get a different rule: publish iff a/c > A/C.
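The same Monte Carlo sketch as above, with the uncertainty moved onto the future ratio (the log-uniform prior over R is, again, just an assumption for illustration):

```python
import random

def simulate_uncertain_R(a, c, A, C, n=100_000, seed=0):
    # Thresholds A and C are known; the uncertainty is now over the ratio R
    # that future research will have.
    rng = random.Random(seed)
    flipped_good = flipped_bad = 0
    for _ in range(n):
        R = 10 ** rng.uniform(-2, 2)        # illustrative prior over R
        without = R * C >= A                # aligned if the world alone suffices
        with_paper = R * (C - c) + a >= A   # your paper shifts both stocks
        flipped_good += (not without) and with_paper
        flipped_bad += without and (not with_paper)
    return flipped_good, flipped_bad

# The marginal worlds are now those where R ≈ A/C, and in them the paper
# helps iff a/c > A/C (rather than iff a/c > R, as in Model 1).
print(simulate_uncertain_R(a=0.03, c=0.01, A=2.0, C=1.0))  # a/c > A/C: flips worlds to "win"
print(simulate_uncertain_R(a=0.01, c=0.01, A=2.0, C=1.0))  # a/c < A/C: flips worlds to "lose"
```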