We’re a bit inconsistent with terminology here. We say both “use” and “deploy”:
We think that it’s plausible that AI developers will adopt the policy that if they catch the models trying to escape, they stop using them.
and
You might think that this is an absurdly weak lab ask: “once your model has demonstrably tried to escape, stop deploying it”.
In the context of catch-then-shut-down, we use the term “deploy” to mean the same thing as “use”. So under the catch-then-shut-down plan, the lab no longer uses the model at all once it has been caught.
But there is a possible caveat: scientific safety work that is clearly safe even if the AI is scheming (given realistic bounds on its capabilities). E.g., maybe it’s fine to carefully study the AI’s behavior in the case where it tried to escape, with some precautions around persuasion.
So, when we say “deploy” or “use”, we mean something like “use at all in ways which aren’t careful safety/alignment experiments, evaluations, or tests which are extremely likely to be safe even given that the AI is scheming and we might somewhat underestimate its capabilities”. Perhaps we should clarify this in the post.
Internal deployment counts as deployment here, right?
Yes.