- For scheming, the model could reason in parallel about “should I still stay undercover?”, “what should I do if I should stay undercover?”, and “what should I do if it’s time to attack?”, and then use only one serial step to decide on its action.
I am also very interested in how one could operationalize the number of inference hops of out-of-context reasoning required for various types of scheming, especially scheming in a single forward pass, and especially in the context of automated AI safety R&D.