To make a clarifying point (which will perhaps benefit other readers): you’re using the term “scheming” in a different sense from how Joe’s report or Ryan’s writing uses the term, right?
I assume your usage is in keeping with your paper here, which is definitely different from those other two writers’ usages. In particular, you use the term “scheming” to refer to a much broader set of failure modes. In fact, I think you’re using the term synonymously with Joe’s “alignment-faking”—is that right?
Good point!
Yes, I use the term "scheming" in a much broader way, similar to how we use it in the in-context scheming paper. I would assume that our usage is even broader than Joe's "alignment-faking" because it also includes taking direct covert action, like disabling oversight (which arguably is not alignment-faking).