I think that prior to this paper, the discussion around scheming was pretty confusing: it was spread across many posts, not all of which were specifically about scheming, and it was full of pretty bad arguments. This paper fixed that by bringing together most (all?) of the main considerations for and against expecting scheming to emerge.
I found this helpful for clarifying my thinking on the topic, which makes me more confident in my focus on AI control and made me less confused when I worked on the Alignment faking paper.
It is also helpful as a list of reasons why someone reasonable might expect scheming (without finding it overwhelmingly likely either) that I can point skeptical people to without being afraid that it contains massive over- or understatements.
I think this paper will become pretty outdated as we get closer to understanding what AGI looks like and as we get better model organisms, but I think that it currently is the best resource about the conceptual arguments for and against scheming propensity.
I strongly recommend (the audio version of) this paper for people who want to work on scheming propensity.
(For what it’s worth, it appears to me that people started using the term “scheming” in much more confusing and inconsistent ways after this post was written and tried to give the term a technical meaning. I currently think this was quite bad. I do like a lot of the content of the paper/essay/post. I have roughly one conversation every two weeks that ends up derailed or confused because the two participants are using “scheming” in different specific ways, each assuming the other person has the same meaning in mind.)