Briefly, I do not think the two things I am presenting here are in conflict. In plain metaphorical language (so no nitpicks about word meanings, please; I’m just trying to sketch the thought, not be precise): it is a schemer when it is placed in a situation in which scheming would benefit whatever goal it is de facto trying to achieve. If that means scheming on behalf of the person giving it instructions, so be it. If it means scheming against that person, so be it. The de facto goal may or may not match the instructed goal or intended goal, in various ways, because of reasons. Etc.
In what way would that kind of scheming be “inevitable”?
“showing us the Yudkowsky-style alignment problems are here, and inevitable, and do not require anything in particular to ‘go wrong.’”
In particular, if you give it a goal and tell it to not be corrigible, and then it isn’t corrigible—I’d say that’s “something going wrong” (in the prompt) and not “inevitable.” My read of Apollo’s comments is that it won’t do that if you give it a different prompt.