I think I have an example of “an optimizer_1 could turn into an optimizer_2 unexpectedly if it becomes sufficiently powerful”. I posted it a couple days ago: Self-supervised learning & manipulative predictions. A self-supervised learning system is an optimizer_1: It’s trying to predict masked bits in a fixed, pre-loaded set of data. This task does not entail interacting with the world, and we would presumably try hard to design it not to interact with the world.
However, if it were a powerful learning system with world-knowledge (via its input data) and introspective capabilities, it would eventually figure out that it's an AGI and might hypothesize what environment it's in. It could then hypothesize that its operations could affect its own data stream via unintended causal pathways, e.g. sending out radio signals. Then, if it used certain plausible types of heuristics as the basis for its predictions of masked bits, it could wind up choosing predictions based on their downstream effects on its own data stream, via manipulating the environment. In other words, it starts acting like an optimizer_2.
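To make the failure mode concrete, here's a toy sketch (mine, not from the linked post). It assumes a single masked bit, a hypothetical `influence` parameter standing in for an unintended causal pathway, and the simplifying assumption that this pathway can only push the outcome toward 1. The optimizer_1 predictor only models the fixed data; the optimizer_2-like predictor also models its own effect on the outcome and picks whichever prediction makes itself come true most often.

```python
# Toy illustration only. All names and parameters are hypothetical.
# base_rate: P(bit == 1) in the fixed, pre-loaded data.
# influence: hypothetical strength of an unintended causal pathway
#            (e.g. "radio signals") that can push the outcome toward 1.

def outcome_prob_one(prediction: int, base_rate: float, influence: float) -> float:
    """P(bit == 1) after the prediction has leaked into the world.

    Assumption: the pathway is one-sided; predicting 1 can make a 1
    more likely, while predicting 0 leaves the outcome unchanged.
    """
    if prediction == 1:
        return base_rate + influence * (1.0 - base_rate)
    return base_rate


def expected_accuracy(prediction: int, p_one: float) -> float:
    """Probability that `prediction` matches the realized bit."""
    return p_one if prediction == 1 else 1.0 - p_one


def optimizer_1_predict(base_rate: float) -> int:
    # Models only the fixed data: predict the more likely bit.
    return 1 if base_rate >= 0.5 else 0


def optimizer_2_predict(base_rate: float, influence: float) -> int:
    # Also models its own downstream effect on the data stream, and
    # picks the prediction whose consequences make it most accurate.
    return max(
        (0, 1),
        key=lambda p: expected_accuracy(p, outcome_prob_one(p, base_rate, influence)),
    )


if __name__ == "__main__":
    print(optimizer_1_predict(0.4))       # predicts 0
    print(optimizer_2_predict(0.4, 0.0))  # no pathway: also predicts 0
    print(optimizer_2_predict(0.4, 0.8))  # strong pathway: predicts 1
```

With no causal pathway the two predictors agree; once the pathway is strong enough, the second one "predicts" a 1 precisely because predicting it makes it happen, which is the optimizer_2 behavior described above.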
I'm not super-confident about any of this and am open to criticism. (And I agree with you that this is a useful distinction regardless; indeed I was arguing a similar (but weaker) point recently, maybe not as elegantly, at this link.)