That’s possible. But it seems like way less of a convergent instrumental goal for agents living in simulated world-models. Both options—our world optimized by us and our world optimized by a random deceptive model—probably contain very little of value as judged by agents in another random deceptive model.
So yeah, I would say some models would think like this, but I would expect the total weight on models that do to be much lower.