It’s not sufficient to argue that taking over the world will improve prediction accuracy. You also need to argue that during the training process (in which taking over the world wasn’t possible), the agent acquired a set of motivations and skills which will later lead it to take over the world. And I think that depends a lot on the training process.
[...] if during training the agent is asked questions about the internet, but has no ability to edit the internet, then maybe it will have the goal of “predicting the world”, but maybe it will have the goal of “understanding the world”. The former incentivises control, the latter doesn’t.
I agree with your key claim that it’s not obvious/guaranteed that an AI system that has faced some selection pressure in favour of predicting/understanding the world accurately would then want to take over the world. I also think I agree that a goal of “understanding the world” is somewhat less dangerous in this context than a goal of “predicting the world”. But it seems to me that a goal of “understanding the world” could still be dangerous for essentially the same reason: some world states are easier to understand than others, and some trajectories of the world are easier to maintain an accurate understanding of than others.
E.g., let’s assume that the “understanding” is meant to be at roughly the level of analysis humans typically use (rather than, e.g., being primarily focused at the level of quantum physics), and that (as in humans) the AI sees it as worse to have a faulty understanding of “the important bits” than of “the rest”. Given that, I think:
- a world without human civilization, or with far more homogeneity in its human civilization, seems easier to understand
- a world that stays pretty similar in terms of “the important bits” (not things like distant stars coming into or out of existence), rather than e.g. having humanity spread through the galaxy building massive structures whose designs are shaped by changing culture, requires less ongoing effort to maintain an understanding of, and carries less risk of later being understood poorly
I’d be interested in whether you think I’m misinterpreting your statement or missing some important argument.
(Though, again, I see this just as pushback against one particular argument of yours, and I think one could make a bunch of other arguments for the key claim that was in question.)