As you know, large language models can be understood as simulators, or more simply as models that predict which token would plausibly come next after a given sequence of tokens.
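To make "predicting the next token" concrete, here is a toy sketch. It is not how a real LLM works (those are neural networks conditioning on long contexts); it is just a bigram counter that continues a prompt with the most frequent follower of the last token. The corpus and the names `train_bigram` and `continue_text` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def continue_text(counts, prompt, n_new):
    """Greedily append the most frequent follower of the last token."""
    out = list(prompt)
    for _ in range(n_new):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last token was never followed by anything in training
        out.append(followers.most_common(1)[0][0])
    return out

# Hypothetical toy corpus, chosen only for illustration.
corpus = "the evil ai takes over the world . the kind ai helps the world .".split()
model = train_bigram(corpus)
print(continue_text(model, ["takes"], 3))
```

The point of the toy is only that the continuation reflects what the training text makes likely, not anything the model "wants".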
As a consequence, when you ask an LLM to simulate the extreme opinions of an AI, it converges very quickly towards “we should eradicate humans” (https://twitter.com/Simeon_Cps/status/1599470463578968064?s=20&t=rj2357Vof9sOLnIma6ZzIA). Similar behavior has been observed with other language models.
Given that, is it plausible that some very powerful simulator would actually try to take over the world, or at least write a very detailed plan for taking over the world?
It seems to me that the answer is yes. More worryingly, if LLMs trained with RL keep their “simulator” aspect, it could be quite natural for an LLM with agentic properties to still behave somewhat like a simulator while having the capacity to act in the real world. That would make more likely a scenario in which a simulator literally tries to take over the world because it was asked to do what an evil AGI would do.
And if that’s true, do you think it’s a consideration worth keeping in mind, or is it irrelevant in practice?
Well, yes, if you as much as tease at the concept of AGI, simulators will frequently start blathering about taking over the world.
Example:
As for whether this sort of storytelling is at all likely to actually lead to the world being taken over, well, it depends on how difficult the world is to take over, how smart simulators are going to get before the world is taken over in some other way, and what else if anything the model has been optimized for other than generic language prediction.
Language models trained with purely self-supervised learning have several properties, I think, that make it rather more unlikely/difficult for them to take over the world autonomously, even if they’re superhuman by many measures and can tell excellent and realistic stories about doing so, e.g. lack of calibration. But a sufficiently smart LLM can calibrate itself at runtime, manage or externalize its memory if it has limited context, etc. (Also, relatedly, most stories aren’t intended to be realistic plans/action-sequences, but for LLMs it suffices that it’s imaginable that one could be.) I certainly think it’s possible in principle for an LLM story about taking over the world to lead to the actual taking over of the world.