I think once you have an LM agent powerful enough to be economically competitive as an independent actor across a lot of domains (if that is even possible; I am still skeptical about LLMs), we’ve reached “Armageddon”. At that point the economic pressure to improve on it will be massive, and there is no particular reason those improvements have to stay limited to LLMs (you could, e.g., build some sort of backchaining/optimization layer on top of it and use it to train the LLM, burning away the interpretability/safety benefits of LLMs). And I have a hard time seeing AI safety winning out over people purely concerned with empowerment maximization in this fight, since the latter have a simpler problem to solve and can therefore probably solve it faster.