Chess is like a bounded, mathematically described universe where all the instrumental convergence stays contained, and it only accomplishes a very limited instrumentality in our universe (i.e., chess programs gain a limited sort of power here by being good playmates).
LLMs touch on the real world far more than that, so MCTS-like skill at navigating “the LLM world,” in contrast to chess, sounds to me like it may create a concerning level of real-world-relevant instrumental convergence.
I agree chess is an extreme example, and I think more realistic versions would probably develop instrumental convergence, at least in a local sense.
(We already have o1 showing at least a little instrumental convergence.)
My main substantive claim is that constraining instrumental goals so that the AI doesn’t try to take power via long-term methods is very useful for capabilities. More generally, instrumental convergence is an area with a positive manifold between capabilities and alignment: alignment methods increase capabilities, and vice versa.