It seems like instrumental convergence is restricted to agent AI’s, is that true?
Also what is going on with mesa-optimizers? Why is it expected that they will will be more likely to become agentic than the base optimizer when they are more resource constrained?
The more agentic a system is the more it is likely to adopt convergent instrumental goals, yes.
Why agents are powerful explores why agentic mesa optimizers might arise accidentally during training. In particular, agents are an efficient way to solve many challenges, so the mesa optimizer being resource constrained would lean in the direction of more agency under some circumstances.
It seems like instrumental convergence is restricted to agent AI’s, is that true?
Also what is going on with mesa-optimizers? Why is it expected that they will will be more likely to become agentic than the base optimizer when they are more resource constrained?
The more agentic a system is the more it is likely to adopt convergent instrumental goals, yes.
Why agents are powerful explores why agentic mesa optimizers might arise accidentally during training. In particular, agents are an efficient way to solve many challenges, so the mesa optimizer being resource constrained would lean in the direction of more agency under some circumstances.