I think instrumental convergence should still apply to some utility functions over policies, specifically the ones that seem to produce “smart” or “powerful” behavior from simple rules.
I share an intuition in this area, but tendencies toward “powerful” behavior seem nearly equivalent to instrumental convergence to me. They feel logically downstream of it.
“from simple rules”
I already have a (somewhat weak) result on power-seeking with respect to the simplicity prior over state-based reward functions. That result isn’t about utility functions over policies, though.