Is there a fixed, concrete definition of “convergent property” or “instrumentally convergent” that most folks can agree on?
From what I see it’s more loosely and vaguely defined than “agency” itself, so it doesn’t really dispel whatever confusion people may have.
I don’t know if there’s a standard definition or reference for instrumental convergence other than the LW tag, but convergence in general is a pretty well-known phenomenon.
For example, many biological mechanisms which evolved independently end up looking remarkably similar, because that just happens to be the locally-optimal way of doing things if you’re in the design space of iterative mutation of DNA.
Similarly, in all sorts of engineering fields, methods or tools or mechanisms are often re-invented independently, yet end up converging on functionally, or even visually, similar designs, because they are trying to accomplish or optimize for roughly the same thing, and given enough constraints there are only so many ways of doing so optimally.
Instrumental convergence in the context of agency and AI is just that principle applied to strategic thinking and mind design specifically. In that context it’s more of a hypothesis, since we don’t actually have more than one example of a human-level intelligence being developed. But even if mind designs don’t converge on some kind of optimal form, artificial minds could still be efficient w.r.t. humans, which would have many of the same implications.
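To make the “applied to strategic thinking” part concrete, here’s a toy sketch of my own (not from any standard reference, and with entirely made-up goals, moves, and payoff numbers): a few agents with very different terminal goals each pick whichever first move scores highest for their goal. Because a generic resource like compute is useful for nearly any goal, they all converge on acquiring it first, which is the pattern “instrumental convergence” points at.

```python
# Toy illustration of instrumental convergence.
# The goals, candidate moves, and payoff numbers are invented for the example.

GOALS = ["prove_theorems", "write_novels", "cure_diseases"]

# Estimated usefulness of each candidate first move toward each terminal goal.
PAYOFF = {
    "acquire_compute": {"prove_theorems": 0.9, "write_novels": 0.8, "cure_diseases": 0.9},
    "memorize_poetry": {"prove_theorems": 0.1, "write_novels": 0.6, "cure_diseases": 0.0},
    "study_biology":   {"prove_theorems": 0.1, "write_novels": 0.2, "cure_diseases": 0.8},
}

def best_first_move(goal: str) -> str:
    """Pick the move with the highest estimated payoff for this goal."""
    return max(PAYOFF, key=lambda move: PAYOFF[move][goal])

for goal in GOALS:
    print(f"{goal:15s} -> {best_first_move(goal)}")
# All three agents pick "acquire_compute", despite wanting very different things.
```

You could swap in money, energy, or self-preservation as the shared resource and the toy would look the same; the only point is that very different objectives can rank the same intermediate step highest.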