Some people reach for selection arguments because they believe such arguments let them draw strong conclusions without understanding mechanistic details. I think this is mistaken: selection arguments often prove too much, and to see why, you have to know something about the mechanisms.
It is clear that selecting for X selects for agents which historically did X in the course of the selection. But how this generalizes outside of the selection process depends strongly on the selection process and architecture. It could be a capabilities generalization, reward generalization for the written-down reward, generalization for some other reward function, or something else entirely.
We cannot predict how the agent will generalize without considering the details of its construction.
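To make the underdetermination concrete, here is a minimal toy sketch. Everything in it is a hypothetical I made up for illustration (the coin setup, `written_reward`, the two policies, the `select` helper); it is not a model of any real training process. Two policies earn identical reward on every state seen during selection, so selection keeps both, yet they behave differently once the coin moves.

```python
# Toy sketch (all names hypothetical): selection on observed reward cannot
# distinguish policies that only differ off the selection distribution.

def written_reward(state, action):
    """Written-down reward: 1 if the agent moves toward the coin, else 0."""
    return 1 if action == state["coin_side"] else 0

def policy_seek_coin(state):
    """Generalizes as 'go to wherever the coin is'."""
    return state["coin_side"]

def policy_go_right(state):
    """Generalizes as 'always go right', regardless of the coin."""
    return "right"

def total_reward(policy, states):
    return sum(written_reward(s, policy(s)) for s in states)

def select(policies, states):
    """Keep every policy that achieves the best total reward on `states`."""
    best = max(total_reward(p, states) for p in policies)
    return [p for p in policies if total_reward(p, states) == best]

# Assumption: during selection the coin always happens to be on the right.
train_states = [{"coin_side": "right"} for _ in range(100)]
# Outside of selection the coin can be on either side.
novel_states = [{"coin_side": "left"}, {"coin_side": "right"}]

survivors = select([policy_seek_coin, policy_go_right], train_states)

for p in survivors:
    print(p.__name__, total_reward(p, novel_states))
```

Both policies survive selection, and only knowledge of which one the process actually produced (i.e. the details of its construction) tells you what happens on the novel states.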
Strongly agree with this in particular:
“It is clear that selecting for X selects for agents which historically did X in the course of the selection. But how this generalizes outside of the selection process depends strongly on the selection process and architecture. It could be a capabilities generalization, reward generalization for the written-down reward, generalization for some other reward function, or something else entirely.
“We cannot predict how the agent will generalize without considering the details of its construction.”
(emphasis mine). I think it’s an application of the no free lunch razor.