That’s why the title says “power-seeking can be predictive”, not “training-compatible goals can be predictive”.
You’re right. I was critiquing “power-seeking due to your assumptions isn’t probable, because I think your assumptions won’t hold” and not “power-seeking isn’t predictive.” I had misremembered the predictive/probable split, as introduced in Definitions of “objective” should be Probable and Predictive:
I don’t see a notion of “objective” that can be confidently claimed is:
Probable: there is a good argument that the systems we build will have an “objective”, and
Predictive: If I know that a system has an “objective”, and I know its behavior on a limited set of training data, I can predict significant aspects of the system’s behavior in novel situations (e.g. whether it will execute a treacherous turn once it has the ability to do so successfully).
Sorry for the confusion. I agree that power-seeking is predictive given your assumptions. I disagree that power-seeking is probable due to your assumptions being probable. The argument I gave above was actually:
The assumptions used in the post (“learns a randomly-selected training-compatible goal”) assign low probability to experimental results, relative to other predictions which I generated (and thus relative to other ways of reasoning about generalization),
Therefore the assumptions become less probable
Therefore power-seeking becomes less probable (at least, due to these specific assumptions becoming less probable; I still think P(power-seeking) is reasonably large)
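One way to make this three-step argument explicit is a simple Bayesian framing. The framing and symbols are introduced here only for illustration and are not spelled out in the original argument. Let $A$ be the assumption “learns a randomly-selected training-compatible goal”, $\neg A$ the other ways of reasoning about generalization that produced the alternative predictions (lumped into one hypothesis for simplicity), $E$ the experimental results, and $S$ the event that the trained system seeks power. Assume $E$ bears on $S$ only through which hypothesis holds, so that $P(S \mid A, E) = P(S \mid A)$ and $P(S \mid \neg A, E) = P(S \mid \neg A)$.

$$
\frac{P(A \mid E)}{P(\neg A \mid E)} = \frac{P(E \mid A)}{P(E \mid \neg A)} \cdot \frac{P(A)}{P(\neg A)}
$$

$$
P(S \mid E) - P(S) = \bigl(P(S \mid A) - P(S \mid \neg A)\bigr)\bigl(P(A \mid E) - P(A)\bigr)
$$

If $P(E \mid A) < P(E \mid \neg A)$, the first equation gives $P(A \mid E) < P(A)$ (step 2). If additionally $P(S \mid A) > P(S \mid \neg A)$, the second equation gives $P(S \mid E) < P(S)$ (step 3). Since $P(S \mid E) \ge P(S \mid \neg A)\,P(\neg A \mid E)$, power-seeking can still be reasonably probable whenever other routes to it remain likely.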
I suspect that you agree that “learns a training-compatible goal” isn’t very probable/realistic. My point is then that the conclusions of the current work are weakened; maybe now more work has to go into the “can” in “Power-seeking can be probable and predictive.”