Oops, I actually wasn’t trying to discuss whether the action-space was wide enough to take over the world.
Ah, in hindsight your comment makes more sense.
I’m actually very curious if you agree with this; it seems like an important question.
Argh, I don’t know, you’re positing a setup that breaks the standard ML assumptions and so things get weird. If you have vanilla SGD, I think I agree, but I wouldn’t be surprised if that’s totally wrong.
There are definitely setups where I don’t agree, e.g. if you have an outer hyperparameter tuning loop around the SGD, then I think you can get the opposite behavior than what you’re claiming (I think this paper shows this in more detail, though it’s been edited significantly since I read it). That would still depend on how often you do the hyperparameter tuning, what hyperparameters you’re allowed to tune, etc.
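To make the kind of setup I have in mind concrete, here's a toy sketch (entirely my own made-up example, not taken from that paper) of an outer hyperparameter tuning loop wrapped around vanilla SGD. The point is that the outer loop's selection pressure acts on end-of-training validation performance rather than on the per-step SGD objective, which is where I'd expect the "opposite" behavior to come from:

```python
import numpy as np

# Toy illustration: an outer loop tunes a hyperparameter (learning rate)
# around an inner vanilla-SGD loop, keeping whichever run does best on
# held-out data. All numbers and choices here are illustrative only.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def inner_sgd(lr, steps=500):
    """Vanilla SGD on squared error for a linear model."""
    w = np.zeros(5)
    for _ in range(steps):
        i = rng.integers(len(X_train))
        grad = 2 * (X_train[i] @ w - y_train[i]) * X_train[i]
        w -= lr * grad
    return w

def val_loss(w):
    return float(np.mean((X_val @ w - y_val) ** 2))

# Outer hyperparameter tuning loop: selection is on end-of-training
# validation loss, not on the per-step training objective the inner
# SGD sees.
candidate_lrs = [1e-4, 1e-3, 1e-2]
results = {lr: val_loss(inner_sgd(lr)) for lr in candidate_lrs}
best_lr = min(results, key=results.get)
print(results, "best:", best_lr)
```

How often you rerun this outer loop, and which hyperparameters it's allowed to touch, is exactly the kind of detail I mean when I say the conclusion depends on the setup.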
----
On the rest of the comment: I feel like the argument you’re making is “when the loss function is myopic, the optimal policy ignores long-term consequences and is therefore safe”. I do feel better about calling this “aligned at optimum”, if the loss function also incentivizes the AI system to do the thing we designed it for. It still feels like the lack of convergent instrumental subgoals is “just because of” the myopia, and that this strategy won’t work more generally.
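To spell out the myopia point with a toy example (again, my own illustrative numbers, nothing more): with a myopic objective (discount 0) the optimum ignores a large delayed payoff, and with a long-horizon objective it doesn't.

```python
# Two actions in a toy two-step decision problem. "grab_resources" stands in
# for a convergent-instrumental-style action with a large delayed payoff.
immediate_reward = {"safe": 1.0, "grab_resources": 0.2}
delayed_reward = {"safe": 0.0, "grab_resources": 10.0}

def value(action, discount):
    return immediate_reward[action] + discount * delayed_reward[action]

for discount in (0.0, 0.99):
    best = max(immediate_reward, key=lambda a: value(a, discount))
    print(f"discount={discount}: optimal action = {best}")
# discount=0.0  -> "safe"            (myopic optimum ignores long-term consequences)
# discount=0.99 -> "grab_resources"  (non-myopic optimum pursues the delayed payoff)
```

The safety here comes entirely from setting the discount to zero, which is why it feels to me like it's "just because of" the myopia.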
----
Returning to the original claim:
> Specifically, I think there exists some setups and some parsimonious definition of “optimal performance” [for STEM AI] such that optimal performance is aligned: and I claim that’s the more useful definition.
I do agree that these setups probably exist, perhaps using the myopia trick in conjunction with the simulated world trick. (I don’t think myopia by itself is enough; to have STEM AI enable a pivotal act you presumably need to give the AI system a non-trivial amount of “thinking time”.) I think you will still have a pretty rough time trying to define “optimal performance” in a way that doesn’t depend on a lot of details of the setup, but at least conceptually I see what you mean.
I’m not as convinced that these sorts of setups are really feasible—they seem to sacrifice a lot of benefits—but I’m pretty unconfident here.